Single Point of Failure (SPOF) Audit
Identifies hidden execution risk by revealing where your organization depends on a single person, decision, or system, and shows how to reduce that fragility before it causes failure.
Client Self-Assessment
Purpose
This audit identifies where delivery, reliability, or growth depends too heavily on a single person, decision path, or system. Single points of failure increase execution risk, slow response under pressure, and make scale fragile.
What this is
A fast, practical assessment across Policy, People, and Technology that highlights risk concentration and prioritizes high-leverage fixes.
What this is not
Not a compliance exercise.
Not a tool inventory.
Not a re-organization.
How to Use This Audit
- Complete the checklist (15–30 minutes)
- Score each section
- Review red flags
- Prioritize fixes, starting with the highest-scoring area
1. Policy (Governance and Decision Flow)
Check all that apply
☐ Key decisions depend on one person being available
☐ Decision authority is implicit or undocumented
☐ Objections can block progress without alternatives or deadlines
☐ Client commitments exceed internal capacity controls
Healthy signals
- Clear decision ownership
- Written escalation paths
- Time-bound objections
- Defaults that allow work to proceed
Red flag
If one absence causes decision paralysis, you have a policy single point of failure.
2. People (Knowledge, Ownership, Capacity)
Check all that apply
☐ Only one person can deploy or fix production
☐ One person owns a client relationship end-to-end
☐ Critical knowledge lives in people, not documentation
☐ Senior staff routinely step in to unblock work
Healthy signals
- At least two owners per critical responsibility
- Clear primary and secondary ownership
- Runbooks for recurring work
- Predictable handoffs
Red flag
“Ask them, they’re the only one who knows” indicates a people single point of failure.
3. Technology (Systems and Infrastructure)
Check all that apply
☐ One admin account or credential controls production
☐ No tested rollback or recovery path
☐ Infrastructure changes are manual
☐ Monitoring alerts go to one person
Healthy signals
- Infrastructure defined as code
- Centralized access and secrets
- Tested recovery paths
- Group-based alerting
Red flag
“Don’t touch that system” indicates a technology single point of failure.
SPOF Risk Scoring
Score each area from 0 to 2
- 0 = No meaningful risk
- 1 = Partial or emerging risk
- 2 = Clear single point of failure
Record your scores
- Policy:
- People:
- Technology:
Interpretation (add the three scores for a total of 0–6)
- 0–2 → Healthy
- 3–4 → Latent risk (address proactively)
- 5–6 → Active execution risk (address immediately)
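The scoring arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of the audit itself. It assumes the three area scores are added into one total from 0 to 6 and that tied areas keep the order in which they were entered. The names `interpret_total` and `summarize` are hypothetical.

```python
from typing import Dict, List, Tuple


def interpret_total(total: int) -> str:
    """Map a combined score (0-6) to the audit's interpretation bands."""
    if not 0 <= total <= 6:
        raise ValueError("total must be between 0 and 6")
    if total <= 2:
        return "Healthy"
    if total <= 4:
        return "Latent risk (address proactively)"
    return "Active execution risk (address immediately)"


def summarize(scores: Dict[str, int]) -> Tuple[int, str, List[str]]:
    """Return the total, its interpretation, and areas ordered highest score first."""
    for area, score in scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{area} score must be 0, 1, or 2")
    total = sum(scores.values())
    # sorted() is stable, so tied areas keep their original order.
    priority = sorted(scores, key=lambda area: -scores[area])
    return total, interpret_total(total), priority


if __name__ == "__main__":
    total, verdict, priority = summarize({"Policy": 1, "People": 2, "Technology": 0})
    print(total, verdict, priority)
```

With the example scores (Policy 1, People 2, Technology 0), the total is 3, which falls in the latent-risk band, and People is the first area to address.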
What to Fix First (80/20 Guidance)
Start with changes that:
- Reduce dependency on individuals
- Clarify decision authority
- Make recovery boring and repeatable
Common high-leverage fixes:
- Add a secondary owner
- Write a one-page runbook
- Introduce a default decision rule
- Route alerts to a group
- Automate one manual step
Write an Executive Summary (Optional)
Our highest execution risk comes from [Policy / People / Technology], specifically [X]. This creates fragility during normal operations and significant risk under stress. Addressing [Y] will materially reduce dependency on individuals and restore predictable execution.
Why this matters
Single points of failure rarely show up during calm periods. They surface under pressure: during incidents, growth, or key absences. This audit helps remove hidden fragility before it turns into outages, missed deadlines, or burnout.
Next step
Use this audit as a baseline and re-run it quarterly or after major organizational or technical changes.