Document five recovery components for every important process: failure mode, detection trigger, recovery steps, time target, verification
For every important process, document five components of recovery: failure mode specification, detection trigger, ordered recovery steps, recovery time target, and verification check.
Why This Is a Rule
Recovery procedures designed during a crisis are reactive, incomplete, and biased by the specific failure that triggered them. Recovery procedures designed in advance — before any failure — can be comprehensive, tested, and optimized for the most likely failure modes. The five-component structure ensures completeness.
Failure mode specification: what specific type of failure does this recovery address? Not "things go wrong" but "the daily writing habit has stopped firing for more than one week." Detection trigger: how will you know this failure has occurred? What observable signal initiates recovery? Ordered recovery steps: what specific actions, in what sequence, restore the process? Written in procedural form (Write agent actions as procedures a stranger could follow — aspirations and principles are not executable steps) so they're executable under the cognitive impairment that typically accompanies process failure. Recovery time target: how long should recovery take? This prevents indefinite "getting back on track" that never converges. Verification check: how do you confirm recovery is complete? What does restored operation look like?
Without all five, recovery is incomplete: missing the trigger means you don't detect failure; missing ordered steps means you fumble through restoration; missing the time target means recovery takes indefinitely; missing verification means you declare recovery before it's actually complete.
When This Fires
- When designing any important recurring process — build the recovery plan at design time
- After a process failure that took too long to detect or recover from
- During process architecture reviews when assessing resilience
- Complements Design three operating modes for every important process: full, reduced, and minimal — degrade fidelity but never lose continuity (three operating modes) with the recovery path when even minimal mode fails
Common Failure Mode
Vague recovery plans: "If my exercise habit breaks, I'll get back into it." This lacks all five components: no specific failure mode, no detection trigger, no ordered steps, no time target, no verification. When the habit actually breaks, "getting back into it" produces weeks of vague intention without structured restoration.
The Protocol
(1) For each important process, write a recovery plan with five components: Failure mode: "This process has failed when [specific, observable condition]." Detection: "I will detect this failure when [trigger — e.g., 'my weekly review shows zero sessions for two consecutive weeks']." Recovery steps: ordered procedure to restart. (1) Identify the minimal viable version (Start every new agent at under two minutes with zero preparation — automaticity requires low activation energy first). (2) Set the trigger (Agent triggers must be observable or measurable — vague triggers like "when I feel ready" never fire reliably). (3) Execute the first session within 24 hours. (4) Follow The first five consecutive executions of a new trigger are non-negotiable — this is the window where automaticity lives or dies (protect first five executions). Time target: "Full recovery to target operating mode within [timeframe]." Verification: "Recovery is confirmed when [metric] reaches [level] for [duration]." (2) Write this plan when the process is healthy — not during the crisis of its failure.