Three-tier error budgets: green (no action), yellow (investigate), red (halt and redesign) — with pre-committed responses per zone
Define error budget thresholds in three tiers—green (within budget, no action), yellow (approaching limits, investigate), red (exceeded, halt and redesign)—with pre-committed responses for each zone.
Why This Is a Rule
Binary error budgets (within budget / over budget) miss an important transition zone. A system consuming its error budget rapidly is sending a signal before the budget is fully exhausted — but a binary system can't process this signal because it only distinguishes "fine" from "failed." The three-tier structure adds a yellow zone that provides early warning and graduated response.
- Green zone (within budget, well below threshold): the system is operating normally. Deviations within this zone are expected variance. Response: continue operating; no investigation needed.
- Yellow zone (approaching the budget limit): the system is consuming its tolerance faster than expected. This warrants investigation: not panic, but attention. Response: diagnose the trend. Why is the budget being consumed more rapidly? Is a structural issue developing?
- Red zone (budget exhausted): the system has exceeded its tolerance. Normal operation is suspended until the cause is identified and addressed. Response: halt the current approach and redesign (When error budget is exhausted, analyze the pattern not individual incidents — budget exhaustion signals structural problems).
The pre-committed responses eliminate real-time deliberation about what to do in each zone. When you enter yellow, the investigation is automatic — you don't debate whether it's worth looking into. When you enter red, the halt is automatic — you don't rationalize continuing.
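A minimal sketch of what "pre-committed" can look like in practice: the zone is computed from the numbers, and the response is looked up rather than debated. The names `Zone`, `RESPONSES`, `classify`, and `respond` are hypothetical, and the default yellow threshold of 50% is an assumption, not a value fixed by this rule.

```python
from enum import Enum


class Zone(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Responses are written down before any budget is consumed,
# so entering a zone triggers a lookup, not a deliberation.
RESPONSES = {
    Zone.GREEN: "continue operating; no investigation needed",
    Zone.YELLOW: "investigate: diagnose why the budget is being consumed faster",
    Zone.RED: "halt the current approach and redesign",
}


def classify(consumed: float, budget: float, yellow_at: float = 0.5) -> Zone:
    """Map budget consumption to a zone.

    yellow_at is the fraction of the budget where yellow begins
    (0.5 is an assumed default; choose your own per budget).
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    fraction = consumed / budget
    if fraction >= 1.0:
        return Zone.RED
    if fraction >= yellow_at:
        return Zone.YELLOW
    return Zone.GREEN


def respond(consumed: float, budget: float) -> str:
    """Return the pre-committed response for the current consumption."""
    return RESPONSES[classify(consumed, budget)]
```

For a budget of 2 missed sessions per week, `respond(1, 2)` returns the yellow response and `respond(2, 2)` returns the red one. The point of the lookup table is that the decision was made when the table was written, not when the zone was entered.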
When This Fires
- When implementing error budgets for any system (see: Define your error budget in writing: ideal behavior, minimum acceptable, deviation threshold, and investigation trigger window; Every active goal needs an error budget — define acceptable misses per period to convert brittle perfection into resilient tolerance)
- When a binary budget (fine/failed) doesn't provide enough early warning
- When you want graduated response rather than binary reaction to budget consumption
- When designing monitoring dashboards for personal or team systems
Common Failure Mode
Ignoring the yellow zone because "we haven't exceeded the budget yet": the budget is at 80% consumption with 60% of the period remaining, but since it's not red yet, no action is taken. This defeats the purpose of the yellow zone — it exists precisely to trigger investigation before the budget is fully exhausted. Yellow means "pay attention now so red doesn't happen."
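The arithmetic behind this example can be checked directly. With 60% of the period remaining, 40% has elapsed, so 80% consumption means the budget is burning at twice the sustainable pace. The sketch below assumes consumption continues linearly at the observed pace; `burn_rate` and `exhaustion_point` are hypothetical helper names.

```python
def burn_rate(fraction_consumed: float, fraction_elapsed: float) -> float:
    """Budget consumed relative to period elapsed; 1.0 is the sustainable pace."""
    if not 0 < fraction_elapsed <= 1:
        raise ValueError("fraction_elapsed must be in (0, 1]")
    return fraction_consumed / fraction_elapsed


def exhaustion_point(fraction_consumed: float, fraction_elapsed: float) -> float:
    """Fraction of the period at which the budget reaches 100%,
    assuming the observed pace simply continues (a linear projection)."""
    if fraction_consumed <= 0:
        return float("inf")  # nothing consumed yet, so no projected exhaustion
    return fraction_elapsed / fraction_consumed


# The failure-mode example: 80% consumed with 60% of the period remaining.
rate = burn_rate(fraction_consumed=0.8, fraction_elapsed=0.4)
runs_out_at = exhaustion_point(fraction_consumed=0.8, fraction_elapsed=0.4)
print(f"burn rate: {rate:.1f}x, budget exhausted at {runs_out_at:.0%} of the period")
```

Under that linear assumption, the budget in the example runs out at the period midpoint, which is why waiting for red leaves no room to respond.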
The Protocol
(1) For each error budget (Express error budgets as numbers with time windows — '2 missed sessions per week' not 'try to be consistent'), define three thresholds:
- Green: 0% to X% of budget consumed. (X is typically 50-60% of budget at the period midpoint.)
- Yellow: X% up to, but not including, 100% of budget consumed, or a consumption rate exceeding the sustainable pace.
- Red: 100% or more of budget consumed within the period.
(2) Pre-commit responses:
- Green → no action, continue.
- Yellow → investigate: what is causing the accelerated consumption? Can the trend be reversed?
- Red → halt and redesign: stop normal operation, conduct root cause analysis (When error budget is exhausted, analyze the pattern not individual incidents — budget exhaustion signals structural problems), and implement a structural fix before resuming.
(3) Check budget status at regular intervals: weekly for weekly budgets, daily for daily budgets. The tier should be visible without effort.
(4) Never downgrade a red situation to yellow without structural intervention. "Things seem better this week" is not a fix.
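The protocol can be sketched as a small tracker, shown here under stated assumptions rather than as a definitive implementation. `BudgetTracker`, `check`, and `record_structural_fix` are hypothetical names; the 50% yellow threshold is assumed; and "exceeding the sustainable pace" is interpreted as the consumed fraction being larger than the elapsed fraction of the period. Step (4) is modeled by making red sticky until a structural fix is recorded.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    budget: float            # allowed misses per period, e.g. 2 per week
    yellow_at: float = 0.5   # fraction where yellow begins (assumed value)
    halted: bool = False     # set once red is entered; cleared only by a fix

    def check(self, consumed: float, fraction_elapsed: float) -> str:
        """Return the current tier for this check-in."""
        if self.halted:
            # Step 4: red persists, even if this period's numbers look better.
            return "red"
        fraction = consumed / self.budget
        if fraction >= 1.0:
            self.halted = True
            return "red"
        # Assumed reading of "sustainable pace": consumption ahead of the clock.
        over_pace = fraction > fraction_elapsed
        if fraction >= self.yellow_at or over_pace:
            return "yellow"
        return "green"

    def record_structural_fix(self, description: str) -> None:
        """Leave red only after a described structural intervention."""
        if not description.strip():
            raise ValueError("describe the structural fix before resuming")
        self.halted = False


tracker = BudgetTracker(budget=2)
print(tracker.check(consumed=0, fraction_elapsed=0.5))  # nothing consumed
print(tracker.check(consumed=2, fraction_elapsed=1.0))  # budget exhausted
print(tracker.check(consumed=0, fraction_elapsed=0.1))  # next period, no fix yet
tracker.record_structural_fix("moved sessions to mornings")
print(tracker.check(consumed=0, fraction_elapsed=0.1))  # fix recorded
```

With small discrete budgets, the pace check flags yellow on almost any early miss. Whether that is desirable depends on the budget; dropping the `over_pace` condition leaves a purely threshold-based yellow zone.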