Audit decision-driving metrics every 3 months: what does it incentivize, is the proxy still valid, and what does gaming look like?
When a metric has been used to drive decisions for more than three months without revision, conduct a three-question audit: what behavior does this metric actually incentivize, is the proxy still correlated with the outcome, and how closely does a description of gaming this metric match your current behavior?
Why This Is a Rule
Goodhart's Law — "When a measure becomes a target, it ceases to be a good measure" — operates on every metric that drives decisions, including personal ones. When you track "words written per day" and optimize for it, you start writing more words without considering quality. When you track "meetings attended" and optimize for it, you attend more meetings without considering relevance. The metric and the outcome it was meant to proxy for gradually decouple as the metric itself becomes the optimization target.
Three months is the typical decay period for metric-outcome correlation in personal systems. In the first month, the metric is fresh and closely tracks the outcome you care about. By month three, behavioral optimization has begun to exploit the gap between metric and outcome. By month six, a metric that was never audited may have completely decoupled — you're optimizing a number that no longer reflects the thing you actually wanted.
The three audit questions catch the three failure modes: incentive misalignment (the metric rewards the wrong behavior), proxy decoupling (the metric no longer correlates with the outcome), and gaming (you're unconsciously optimizing the metric at the expense of the outcome).
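The three checks can be captured in a small record so each audit produces an explicit verdict. This is a minimal sketch; the class and field names are illustrative, not from the source:

```python
from dataclasses import dataclass

@dataclass
class MetricAudit:
    """Result of the three-question audit for one metric.

    Each flag is True when that failure mode was detected during the audit.
    """
    incentive_misaligned: bool  # metric rewards the wrong behavior
    proxy_decoupled: bool       # metric no longer correlates with the outcome
    gaming_detected: bool       # "gamed" description overlaps current behavior

    def verdict(self) -> str:
        # A single failed check is enough to revise or replace the metric.
        if self.incentive_misaligned or self.proxy_decoupled or self.gaming_detected:
            return "revise or replace"
        return "retain; re-audit in 3 months"
```

A passing audit (all three flags False) retains the metric for another cycle; any True flag triggers revision.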
When This Fires
- Every 3 months for any metric that influences your decisions or behavior
- When a metric looks good but the underlying reality feels wrong — Goodhart's Law may be operating
- When someone proposes a new metric to drive decisions — plan the first audit at the 3-month mark
- During system reviews when evaluating measurement quality
Common Failure Mode
Trusting long-running metrics because they've "always worked": "We've tracked lines of code for years." The longer a metric has been used without audit, the more time gaming behavior has had to exploit the metric-outcome gap. Long tenure is a risk factor, not a safety factor. Audit frequency should increase, not decrease, with metric age.
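The claim that audit frequency should increase with metric age can be sketched as a schedule function. The specific intervals below are assumptions for illustration; the source only fixes the 3-month baseline:

```python
def audit_interval_months(metric_age_months: int) -> int:
    """Illustrative audit schedule: older metrics are audited MORE often,
    never less often, because gaming has had more time to accumulate.

    The 3/2/1-month tiers are assumed values, not prescribed by the rule.
    """
    if metric_age_months < 6:
        return 3   # baseline cadence for young metrics
    if metric_age_months < 12:
        return 2   # tighten once a metric has survived two audits
    return 1       # long-tenured metrics get the closest scrutiny
```

The key property is monotonicity: the interval never grows as the metric ages.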
The Protocol
1. For each metric that drives decisions, schedule a 3-month audit.
2. Ask three questions:
   - Incentive check: what behavior does this metric actually incentivize? Is that behavior the one you want, or has it drifted? "Words per day" incentivizes word production, but you want quality writing, which words-per-day doesn't measure.
   - Proxy check: is the metric still correlated with the outcome you care about? Track both the metric and the outcome for a sample period. If they've diverged → the proxy has decoupled.
   - Gaming check: describe what gaming this metric would look like. Compare that description to your current behavior. If they overlap → you're gaming without realizing it.
3. If any check fails → revise or replace the metric.
4. If all pass → retain for another 3 months, then re-audit.
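The proxy check above can be made concrete: sample both the metric and the outcome over a period and compute their correlation. This is a minimal sketch; `proxy_check` and the 0.5 threshold are assumptions for illustration, not part of the rule:

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def proxy_check(metric_samples: list[float],
                outcome_samples: list[float],
                threshold: float = 0.5) -> bool:
    """Pass the proxy check when the metric still tracks the outcome.

    Returns False (decoupled) when correlation over the sample period
    falls below the threshold. The 0.5 cutoff is an assumed default.
    """
    return pearson(metric_samples, outcome_samples) >= threshold
```

For example, comparing daily word counts against quality ratings of the same drafts: a strongly correlated sample passes, while a scattered one flags the proxy as decoupled and sends the metric to step 3.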