Accumulate anomalies on a running list — trigger schema review at the count threshold, not at each one
Accumulate anomalies (observations that don't fit the schema) in a running list and trigger full schema review when the count reaches a pre-defined threshold, rather than treating each anomaly as requiring immediate action.
Why This Is a Rule
Individual anomalies are ambiguous — they could be noise, context-specific exceptions, or genuine signals of schema drift. Treating each anomaly as requiring immediate action produces constant schema churn (too reactive). Ignoring anomalies entirely produces stale schemas (too rigid). Accumulating them with a count threshold provides the middle path: individual anomalies are logged without reaction; accumulated anomalies at threshold trigger systematic review.
This is the same logic as anomaly detection in monitoring: individual alert spikes are often noise, but a sustained pattern of spikes triggers investigation. The threshold separates noise from signal by requiring accumulation before action.
The pre-defined threshold (set before anomalies accumulate, not after) prevents motivated reasoning about "how many is enough." A common threshold: 3-5 anomalies for a single schema. If 3 observations don't fit the schema within a reasonable timeframe, the schema's assumptions deserve scrutiny.
When This Fires
- When an observation doesn't fit your current schema but isn't dramatic enough to warrant immediate review
- During any ongoing schema monitoring when individual anomalies feel ambiguous
- When you want to balance responsiveness with stability in schema management
- When you already use the related rule "Pre-commit to prediction failure thresholds (X out of Y) that trigger automatic schema review" and want an anomaly-based trigger to complement it
Common Failure Mode
Explaining away each anomaly individually: "This one was unusual circumstances. That one was an edge case. The other one was bad data." Each explanation is plausible in isolation. But five individually explained anomalies might share a common cause that only becomes visible when you look at the set rather than at each one independently.
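As a rough illustration of looking at the set rather than at each item, the sketch below tallies context tags across five logged anomalies. The `tags` field, the `shared_tags` helper, the `min_share` cutoff, and all of the entries are hypothetical, invented only for this example.

```python
from collections import Counter

# Made-up entries: the explanation given at the time, plus context tags
# noted alongside each anomaly (the tags are an assumption of this sketch).
anomalies = [
    {"explanation": "unusual circumstances", "tags": {"new-supplier", "monday"}},
    {"explanation": "edge case", "tags": {"new-supplier", "rush-order"}},
    {"explanation": "bad data", "tags": {"bad-data", "new-supplier"}},
    {"explanation": "unusual circumstances", "tags": {"holiday"}},
    {"explanation": "edge case", "tags": {"new-supplier", "rush-order"}},
]


def shared_tags(entries, min_share=0.6):
    """Return tags present in at least `min_share` of the entries."""
    counts = Counter(tag for entry in entries for tag in entry["tags"])
    total = len(entries)
    return sorted(tag for tag, n in counts.items() if n / total >= min_share)


print(shared_tags(anomalies))  # ['new-supplier']
```

Each explanation sounds reasonable on its own, yet four of the five entries share one tag, which is the kind of pattern a set-level pass can surface.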
The Protocol
(1) Create a running anomaly list for each important schema.
(2) Set a threshold before logging anything (default: 5 anomalies).
(3) When an observation doesn't fit → log it: date, what you expected, what happened, which schema was involved. Do NOT investigate or revise the schema yet.
(4) When the threshold is reached → trigger full schema review. Look at the accumulated anomalies as a set: do they share a pattern? Do they point to a common cause? Do they reveal a systematic boundary condition? The set analysis produces insights that individual anomaly analysis cannot.
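The protocol above can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed implementation: the class names `Anomaly` and `AnomalyLog`, the `log` method, and the sample schema and observations are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Anomaly:
    logged_on: date
    expected: str
    observed: str


@dataclass
class AnomalyLog:
    schema: str
    threshold: int = 5  # fixed up front, before any anomaly is logged
    entries: list[Anomaly] = field(default_factory=list)

    def log(self, expected: str, observed: str,
            logged_on: Optional[date] = None) -> bool:
        """Record one anomaly without reacting to it.

        Returns True once the count has reached the threshold,
        meaning a full schema review is due.
        """
        self.entries.append(Anomaly(logged_on or date.today(), expected, observed))
        return len(self.entries) >= self.threshold


# Made-up usage: a threshold of 3 for one schema.
log = AnomalyLog(schema="example schema", threshold=3)
due = [
    log.log("outcome A", "outcome B", date(2024, 1, day))
    for day in (5, 12, 19)
]
print(due)  # [False, False, True]
```

The first two anomalies are only recorded; the third reaches the threshold, which is the cue to review `log.entries` as a set.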