Audit agents with hourly momentary sampling, not end-of-day recall — memory overweights successes and hides failures
Use hourly momentary sampling over 48+ hours rather than end-of-day recall when auditing behavioral agents, because retrospective memory systematically overweights salient successes and underweights invisible failures.
Why This Is a Rule
The Experience Sampling Method (ESM), developed by Csikszentmihalyi and Larson, captures behavior in real time rather than through retrospective reconstruction. That difference matters for behavioral auditing because retrospective recall is systematically distorted. Successes are salient (you remember the morning you wrote for 30 minutes), failures are invisible (you don't notice the three mornings you didn't), and the resulting self-assessment is inflated (see: Define agent success as 80%+ firing rate, not subjective satisfaction — felt reliability systematically inflates actual performance).
Hourly momentary sampling (a prompt every hour asking "what are you doing right now?" over 48+ hours) captures a representative sample of actual behavior. The hourly interval is frequent enough to detect behavioral patterns without being so frequent that it disrupts the behavior being observed. The 48-hour window spans at least two full daily cycles, so a pattern seen on one day can be checked against a second day with different energy levels.
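As a minimal sketch of the prompt schedule described above, the snippet below generates one prompt time per hour across the window. The function name `hourly_prompts` and the start date are hypothetical, chosen only for illustration. Whether to skip prompts during sleep is not specified here, so the sketch simply emits every hour.

```python
from datetime import datetime, timedelta

def hourly_prompts(start: datetime, hours: int = 48) -> list[datetime]:
    """Return one prompt time per hour, the first falling one hour after `start`."""
    if hours < 48:
        # The rule asks for 48+ hours so that at least two daily cycles are covered.
        raise ValueError("sampling window should be at least 48 hours")
    return [start + timedelta(hours=i) for i in range(1, hours + 1)]

# Hypothetical start time: a Monday at 07:00.
prompts = hourly_prompts(datetime(2024, 1, 8, 7, 0))
print(len(prompts), "prompts from", prompts[0], "to", prompts[-1])
```

The list can then be fed to whatever timer, wearable, or app delivers the prompts.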
The data this produces is qualitatively different from recall-based auditing. Instead of "I think I followed my morning routine most days," you get "7 of 14 hourly samples during target windows showed designed agent firing; 7 showed default behavior." Because each sample is recorded in the moment, the counts are not reconstructed from memory and so are far less exposed to memory bias.
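The "7 of 14" figure above can be reproduced with a short tally. This is a sketch under assumptions: the `Sample` record, the `firing_rate` helper, and the 06:00 to 12:00 target window are hypothetical, and the data is invented to match the example rather than taken from a real audit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    hour: int   # hour of day at which the prompt fired (0-23)
    label: str  # "designed", "default", or "neither"

def firing_rate(samples, target_hours):
    """Return (fired, total, rate) for samples that fall inside the target hours."""
    in_window = [s for s in samples if s.hour in target_hours]
    if not in_window:
        raise ValueError("no samples fell inside the target windows")
    fired = sum(1 for s in in_window if s.label == "designed")
    return fired, len(in_window), fired / len(in_window)

# Hypothetical data: two days, target window 06:00-12:00 (7 prompts per day).
target_hours = set(range(6, 13))
samples = [
    Sample(hour, "designed" if (day + hour) % 2 == 0 else "default")
    for day in range(2)
    for hour in range(6, 13)
]
# A sample outside the target window is ignored by the tally.
samples.append(Sample(15, "designed"))

fired, total, rate = firing_rate(samples, target_hours)
print(f"{fired} of {total} target-window samples showed the designed agent ({rate:.0%})")
```

Restricting the denominator to the target windows keeps the rate about the moments when the agent was supposed to fire, not about the day as a whole.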
When This Fires
- When conducting a baseline behavioral audit before designing new agents
- When displacement rate tracking (Measure behavioral agent progress by displacement rate, not perfection — replacement is gradual, not binary) relies on self-report and you want to verify accuracy
- When you suspect your self-assessment of agent reliability is inflated
- During periodic deep audits (quarterly) to calibrate your ongoing tracking methods
Common Failure Mode
Using end-of-day journaling as the audit method: "Today I did X, Y, Z." By evening, memory has already reconstructed the day through an identity-consistent lens. You remember the productive morning but not the 90 minutes of phone scrolling mid-afternoon. The audit confirms your self-narrative rather than revealing your actual behavior.
The Protocol
(1) Set hourly prompts for 48+ hours (phone timer, wearable, or app).
(2) At each prompt, record in 10 seconds or less: what you are doing right now, and whether the current behavior is a designed agent, a default, or neither.
(3) Do not try to change behavior during the sampling period. The goal is observation, not intervention.
(4) After 48 hours, compile: what percentage of samples showed designed agents firing? What percentage showed defaults? Where are the gaps?
(5) Compare the sample data to your self-assessment. The gap between "what I think I do" and "what sampling shows I do" is the bias correction factor for your ongoing self-monitoring.
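Steps (4) and (5) amount to a small piece of arithmetic, sketched below. The helper names `compile_samples` and `bias_gap` are hypothetical, the sample counts and the 60% self-assessment are invented, and treating the correction factor as a simple difference (rather than, say, a ratio) is an assumption of this sketch.

```python
from collections import Counter

LABELS = ("designed", "default", "neither")

def compile_samples(labels):
    """Return the share of samples carrying each label, as fractions of the total."""
    if not labels:
        raise ValueError("no samples recorded")
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {label: counts[label] / len(labels) for label in LABELS}

def bias_gap(self_assessed_share, sampled_share):
    """Positive result means the self-assessment overstates what sampling shows."""
    return self_assessed_share - sampled_share

# Hypothetical 48-sample log: 12 designed, 24 default, 12 neither.
log = ["designed"] * 12 + ["default"] * 24 + ["neither"] * 12
shares = compile_samples(log)

# Hypothetical self-assessment: "my designed agents fire about 60% of the time".
gap = bias_gap(0.60, shares["designed"])
print(f"designed: {shares['designed']:.0%}, default: {shares['default']:.0%}, "
      f"gap vs. self-assessment: {gap:+.0%}")
```

A gap computed this way can be subtracted from later self-reports as a rough correction until the next sampled audit.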