Speed is a metric, not a side effect
In L-0546, you learned to measure agent effectiveness — whether each agent produces the desired outcome when it fires. That metric answers the question of accuracy: did the right thing happen? But accuracy alone is incomplete. A fire alarm that rings twenty minutes after the building is engulfed in flames is accurate. It is also useless.
This lesson introduces the second critical dimension of agent monitoring: latency. How quickly does your agent activate after its trigger appears? The gap between trigger and activation — the time-to-fire — determines whether an accurate agent is also a useful one. In many domains, a slow correct response is indistinguishable from no response at all.
This is not a minor refinement. It is a shift in what you measure and, therefore, what you optimize.
The science of response latency
The study of how quickly humans respond to stimuli is one of the oldest research programs in experimental psychology. In 1868, the Dutch physiologist Franciscus Donders published a landmark study that established mental chronometry — the measurement of the time course of mental operations. Donders designed three task types. His A-method measured simple reaction time: press a button when a light appears. His B-method measured choice reaction time: press different buttons depending on which stimulus appears. His C-method measured go/no-go reaction time: press the button only when the correct stimulus appears among alternatives. By subtracting the times — C minus A isolates stimulus discrimination; B minus C isolates response selection — Donders measured the duration of specific mental operations: discrimination took roughly 36 milliseconds, response selection roughly 47 milliseconds. He proved that thinking takes measurable time, and that different kinds of thinking take different amounts of it.
Donders's insight underpins everything that follows in this lesson: your cognitive agents are not instantaneous. They are processes that consume time, and that time varies depending on the complexity of the trigger, the familiarity of the pattern, and the degree of automaticity your agent has achieved.
In the early 1950s, William Edmund Hick and Ray Hyman independently formalized what became Hick's Law, one of the few widely acknowledged quantitative laws in psychology. Hick's Law states that reaction time increases logarithmically with the number of stimulus-response alternatives. The equation is straightforward: RT = K log2(N + 1), where K is a constant and N is the number of possible choices. Double the number of alternatives and reaction time increases by a roughly fixed increment, not by a doubling. The implication for agent monitoring is direct: an agent that must discriminate among many possible triggers will be inherently slower than one that responds to a single, unambiguous cue. If you want faster agents, reduce the number of alternatives they must evaluate — make their triggers as specific and unambiguous as possible.
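Hick's Law is simple enough to compute directly. In the sketch below, the per-bit constant K is hypothetical (real values are fit per person and per task); the point is the shape of the curve, not the specific numbers:

```python
import math

def hick_rt(n_alternatives: int, k: float = 0.15) -> float:
    """Predicted reaction time in seconds under Hick's Law: RT = K * log2(N + 1).

    The +1 reflects the implicit extra alternative of making no response.
    k = 0.15 s/bit is a hypothetical constant chosen for illustration.
    """
    return k * math.log2(n_alternatives + 1)

# Reaction time grows logarithmically, not linearly, with alternatives.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} alternatives -> {hick_rt(n):.3f} s")
```

Under these assumed parameters, going from sixteen alternatives down to one cuts predicted latency roughly fourfold — the quantitative case for making each trigger specific and unambiguous.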
Daniel Kahneman extended this line of research in Attention and Effort (1973), demonstrating that attentional load directly affects response speed. When your cognitive system is already taxed — multiple demands competing for limited working memory — every agent slows down. Your roughly four working memory slots (a capacity limit Nelson Cowan would later quantify, and on which John Sweller's cognitive load theory builds) are shared across all active processes. An agent that fires in 200 milliseconds when you are rested and focused might take two seconds when you are distracted, stressed, or multitasking. The latency is not a fixed property of the agent. It is a variable that fluctuates with the state of the system.
Implementation intentions: engineering faster activation
Peter Gollwitzer's research on implementation intentions, spanning three decades, provides the most direct evidence that agent latency is trainable. An implementation intention is an if-then plan: "If situation X arises, then I will perform behavior Y." This structure is functionally identical to what we call a cognitive agent in this curriculum — a trigger-response pair installed in your cognitive architecture.
Gollwitzer's key finding is that forming an implementation intention creates what he calls "strategic automaticity." The if-part of the plan becomes a highly accessible mental representation — your perceptual system is primed to detect the trigger with heightened sensitivity. Simultaneously, a strong associative link forms between the trigger representation and the response, so that when the trigger appears, the response initiates immediately, efficiently, and without requiring further conscious deliberation. Studies using event-related potentials (ERPs) have confirmed that implementation intentions modulate early processing indicators — the P100, P300, and N170 components — showing that the effect operates at the level of perceptual encoding, not just conscious decision-making.
The practical translation: people with well-formed implementation intentions respond to their triggers faster, with less cognitive effort, and with less dependence on conscious willpower than people who hold the same goal but lack the specific if-then structure. In Gollwitzer and Brandstätter's experiments, if-then planners acted quickly even under cognitive load, dealt effectively with competing demands, and did not need to consciously intend to act at the critical moment.
This is the mechanism by which your cognitive agents accelerate. When you first install an agent — a new habit, a decision rule, a boundary — the trigger-response pathway is effortful and slow. It routes through conscious deliberation. You notice the trigger, think about what to do, decide, and then act. Every stage consumes time. As the agent strengthens through repetition, the pathway progressively bypasses conscious processing. The trigger activates the response directly. Time-to-fire collapses.
Habit latency: the automaticity curve
The habit research literature quantifies this progression. Benjamin Gardner and colleagues developed the Self-Report Behavioural Automaticity Index (SRBAI), which measures habit strength along the dimension that matters most for time-to-fire: the degree to which a behavior is triggered automatically upon cue exposure. A fully automatic habit — one that has completed the automaticity curve — fires without conscious mediation. The cue triggers the response directly, in the way a reflex triggers a flinch.
The research on habit formation timelines, including a 2024 systematic review and meta-analysis, establishes that achieving full automaticity in a new health behavior takes a minimum of two to five months of consistent repetition in the presence of the cue. This contradicts the popular 21-day myth. More importantly, the progression is not linear. Automaticity follows an asymptotic curve — rapid initial gains followed by a long plateau of diminishing returns as the behavior approaches full automation. The implication for monitoring is that early in a habit's life, time-to-fire will decrease rapidly with each repetition. Later, improvements will be marginal. Knowing where you are on this curve for each agent tells you where training effort will produce the greatest latency reduction.
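The asymptotic shape can be modeled as a negatively accelerated exponential. Everything numeric below is a hypothetical parameterization — the rate constant is chosen only so the curve lands in the multi-month range the research describes, not fitted to any dataset:

```python
import math

def automaticity(days: float, rate: float = 0.015) -> float:
    """Fraction of full automaticity after `days` of consistent repetition.

    Models the asymptotic curve 1 - e^(-rate * days): rapid early gains,
    then a long plateau. rate = 0.015/day is hypothetical; it places ~63%
    of full automaticity near day 67 and ~95% near day 200.
    """
    return 1 - math.exp(-rate * days)

# Early repetitions buy far more automaticity than late ones.
month_one_gain = automaticity(30) - automaticity(0)     # steep start of the curve
month_six_gain = automaticity(180) - automaticity(150)  # flat tail of the curve
```

The two gain figures capture the monitoring implication: the same thirty days of repetition moves a young habit roughly ten times further along the curve than a mature one, so latency-reduction effort pays most early.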
Recent experimental paradigms have begun to measure cue-response associations objectively, not just through self-report. Researchers use tasks that test the strength of the associative link between context and behavior, yielding measures that correlate with — but are more precise than — subjective automaticity reports. The direction is clear: habit science is converging on the same precision in latency measurement that engineering has demanded for decades.
Latency percentiles: borrowing from systems engineering
The Site Reliability Engineering (SRE) discipline offers a measurement framework that translates directly to agent monitoring. When engineers measure the performance of a web service, they do not use averages. Averages are liars. A service with an average response time of 100 milliseconds might serve 95% of users in 50 milliseconds and 5% in 1,100 milliseconds. The average looks healthy. The experience for that 5% is catastrophic.
Instead, SRE practice uses percentile latencies. P50 (the median) describes the typical experience — half of all responses are faster, half are slower. P95 describes the experience of the unlucky minority — only 5% of responses are slower than this threshold. P99 exposes the worst-case tail — the 1% of responses where something went wrong. The standard practice is to set Service Level Objectives (SLOs) against these percentiles: "95% of requests will complete in under 200ms" is a P95 SLO.
The reasoning is instructive. P50 tells you how well the system works when everything is normal. P95 tells you how well it degrades under stress. P99 tells you about architectural weaknesses — rare conditions that expose fundamental limitations. These are different questions, and you need all three to understand your system.
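The toy distribution from the paragraphs above makes the point concrete. A nearest-rank percentile function (a minimal sketch, not a production SRE tool) shows how the average hides the tail:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 95% of responses at 50 ms, 5% at 1,100 ms.
latencies_ms = [50] * 95 + [1100] * 5

mean = sum(latencies_ms) / len(latencies_ms)  # 102.5 ms -- looks healthy
p50 = percentile(latencies_ms, 50)            # 50 ms   -- typical experience
p95 = percentile(latencies_ms, 95)            # 50 ms   -- still fine
p99 = percentile(latencies_ms, 99)            # 1100 ms -- the tail the mean hides
```

Note that even a P95 SLO of 200 ms passes here; only the P99 exposes the catastrophic experience of the slowest users — which is exactly why you need all three numbers.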
Now apply this framework to a cognitive agent. Your meditation agent — the one that reminds you to take three deep breaths when you notice rising frustration — has a latency profile just like a web service. On a calm Tuesday morning with low cognitive load and a clear trigger, it might fire in under a second (your P50). On a Thursday afternoon after six hours of meetings and a contentious email, it might take thirty seconds to activate (your P95). On a Friday evening when you are exhausted, hungry, and your child is screaming while you are on hold with customer service, it might take ten minutes — or not fire at all (your P99).
If you only measure your P50 — how the agent performs under ideal conditions — you will conclude it works beautifully. But agents do not need to work under ideal conditions. They need to work under the conditions where you need them most, which are almost always the P95 and P99 scenarios: high load, high stress, depleted resources.
The AI inference parallel
The artificial intelligence industry has converged on the same insight from a purely engineering direction. When evaluating a large language model's performance, the key latency metric is Time to First Token (TTFT) — how quickly the model begins generating its response after receiving a prompt. A chatbot with a TTFT under 500 milliseconds feels responsive and natural. A code completion tool needs TTFT below 100 milliseconds for seamless developer experience. Above these thresholds, the interaction feels sluggish regardless of how good the eventual output is.
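Measuring TTFT is just a matter of timing the gap between issuing the request and receiving the first streamed token. The sketch below uses a fake generator as a hypothetical stand-in for a real streaming model client:

```python
import time

def measure_ttft(token_stream):
    """Return (seconds until the first token arrived, the first token)."""
    start = time.perf_counter()
    first_token = next(iter(token_stream))  # blocks until something is emitted
    return time.perf_counter() - start, first_token

def fake_stream():
    """Hypothetical stand-in for a streaming API client."""
    time.sleep(0.05)  # simulated queueing + prefill delay before the first token
    yield "Hello"
    yield " world"

ttft, token = measure_ttft(fake_stream())
# ttft lands around 0.05 s here -- comfortably inside a 500 ms chatbot budget.
```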
The optimization techniques are revealing. Engineers use quantization (reducing the precision of the model to speed up computation), speculative decoding (predicting likely outputs in advance), and dynamic token pruning (dropping unimportant context to reduce processing load). Each technique trades some accuracy for reduced latency — the same tradeoff your cognitive system makes when it automates a response. A fully deliberated response is maximally accurate but slow. An automatic response is faster but less nuanced. The engineering question is never "which is better?" but "what latency-accuracy tradeoff produces the best outcome for this specific use case?"
Your cognitive agents face the same tradeoff. An agent that routes through full conscious deliberation is accurate but slow. An agent that has been automated through repetition is fast but less flexible. The monitoring question is: for each agent, what is the latency target that produces the best real-world results? A boundary-setting agent in a negotiation might need to fire in seconds — too slow and you have already conceded the point. A career-decision agent might appropriately take days — rushing would sacrifice deliberation quality for unnecessary speed.
Measuring time-to-fire in practice
Translating this framework into a personal monitoring practice requires three components.
First, define the measurement window for each agent. Not all agents need the same speed. Classify each agent by its temporal urgency. Some operate on a reflex timescale (milliseconds to seconds): flinching away from danger, catching an impulsive word before it leaves your mouth, recognizing a phishing email before clicking. Some operate on a deliberation timescale (seconds to minutes): activating a decision framework during a meeting, engaging a conflict-resolution protocol when you notice rising tension. Some operate on a reflection timescale (minutes to hours): recognizing that your current project has drifted from your values, noticing that a relationship pattern has become unhealthy. The appropriate latency metric depends on the timescale. Measuring a reflection-timescale agent in milliseconds is meaningless. Measuring a reflex-timescale agent in hours is a failure to detect failures.
Second, capture trigger-to-activation observations. Each time you notice an agent firing, record two timestamps: when the trigger appeared and when you became aware of the agent engaging. In the early stages of monitoring, these will be rough estimates. That is fine. Precision improves with practice, and even rough percentile distributions are more informative than no measurement at all. The categories INSTANT, SECONDS, MINUTES, and MISSED provide a workable initial resolution. Over time, you can refine within categories — distinguishing two-second activations from thirty-second activations as your awareness sharpens.
Third, build your personal percentile profile. After accumulating observations over days or weeks, arrange them from fastest to slowest. Your P50 is the middle observation — how the agent performs in typical conditions. Your P95 is near the slow end — how it performs when conditions are difficult. Your P99 is your worst observed activation — the scenario that exposed the agent's limits. Now you have three numbers that tell you something specific and actionable about each agent.
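The three steps above can be sketched end to end. The category-to-seconds mapping is a hypothetical coarse encoding (MISSED is treated as infinitely slow), and the percentile math is the same nearest-rank rule used on SRE dashboards:

```python
import math

# Hypothetical coarse mapping from logged categories to seconds.
CATEGORY_SECONDS = {"INSTANT": 1, "SECONDS": 15, "MINUTES": 300, "MISSED": math.inf}

def latency_profile(observations):
    """P50/P95/P99 time-to-fire from a log of category labels (nearest-rank)."""
    ordered = sorted(CATEGORY_SECONDS[o] for o in observations)
    def pct(p):
        return ordered[max(1, math.ceil(p / 100 * len(ordered))) - 1]
    return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}

# A few weeks of rough observations for one agent.
log = ["INSTANT"] * 12 + ["SECONDS"] * 5 + ["MINUTES"] * 2 + ["MISSED"]
profile = latency_profile(log)
# {'p50': 1, 'p95': 300, 'p99': inf}: fast in typical conditions, slow under
# stress, and one worst case where the agent did not fire at all.
```

Even this crude resolution is enough to separate the three interventions that follow: a good P50 with a poor P95 points one way, a good P95 with a catastrophic P99 another.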
If your P50 is good but your P95 is poor, your agent works under normal conditions but degrades under stress. The intervention is stress inoculation — deliberately practicing the trigger-response pattern under increasing cognitive load.
If your P50 and P95 are both good but your P99 is catastrophic, your agent has a rare failure mode — a specific condition that breaks it entirely. The intervention is targeted: identify the condition (fatigue? social pressure? a specific emotional state?) and build a contingency plan.
If your P50 is already poor, the agent needs more basic training — more repetitions, a clearer trigger specification, or a simpler response to reduce the processing time.
Why latency measurement changes behavior
There is a deeper reason to measure time-to-fire, beyond diagnostic accuracy. The act of measuring latency changes your relationship with your agents. When you only track whether an agent fires, you think of agent performance as binary — working or broken. When you track how quickly it fires, you begin to perceive the continuous gradient of automaticity. You notice the progression from effortful and slow to fluid and fast. You notice the conditions that cause regression. You develop an intuitive feel for where each agent sits on the automaticity curve.
This perceptual shift matters because it changes what you optimize. The person who thinks in binary terms stops improving once the agent fires reliably — it works, so why keep training? The person who thinks in latency terms recognizes that a working agent can still be made faster, and that faster activation often means the difference between an agent that protects you and one that merely confirms, after the fact, that protection was needed.
Anders Ericsson's research on expert performance illustrates the pattern. Experts do not stop at automaticity. Non-experts reach a level of automatic performance and plateau there — they can do the task without thinking, so they stop deliberately improving. Experts resist this plateau. They develop increasingly complex mental representations that maintain high-speed performance while preserving flexibility. They are not merely automatic. They are precisely calibrated — fast enough to operate in real-time but sophisticated enough to handle novelty.
Your agents need the same trajectory. Automaticity is not the finish line. It is the prerequisite for the real work: calibrating each agent to fire at the right speed for its domain, maintaining that speed under degraded conditions, and continuing to refine the trigger-response pathway even after it feels effortless.
From latency to signal quality
You now have two metrics for each agent: effectiveness (from L-0546) and latency (from this lesson). Together, they tell you whether the agent fires correctly and whether it fires quickly enough to matter. But there is a third dimension that completes the picture: signal quality.
An agent can fire fast and produce the right outcome — but fire on the wrong trigger. A threat-detection agent that activates instantly at genuine danger is invaluable. The same agent activating instantly at harmless stimuli is exhausting and eventually paralyzing. The next lesson, L-0548, examines the false positive rate: how often your agents activate unnecessarily. High false-positive rates cause alert fatigue — the cognitive equivalent of a car alarm that goes off whenever a truck drives past. You stop responding. You learn to ignore the very system designed to protect you. Measuring latency without measuring signal accuracy gives you a fast agent that might be destroying its own credibility with every unnecessary firing.
Sources:
- Donders, F. C. (1868/1969). "On the speed of mental processes." Acta Psychologica, 30, 412-431. (Translated by W. G. Koster from the original Dutch publication in Onderzoekingen gedaan in het Physiologisch Laboratorium der Utrechtsche Hoogeschool.)
- Hick, W. E. (1952). "On the rate of gain of information." Quarterly Journal of Experimental Psychology, 4(1), 11-26.
- Hyman, R. (1953). "Stimulus information as a determinant of reaction time." Journal of Experimental Psychology, 45(3), 188-196.
- Kahneman, D. (1973). Attention and Effort. Prentice-Hall.
- Gollwitzer, P. M. (1999). "Implementation intentions: Strong effects of simple plans." American Psychologist, 54(7), 493-503.
- Gollwitzer, P. M., & Sheeran, P. (2006). "Implementation intentions and goal achievement: A meta-analysis of effects and processes." Advances in Experimental Social Psychology, 38, 69-119.
- Gardner, B., Abraham, C., Lally, P., & de Bruijn, G.-J. (2012). "Towards parsimony in habit measurement: Testing the convergent and predictive validity of an automaticity subscale of the Self-Report Habit Index." International Journal of Behavioral Nutrition and Physical Activity, 9, 102.
- Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). "The role of deliberate practice in the acquisition of expert performance." Psychological Review, 100(3), 363-406.
- Proctor, R. W., & Schneider, D. W. (2018). "Hick's law for choice reaction time: A review." Quarterly Journal of Experimental Psychology, 71(6), 1281-1299.