The action ended. The learning has not started.
Pre-flight checks catch errors before they propagate. That is what you learned in the previous lesson. But pre-flight checks operate on prediction — you are scanning for problems you can anticipate based on what you already know. The harder question is what happens after the action is complete, when the gap between what you expected and what actually occurred contains information you could not have predicted in advance.
Most people treat completed tasks as finished objects. You did the thing. You move on. Whatever happened, happened. But every completed action is an unprocessed data source — a record of your assumptions colliding with reality that contains precise, recoverable information about where your mental models are wrong. The post-action review is the mechanism that extracts that information before it decays.
Without this mechanism, you accumulate experience without extracting learning. You repeat the same errors with increasing confidence. You develop ten years of experience that is really one year repeated ten times. The post-action review is the structural difference between these two trajectories.
The US Army built this first — and proved it works
The post-action review is not a productivity hack. It is one of the most rigorously tested learning interventions in organizational history, and it was invented under conditions where errors kill people.
The US Army developed the After Action Review (AAR) in the 1970s, following the institutional reckoning that came after Vietnam. Senior Army leadership recognized that combat units needed a systematic mechanism to learn from training exercises — not just to repeat them. The first major implementation came in 1981 at the National Training Center at Fort Irwin, California, where units ran 14-day simulated desert combat scenarios against a skilled opposing force. After each engagement, units conducted structured AARs using four questions that have since become the canonical format:
- What was supposed to happen?
- What actually happened?
- Why was there a difference?
- What will we do differently next time?
The simplicity is deceptive. These four questions enforce a cognitive discipline that most reflection lacks: they require you to articulate your expectations before evaluating results, to describe observable outcomes rather than impressions, to identify structural causes rather than blame individuals, and to commit to a specific process change rather than a vague intention to improve.
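The discipline the four questions enforce can be made concrete as a small data structure. This is an illustrative sketch, not an official Army schema — the class and field names are my own, and the list of banned vague answers is an assumption about what "a vague intention to improve" looks like:

```python
from dataclasses import dataclass

# Illustrative sketch of the four-question AAR as a record.
# Field names map one-to-one onto the canonical questions.
@dataclass
class AfterActionReview:
    expected: str        # What was supposed to happen?
    observed: str        # What actually happened?
    gap_cause: str       # Why was there a difference?
    process_change: str  # What will we do differently next time?

    def __post_init__(self):
        # Every question must be answered; blank answers are the
        # unstructured reflection the format exists to prevent.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"question left unanswered: {name}")
        # The commitment must be a process change, not an effort level.
        # (Banned phrases here are illustrative, not exhaustive.)
        vague = ("be more careful", "try harder", "do better")
        if self.process_change.lower().strip().rstrip(".") in vague:
            raise ValueError("commit to a process change, not an effort level")
```

The `__post_init__` check is the point: a review that ends in "try harder" is rejected at construction, the same way a well-run AAR rejects it in conversation.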
The results are not ambiguous. Tannenbaum and Cerasoli (2013) conducted a meta-analysis of 46 samples involving over 2,100 individuals and found that structured debriefs — the category that includes AARs — improve effectiveness over control groups by approximately 25%, with a Cohen's d of 0.67. That is a moderate-to-large effect size from a relatively inexpensive intervention. The effect held across teams and individuals, across simulated and real settings, and across medical and nonmedical domains. A subsequent 2020 meta-analysis of 61 studies found an even larger effect on training retention (d = 0.79), with AARs reducing repeated mistakes by up to 20%.
Twenty-five percent better performance from a 30-minute conversation. The mechanism is not motivation or accountability. It is structured comparison between expectation and reality, conducted close enough to the event that the relevant details are still accessible in memory.
Why unstructured reflection fails
You might object that you already reflect on your actions. You think about what went well and what did not. You have feelings about your performance. You sometimes talk things through with a colleague.
This is not a post-action review. This is rumination wearing the costume of reflection.
The cognitive science is clear on why unstructured reflection produces unreliable learning. Human memory is not a recording device. It is a reconstruction engine, and it is biased in systematic ways that corrupt the raw data of experience. Rosy retrospection — the tendency to recall past events more positively than they actually were — means your unstructured memory of a completed task is already distorted toward self-serving conclusions within hours of completion. Hindsight bias — the feeling that you "knew it all along" — collapses the gap between what you expected and what happened, making it impossible to identify where your predictions were wrong. Narrative bias reorganizes scattered events into a coherent story, smoothing over the specific moments where errors occurred.
Gary Klein, the psychologist who pioneered naturalistic decision making, developed the Critical Decision Method specifically to counteract these biases. Klein's research, beginning in the 1980s with studies of firefighters making life-or-death decisions, showed that experts who were simply asked to describe what happened produced distorted narratives. But when the retrospective analysis was structured — when they were prompted to identify specific decision points, articulate what they expected at each point, and compare those expectations against what actually occurred — the quality of learning improved dramatically (Klein, 1998). The structure of the review is not a formality. It is a cognitive scaffold that prevents the reconstruction biases of memory from overwriting the actual data.
This is why Tannenbaum and Cerasoli's meta-analysis found that facilitated, structured debriefs significantly outperformed unstructured ones. An objective facilitator and a defined question protocol prevent the review from drifting into blame, self-congratulation, or narrative smoothing. The four-question AAR format works precisely because it forces the reviewer to hold expectation and outcome as separate objects — and to locate the cause of the gap in process, not in character.
Aviation knew this would save lives
The US Army was not the only high-stakes domain to discover the power of structured post-action review. Aviation arrived at the same conclusion through a different path — and with a body count that demanded it.
In 1979, NASA psychologist John Lauber coined the term "Crew Resource Management" (CRM) after research revealed that a series of jet transport accidents had been caused not by mechanical failure or lack of skill, but by failures of communication, coordination, and decision-making within cockpit crews. The post-flight debrief became a core component of CRM training: after every flight, crews systematically review what they planned, what happened, where deviations occurred, and what adjustments are needed.
NASA developed structured debrief models — including the C-A-L (CRM, Analysis, Leadership) framework — specifically to prevent debriefs from becoming superficial. The structure forces crews to identify CRM-related factors that influenced outcomes, to analyze and evaluate specific performance elements rather than offering general impressions, and to surface issues that individuals might not raise spontaneously due to hierarchy or discomfort.
The pattern across domains is consistent. The US Army, commercial aviation, emergency medicine, and surgical teams all independently converged on the same structural insight: unstructured reflection after action is unreliable. Structured comparison between expectation and outcome, conducted promptly and focused on process rather than blame, is the minimum viable mechanism for extracting learning from experience.
The four questions are a cognitive protocol
Look at the AAR's four questions again, this time as a cognitive protocol rather than a checklist:
What was supposed to happen? This question forces you to externalize your mental model. Before you can evaluate an outcome, you must articulate what you predicted. Most people skip this step entirely — they evaluate results against a vague, retroactively adjusted sense of what they expected. By writing down your pre-action expectations, you create a fixed reference point that hindsight bias cannot rewrite.
What actually happened? This question demands observable evidence. Not how you feel about what happened. Not what you think happened. What happened — described in terms a camera would capture. The discipline of separating observation from interpretation is the single most important skill in error correction, and most people have never practiced it.
Why was there a difference? This is where the real learning occurs. The gap between expectation and outcome is a signal — it tells you exactly where your mental model was wrong, where your process had a hole, or where your assumptions did not match reality. The critical discipline here is to locate the cause at the level of structure and process, not at the level of personal effort or character. "I did not try hard enough" is not an error analysis. "I did not allocate review time before the deadline, so errors in the draft were not caught" is an error analysis.
What will I do differently next time? This question converts analysis into commitment. The output must be a concrete process change — a step added, a check introduced, a sequence reordered, a resource allocated. If the answer is "be more careful" or "try harder," the review has failed. You are proposing to change your effort level, which is the least reliable variable in any system. Change the process instead.
The AI parallel: evaluation as post-training review
The post-action review has a precise structural analog in machine learning, and it illuminates why the mechanism matters.
When a language model is trained, the training run is the action. But the training run alone does not tell you whether the model is good. That determination requires a separate evaluation phase — a structured comparison between what the model was supposed to do (the benchmark criteria, the alignment targets, the expected behavior) and what the model actually does (its outputs on held-out test sets, its behavior in adversarial scenarios, its responses to edge cases).
In Reinforcement Learning from Human Feedback (RLHF), this post-training review is built into the pipeline. Human evaluators rate or rank the model's outputs. Those ratings train a reward model — a separate system whose sole purpose is to encode what "good" looks like. The reward model then provides the feedback signal that adjusts the language model's behavior. The evaluators are running a post-action review: What should the model have said? What did it actually say? Where is the gap? How should the model adjust?
The lesson is the same one the US Army learned in 1981. The quality of the evaluation protocol determines the quality of the learning. A vague evaluation — "the model seems pretty good" — produces vague improvement. A structured evaluation — with specific criteria, observable outputs, identified gaps, and concrete adjustments — produces targeted improvement. In machine learning, as in human learning, the post-action review is not an optional appendix to the action. It is the mechanism through which the action becomes useful information.
Without evaluation, a trained model is just a collection of parameters. Without a post-action review, a completed task is just a collection of events. The evaluation is what converts raw experience into future capability.
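The difference between a vague and a structured evaluation can be sketched in a few lines. This is a toy harness, not an actual RLHF pipeline: the "model" is a stand-in function, and the case format is my own invention. What it illustrates is the structure — explicit expected behavior, observable actual output, and a recorded gap for every case:

```python
# Toy sketch of post-training evaluation: structured comparison between
# expected behavior (explicit criteria) and actual outputs on held-out cases.
def evaluate(model, test_cases):
    """Run each case and record expectation, outcome, and pass/fail."""
    results = []
    for case in test_cases:
        actual = model(case["input"])
        results.append({
            "input": case["input"],
            "expected": case["expected"],
            "actual": actual,
            # An observable criterion, not an impression of quality.
            "pass": case["check"](actual),
        })
    pass_rate = sum(r["pass"] for r in results) / len(results)
    # The failures, not the pass rate, carry the learning signal.
    return pass_rate, [r for r in results if not r["pass"]]

# Example: a trivial "model" that uppercases text, judged against
# criteria stated before the evaluation runs.
model = str.upper
cases = [
    {"input": "hello", "expected": "HELLO", "check": lambda out: out == "HELLO"},
    {"input": "123", "expected": "123", "check": lambda out: out == "123"},
]
rate, failures = evaluate(model, cases)
```

Note that each `check` is written before the model's output is seen — the evaluation analog of answering "What was supposed to happen?" before "What actually happened?"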
Your post-action review protocol
Here is a minimal protocol you can run after any completed task. It takes 10 to 20 minutes and requires only a notebook or a text file.
Timing: Run the review within 48 hours of task completion. Memory degrades rapidly — Klein's research showed that critical decision details begin distorting within days. The closer to the event, the more accurate the data.
The four questions: Write your answers. Do not just think them. Writing forces precision that thinking permits you to avoid.
- What did I intend to happen? Write the specific outcome you expected before starting. If you did not have explicit expectations, that is your first finding — you were operating without a reference standard.
- What actually happened? Describe the observable result. Use numbers where possible. Separate facts from interpretations.
- Why was there a gap? Identify at least one structural cause. Use the "five whys" if needed — keep asking why until you reach a process-level explanation, not a character-level one.
- What will I do differently? Write one specific process change. Not a resolution. Not an intention. A change to the sequence, the inputs, the timing, or the structure of how you do the task.
Duration: Fifteen minutes is enough for a solo review; forty-five for a team. If you find yourself going longer, you are probably drifting into narrative rather than holding the structure.
Storage: Keep your reviews in a single, searchable location. The compound value of post-action reviews comes from reviewing the reviews — seeing patterns across multiple tasks that reveal systemic tendencies rather than isolated errors.
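One minimal way to implement "a single, searchable location" is an append-only JSON Lines file, one review per line. The file name and field names below are illustrative choices, not a prescribed format:

```python
import json
import time
from pathlib import Path

# Illustrative storage sketch: one review per line in a single file.
LOG = Path("reviews.jsonl")

def save_review(expected, observed, gap_cause, process_change):
    """Append one four-question review to the log."""
    entry = {
        "date": time.strftime("%Y-%m-%d"),
        "expected": expected,
        "observed": observed,
        "gap_cause": gap_cause,
        "process_change": process_change,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def search_reviews(term):
    """Return every stored review mentioning the term — for reviewing the reviews."""
    if not LOG.exists():
        return []
    with LOG.open() as f:
        entries = [json.loads(line) for line in f]
    return [e for e in entries if term.lower() in json.dumps(e).lower()]
```

A text file and a search function are deliberately unimpressive: the value is in the accumulation, not the tooling.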
The compound return on structured review
A single post-action review corrects a single error. That is useful. But the real power of the practice is not in any individual review — it is in the pattern recognition that emerges when you accumulate reviews over time.
After ten reviews, you will notice that the same structural cause appears in multiple gaps. Maybe you consistently underestimate how long creative work takes. Maybe you routinely skip a verification step when you are under time pressure. Maybe your expectations are systematically miscalibrated in a specific domain. These are not random errors. They are systemic tendencies — reliable patterns in how your cognitive infrastructure produces predictable failure modes.
Identifying a systemic tendency is qualitatively different from identifying an isolated error. An isolated error is a one-time fix. A systemic tendency, once identified, can be corrected at the root — through a process redesign, a new default, a checklist item, or a structural constraint that prevents the error class from occurring at all.
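The shift from isolated errors to systemic tendencies can be mechanized with a simple count. This sketch assumes each stored review's structural cause has been tagged with a short label — the labels and sample data below are invented for illustration:

```python
from collections import Counter

# Sample accumulated reviews, each tagged with a structural-cause label.
# (Tags and tasks are illustrative.)
reviews = [
    {"task": "report", "cause": "underestimated-duration"},
    {"task": "slides", "cause": "skipped-verification"},
    {"task": "budget", "cause": "underestimated-duration"},
    {"task": "memo",   "cause": "underestimated-duration"},
]

cause_counts = Counter(r["cause"] for r in reviews)
# A cause appearing in more than one review is a candidate systemic
# tendency — worth a root-level process fix, not a one-off correction.
systemic = [cause for cause, n in cause_counts.most_common() if n > 1]
```

Here `systemic` surfaces `underestimated-duration` as a recurring pattern while leaving the one-off `skipped-verification` as an isolated error — exactly the distinction the text draws.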
This is the bridge to what comes next. The previous lesson taught you to catch errors before they propagate through pre-flight checks. This lesson teaches you to surface errors after they occur through structured review. But what happens when errors — whether caught before or after — are identified but not corrected? They accumulate. They interact. They compound. A small uncorrected error in one task becomes an assumption in the next task, which produces a larger error, which feeds into the task after that. This is the phenomenon of error cascades, and it is the subject of the next lesson (L-0491).
The post-action review is your mechanism for breaking the cascade before it begins. Every gap you identify, every structural cause you trace, every process change you implement is an error removed from the chain before it can multiply. The question is not whether you make errors. You do. Everyone does. The question is whether you have a mechanism that converts those errors into corrections — or whether you let them accumulate silently until the cascade becomes catastrophic.
Sources:
- Tannenbaum, S. I., & Cerasoli, C. P. (2013). "Do Team and Individual Debriefs Enhance Performance? A Meta-Analysis." Human Factors, 55(1), 231-245.
- Klein, G. (1998). Sources of Power: How People Make Decisions. MIT Press.
- US Army (1993). A Leader's Guide to After-Action Reviews. Training Circular 25-20. Department of the Army.
- Lauber, J. K. (1984). "Resource Management in the Cockpit." Air Line Pilot, 53, 20-23.
- Meadows, D. H. (2008). Thinking in Systems: A Primer. Chelsea Green Publishing.
- Christiano, P. F., et al. (2017). "Deep Reinforcement Learning from Human Preferences." Advances in Neural Information Processing Systems, 30.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.