Core Primitive
Test systemic changes on a small scale before rolling them out broadly. A pilot program is a bounded experiment — a deliberate test of the proposed system change in a contained context where the change can be observed, measured, and refined without risking the entire organization. Pilots serve three functions: they generate evidence (does the change produce the intended outcome?), they reveal unintended consequences (what side effects emerge in practice?), and they build organizational confidence (the change has been tested and it works). System changes deployed without piloting are organizational gambles — large bets on untested designs.
The experiment mindset
The difference between a pilot and a rollout is the difference between a hypothesis and a conclusion. A rollout says: "We know this change will work — implement it everywhere." A pilot says: "We believe this change will work — let us test it and learn."
The experiment mindset is the single most important attitude for effective systemic change. It acknowledges that system change is inherently uncertain — that the change agent's understanding of the system, however thorough, is incomplete. The pilot is the mechanism for converting uncertainty into knowledge: testing assumptions against reality, discovering gaps in the system map (Identify the system before trying to change it), and refining the intervention design based on actual rather than predicted consequences.
Eric Ries' Lean Startup methodology brought the experiment mindset to entrepreneurship, demonstrating that building a minimum viable product and testing it with real users produces better outcomes than building a complete product based on assumptions. The same principle applies to organizational systems: implementing a minimum viable change and testing it with real participants produces better outcomes than implementing a complete system redesign based on analysis alone (Ries, 2011).
Designing genuine experiments
A genuine pilot is designed to learn, not to confirm. This distinction is critical and frequently violated. Many organizational "pilots" are designed to succeed — to produce evidence that supports a decision already made. A genuine pilot is designed to test — to produce evidence that either supports or challenges the proposed change.
Selection without bias
The pilot context must be representative — similar enough to the broader organization that pilot results are generalizable. Selecting the best team, the most enthusiastic manager, or the least complex process guarantees pilot success while preventing organizational learning. The pilot result tells you nothing about whether the change will work in typical conditions — only that it works under ideal conditions.
The ideal pilot context is slightly challenging — a team that is average in performance, a process that is typical in complexity, a manager who is open but not enthusiastic. If the change works in this context, it will likely work broadly. If it fails, the failure reveals genuine design problems rather than implementation problems.
Measurement before the pilot
The most common pilot error is beginning to measure after the pilot starts. Without baseline data — measurements of the same metrics before the change was implemented — the pilot results are uninterpretable. An improvement during the pilot could reflect the change's effect, the Hawthorne effect (improvement caused by attention rather than the change itself), seasonal variation, or coincidental external factors.
Establish baseline measurements for at least one full cycle of the process being changed before the pilot begins. This baseline provides the comparison against which pilot results are assessed.
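The baseline comparison can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the metric (cycle time in days) and all numbers are hypothetical.

```python
# Hypothetical sketch: compare pilot-period metrics against a pre-pilot
# baseline. The metric and all numbers are illustrative.
from statistics import mean

# One measurement per process cycle, collected for a full cycle before
# the pilot begins (the baseline) and again during the pilot itself.
baseline_cycle_times = [12.0, 11.5, 12.4, 11.9]  # days per cycle, pre-pilot
pilot_cycle_times = [10.1, 9.8, 10.4]            # days per cycle, during pilot

baseline_avg = mean(baseline_cycle_times)
pilot_avg = mean(pilot_cycle_times)
change_pct = (pilot_avg - baseline_avg) / baseline_avg * 100

print(f"baseline: {baseline_avg:.1f} days, pilot: {pilot_avg:.1f} days "
      f"({change_pct:+.1f}%)")
```

Without the pre-pilot list, the pilot numbers alone would be uninterpretable: there would be nothing to compute the change against.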
Control conditions
The strongest pilot design includes a control — a comparable context that does not receive the change. By comparing the pilot context (change implemented) with the control context (change not implemented), the pilot can distinguish between changes caused by the intervention and changes caused by other factors operating in both contexts.
Perfect controls are rarely possible in organizational settings — unlike laboratory experiments, organizations cannot hold all variables constant. But approximate controls provide valuable evidence. If the pilot team's performance improves while a comparable non-pilot team's performance does not, the improvement is likely attributable to the change rather than to external factors.
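One standard way to use an approximate control is a difference-in-differences comparison: subtract the control team's improvement (the change that would have happened anyway) from the pilot team's improvement. A minimal sketch, with a hypothetical throughput metric and illustrative numbers:

```python
# Hypothetical difference-in-differences sketch: the pilot team's change
# is assessed relative to a comparable control team that did not receive
# the intervention. All numbers are illustrative.
def diff_in_diff(pilot_before, pilot_after, control_before, control_after):
    """Estimated effect of the change = pilot improvement minus the
    improvement that occurred anyway (observed in the control)."""
    pilot_delta = pilot_after - pilot_before
    control_delta = control_after - control_before
    return pilot_delta - control_delta

# Throughput (tasks/week): both teams improved, but the control team's
# gain reflects external factors operating on both contexts.
effect = diff_in_diff(pilot_before=40, pilot_after=52,
                      control_before=41, control_after=44)
print(f"estimated effect of the change: {effect:+} tasks/week")
```

Here the pilot team improved by 12 tasks/week, but 3 of those would likely have occurred without the change, leaving an estimated effect of 9.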
Duration calibration
A pilot that is too short misses delayed effects. A pilot that is too long loses organizational attention and becomes a permanent special case rather than a genuine test. The appropriate duration depends on the cycle time of the process being changed.
For process changes (new workflows, new tools, new communication patterns), the pilot should run for at least three full cycles of the process. If the process cycles monthly, the pilot should run for at least three months. This duration allows for initial learning (cycle 1), stabilization (cycle 2), and genuine steady-state performance assessment (cycle 3).
For structural changes (reorganization, new roles, new reporting lines), the pilot should run for at least six months — long enough for the structural change to move past the initial disruption and settle into a new equilibrium.
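The two duration rules above can be encoded as a small helper. The function name and interface are assumptions for illustration; the rules themselves (three full cycles for process changes, at least six months for structural changes) come from the text.

```python
# Hypothetical helper encoding the duration rules: at least three full
# cycles for process changes, at least six months (~180 days) for
# structural changes.
def pilot_duration_days(change_type: str, cycle_days: int) -> int:
    if change_type == "process":
        return 3 * cycle_days            # three full cycles
    if change_type == "structural":
        return max(180, 3 * cycle_days)  # no shorter than six months
    raise ValueError(f"unknown change type: {change_type}")

# A monthly process cycle yields a three-month pilot.
print(pilot_duration_days("process", 30))     # 90 days
print(pilot_duration_days("structural", 30))  # 180 days
```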
Learning from pilots
The value of a pilot is not in the results — it is in the learning. Results tell you whether the change worked. Learning tells you why it worked or did not, what modifications would improve it, and what conditions are necessary for the change to succeed at scale.
What worked and why
For every positive result, understand the mechanism. If cycle time decreased, why? Was it the process change itself, the additional attention the pilot team received, the specific capabilities of the pilot team, or the absence of interference from adjacent systems (which would reappear at scale)? Understanding the mechanism enables the change to be replicated — designing the conditions that produced the result, not just the procedure.
What did not work and why
Pilot failures are more valuable than pilot successes — they reveal design flaws before those flaws are scaled organization-wide. For every negative result, trace the failure to its root cause. Was the design wrong (the change does not produce the intended effect)? Was the implementation wrong (the change was not implemented as designed)? Was the context wrong (the pilot context had unique conditions that prevented the change from working)?
What surprised
The most valuable pilot learning comes from surprises — outcomes that were neither intended nor predicted. Surprises reveal system dynamics that the original analysis missed. A change designed to improve speed that instead improves quality (a positive surprise) reveals a connection between speed and quality that was not in the system map. A change designed to empower frontline workers that instead overwhelms them (a negative surprise) reveals a capacity constraint that was not in the system map.
What is needed at scale
The pilot context is, by design, smaller and simpler than the full organization. Scaling the pilot requires understanding what additional infrastructure, training, resources, and organizational preparation are needed for the change to work at full scale. The pilot team may have succeeded because they received personal attention from the change leader, because they could resolve issues through direct communication that would require formal processes at scale, or because their context lacked interdependencies that larger contexts would include.
From pilot to scale
The decision to scale a pilot should be based on three criteria, all of which must be satisfied.
Evidence of effect. The pilot produced measurable improvement in the intended outcome, and the improvement is attributable to the change (not to attention effects, selection bias, or external factors).
Understanding of mechanism. The change team understands why the change worked — the specific mechanisms through which the change produced the improvement. This understanding is necessary for replicating the change in different contexts where the specific conditions may vary.
Scalability assessment. The change team has identified the modifications needed for the change to work at full scale — the additional infrastructure, resources, training, and organizational preparation required.
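The three criteria can be made explicit as a scaling gate. A minimal sketch; the field names are illustrative, and the source specifies only the criteria themselves, not this structure.

```python
# Hypothetical sketch: the three scaling criteria as an explicit gate.
from dataclasses import dataclass

@dataclass
class PilotAssessment:
    evidence_of_effect: bool    # measurable, attributable improvement
    mechanism_understood: bool  # the team knows why the change worked
    scalability_assessed: bool  # modifications for full scale identified

    def ready_to_scale(self) -> bool:
        # All three criteria must be satisfied before scaling.
        return (self.evidence_of_effect
                and self.mechanism_understood
                and self.scalability_assessed)

assessment = PilotAssessment(evidence_of_effect=True,
                             mechanism_understood=True,
                             scalability_assessed=False)
print(assessment.ready_to_scale())  # False: scalability not yet assessed
```

Making the gate explicit prevents the common shortcut of scaling on evidence alone, before the mechanism and scalability questions are answered.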
The Third Brain
Your AI system can help you design rigorous pilot programs. Describe the system change you want to test and ask: "Design a pilot program for this change. Specify: (1) the ideal pilot context and why, (2) the baseline measurements to collect before the pilot begins, (3) the pilot-period measurements (both intended outcome and unintended consequence indicators), (4) the appropriate control condition, (5) the pilot duration and why, (6) the decision criteria for scaling, modifying, or abandoning the change, and (7) the key questions the pilot should answer about scalability."
From experimentation to measurement
Pilots generate evidence — but evidence must be interpreted. The next lesson, Measuring systemic change, examines how to determine whether the system has actually changed or whether the apparent change is superficial, temporary, or illusory.
Sources:
- Ries, E. (2011). The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business.