Core Primitive
Define how you will know the system has actually changed, not just appeared to change. Systemic change is real only when the system produces different outcomes under normal operating conditions — without extra attention, heroic effort, or temporary workarounds. Many change efforts produce initial improvements that fade as organizational attention moves elsewhere, revealing that the system itself did not change — only the effort level did. Measuring systemic change requires distinguishing between surface changes (different activities within the same system) and structural changes (different system dynamics that produce different outcomes naturally).
The improvement illusion
Almost every organizational change produces initial improvement. The Hawthorne effect — the phenomenon where people modify their behavior in response to being observed — ensures that any intervention with visibility produces short-term behavioral change. The novelty of a new system, the attention of leadership, and the energy of a change initiative all produce temporary performance gains that have nothing to do with the system change itself.
The question is not whether the change produced improvement. It is whether the improvement will persist when the novelty fades, the attention shifts, and the energy dissipates. If it does, the system has changed. If it does not, only the effort level changed — and effort is not sustainable.
Eliyahu Goldratt distinguished between "necessary conditions" (things that must be true for a system to function) and "sufficient conditions" (things that, when true, guarantee the outcome). Most change measurements track necessary conditions (the new process is being followed, the new tool is being used) without verifying sufficient conditions (the new process produces better outcomes under normal conditions, the new tool improves decisions without additional support) (Goldratt, 1990).
What to measure
Effective measurement of systemic change operates at four levels, each providing different evidence about whether the system has genuinely changed.
Level 1: Activity metrics
Activity metrics track whether the new behaviors are occurring: adoption rates, usage frequency, compliance percentages. These metrics answer the question: "Are people doing the new thing?"
Activity metrics are necessary but insufficient. High adoption of a new tool does not mean the tool is improving outcomes — it might mean the tool is mandatory and people are using it grudgingly while finding workarounds for its limitations. Activity metrics detect non-adoption (a clear signal that the change is not working) but cannot confirm that adoption is producing the intended benefits.
Level 2: Output metrics
Output metrics track whether the intended outcomes are changing: processing time, defect rates, customer satisfaction, revenue, cycle time. These metrics answer the question: "Are we getting the result we wanted?"
Output metrics are more meaningful than activity metrics but still insufficient. An output improvement might be produced by the system change, by coincidental external factors, by temporary additional resources, or by the Hawthorne effect. Output metrics that improve initially and then revert indicate that the system did not change — something else (attention, effort, resources) produced the temporary improvement.
Level 3: System health metrics
System health metrics track whether the system is functioning well while producing the output: employee engagement, error rates, workaround frequency, escalation volume, technical debt accumulation, capacity utilization. These metrics answer the question: "Is the system healthy while producing this result?"
A system that produces better outputs while degrading in health is borrowing from the future — it is producing short-term improvement by depleting long-term capability. System health metrics detect this borrowing, which output metrics alone miss.
Level 4: Persistence metrics
Persistence metrics track whether the improvement holds over time and under varying conditions: performance during low-attention periods, performance under stress, performance after personnel changes, performance during organizational disruptions. These metrics answer the question: "Is this a genuine system change or a temporary improvement?"
Persistence metrics are the definitive test of systemic change. A genuine system change produces stable performance because the system itself — not individual effort or management attention — sustains the outcome. Persistence metrics require patience: they can only be assessed over months, not weeks.
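As an illustrative sketch, the four levels can be treated as a single scorecard with per-metric thresholds for success, concern, and failure. All metric names, values, and thresholds below are hypothetical, not prescribed by the framework itself:

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    ACTIVITY = 1       # are people doing the new thing?
    OUTPUT = 2         # are we getting the result we wanted?
    SYSTEM_HEALTH = 3  # is the system healthy while producing it?
    PERSISTENCE = 4    # does the improvement hold over time?

@dataclass
class Metric:
    name: str
    level: Level
    value: float
    success_at: float  # at or above this value: success
    concern_at: float  # at or above this (but below success_at): concern

    def status(self) -> str:
        if self.value >= self.success_at:
            return "success"
        if self.value >= self.concern_at:
            return "concern"
        return "failure"

# Hypothetical readings for a process-change initiative.
scorecard = [
    Metric("tool adoption rate", Level.ACTIVITY, 0.92, 0.85, 0.60),
    Metric("cycle-time reduction", Level.OUTPUT, 0.18, 0.15, 0.05),
    Metric("workaround-free weeks", Level.SYSTEM_HEALTH, 0.40, 0.80, 0.50),
    Metric("performance retained at month 9", Level.PERSISTENCE, 0.95, 0.90, 0.70),
]

for m in scorecard:
    print(f"{m.level.name:13} {m.name}: {m.status()}")
```

Note that in this hypothetical scorecard, activity and output both read "success" while system health reads "failure" — exactly the borrowing-from-the-future pattern that output metrics alone would miss.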
The three tests of genuine change
Three practical tests distinguish genuine systemic change from temporary improvement.
The attention test
Remove the leadership attention that accompanied the change. Stop the weekly check-ins, the dashboard reviews, the status updates. Does performance persist?
If performance reverts when attention shifts, the change was attention-dependent — the system did not change; the monitoring did. This is the most common failure mode in organizational change: the change initiative produces improvement through increased oversight, and the improvement lasts exactly as long as the oversight does.
The attention test should be applied six to twelve months after implementation, when the change initiative has lost its novelty and leadership attention has naturally shifted to other priorities.
The personnel test
Would the improved outcome survive the departure of the key people who drove the change? If the change champion leaves, does the system revert?
If performance depends on specific individuals, the system did not change — the individuals compensated for the system's deficiencies through personal effort. Genuine system change is person-independent: the system produces the desired outcome regardless of which specific individuals operate within it (within the normal range of competence).
The stress test
Apply stress to the system: increase the workload, shorten the timeline, reduce the resources. Does performance degrade gracefully (indicating a resilient system) or collapse to pre-change levels (indicating a fragile change)?
Systems under stress reveal their true operating logic. A team that follows the new process under normal conditions but reverts to the old process under pressure has not internalized the system change — they have learned the new process as an additional behavior layered on top of the old system, and the old system reasserts itself when the additional effort required to maintain the new behavior becomes unsustainable.
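The attention test in particular lends itself to a simple quantitative check: compare performance while oversight was active against performance after it stopped. A minimal sketch, with an illustrative (not empirically derived) tolerance and hypothetical data:

```python
from statistics import mean

def attention_test(monitored: list[float], unmonitored: list[float],
                   tolerance: float = 0.05) -> str:
    """Compare performance with leadership attention present vs withdrawn.

    Returns 'system changed' if average performance holds within the given
    tolerance once attention stops, else 'attention-dependent'. The 5%
    tolerance is an assumed threshold for illustration only.
    """
    drop = (mean(monitored) - mean(unmonitored)) / mean(monitored)
    return "system changed" if drop <= tolerance else "attention-dependent"

# Hypothetical weekly throughput: during active check-ins vs after they stop.
during = [102, 105, 103, 104]
after = [101, 103, 102, 104]    # performance holds: genuine system change
print(attention_test(during, after))     # -> system changed

reverted = [88, 85, 84, 83]     # performance reverts: only oversight changed
print(attention_test(during, reverted))  # -> attention-dependent
```

The personnel and stress tests can reuse the same comparison, substituting post-departure or under-load measurements for the "unmonitored" series.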
Leading and lagging indicators
Effective measurement combines leading indicators (early signals that predict future outcomes) with lagging indicators (confirmed outcomes that verify past changes).
Leading indicators provide early warning: adoption rates, behavior changes, attitude shifts, process compliance, workaround frequency. These indicators predict whether the system change will produce the intended outcome — or whether adjustments are needed before the lagging indicators confirm failure.
Lagging indicators provide confirmation: outcome improvements, customer satisfaction changes, financial results, retention rates. These indicators confirm whether the system change produced the intended result — but they appear months after the change, which is too late for correction if the change was misdesigned.
The optimal measurement system uses leading indicators for real-time course correction and lagging indicators for definitive assessment. If leading indicators are positive but lagging indicators are negative, the leading indicators were measuring the wrong things. If leading indicators are negative but lagging indicators are positive, the change is succeeding through mechanisms the measurement system did not anticipate — which is valuable learning about how the system actually works.
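The four cases above form a small decision matrix, which can be sketched directly (the labels paraphrase the text and are not canonical terminology):

```python
def interpret(leading_positive: bool, lagging_positive: bool) -> str:
    """Reconcile leading and lagging indicator signals into one of the
    four interpretations described above."""
    if leading_positive and lagging_positive:
        return "change is working as designed"
    if leading_positive and not lagging_positive:
        return "leading indicators measured the wrong things"
    if not leading_positive and lagging_positive:
        return "success via unanticipated mechanisms: study how the system actually works"
    return "change is failing: redesign before more lagging data accrues"

print(interpret(leading_positive=True, lagging_positive=False))
```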
The Third Brain
Your AI system can help you design comprehensive measurement frameworks for systemic change. Describe the change you have implemented or are planning, and ask: "Design a four-level measurement framework: activity metrics (are people doing the new thing?), output metrics (are we getting the result we wanted?), system health metrics (is the system healthy while producing the result?), and persistence metrics (will the improvement hold over time?). For each level, specify the specific metrics, the data sources, the measurement frequency, and the thresholds that would indicate success, concern, or failure. Also identify the leading indicators that would provide early warning of problems before the lagging indicators confirm them."
From measurement to mechanism
Measurement tells you whether the system changed. It does not tell you how to change it. The remaining lessons of this phase examine the specific mechanisms of systemic change — the structural, incentive, informational, and process levers that change agents can use to redesign organizational systems.
The next lesson, Structural change versus behavioral change, examines two fundamentally different approaches to systemic intervention, each with different strengths, limitations, and measurement requirements.
Sources:
- Goldratt, E. M. (1990). The Theory of Constraints. North River Press.