Core Primitive
Maintain a list of behavioral experiments you want to run.
The ideas you keep losing
You have more ideas for behavioral experiments than you could run in a year. You know this because they keep arriving — in the shower, during a podcast, while reading, in the middle of a conversation, at 2 AM when your brain decides to be creative instead of sleeping. Each one feels promising. Each one sparks that particular excitement of "what if I tried that?" And each one, if you are honest with yourself, has roughly the same fate: it floats in your awareness for a few hours or days, gets displaced by the next interesting idea, and eventually vanishes. Months later, you vaguely remember having an insight about how you might restructure your mornings or change your approach to difficult conversations, but the specifics are gone. The hypothesis you would have tested, the variables you would have controlled, the connection to something you read — all dissolved.
This is not a memory problem. It is an infrastructure problem. You have no system for holding experimental ideas between the moment of inspiration and the moment of execution. You are treating your experimental practice like a restaurant that takes only walk-in customers, serving whoever shows up at the door rather than managing reservations. The result is that your experiments are selected by recency and emotional salience — whichever idea happens to be freshest and most exciting when you finish your current experiment — rather than by strategic value. You are running the experiments you remember, not the experiments that matter most.
The primitive for this lesson is deceptively simple: maintain a list of behavioral experiments you want to run. But this is not a to-do list. It is a fundamentally different kind of cognitive tool — a prioritized queue that separates the act of generating experimental ideas from the act of choosing which ones to execute. That separation changes everything about how you experiment.
Why ideation and execution need different containers
The reason you keep losing experimental ideas is that you are trying to store them in the same mental space where you manage active experiments. This is like trying to use your kitchen counter as both a workspace and a pantry — the active work constantly displaces the stored items, and the stored items constantly clutter the active workspace.
Ken Schwaber and Jeff Sutherland formalized this insight in the Scrum framework with the concept of the product backlog — a single, ordered list of everything that might be done, maintained separately from the work currently in progress. The product backlog is not a plan. It is a holding environment for possibilities, ranked by the team's current understanding of value. The genius of the backlog concept is that it eliminates the false binary that most people face: either commit to doing something now or forget about it entirely. The backlog creates a third option — acknowledge the idea, assess its value, and store it in a place where it will be available when you are ready for it.
David Allen arrived at the same structural insight from a completely different direction. His Getting Things Done methodology is built on the principle that your mind is designed for having ideas, not for holding them. Every "open loop" — every idea, commitment, or possibility that you are trying to track mentally — occupies cognitive resources and generates a low-grade anxiety that Allen calls "psychic drag." The solution is not to execute everything immediately but to capture everything in a trusted external system. Once your brain knows that an idea has been reliably stored somewhere it can be retrieved, it releases the cognitive resources it was using to hold it. Allen's work coaching thousands of professionals showed that the simple act of capturing and organizing open loops produces an immediate and noticeable reduction in stress and an increase in creative thinking.
For behavioral experiments, this principle is especially critical. An experiment you are vaguely thinking about running occupies a strange middle ground in your attention — not concrete enough to act on, not dismissed enough to forget. It creates what psychologist Bluma Zeigarnik documented as the Zeigarnik effect: incomplete tasks and unresolved intentions are more cognitively available than completed ones, intruding into attention at unpredictable moments. Every unrecorded experiment idea is an open loop that periodically interrupts your thinking with "I should really try that thing with my morning routine" — without actually moving you closer to testing it.
A backlog closes these loops. Each idea, once captured in a reliable external system, stops consuming mental bandwidth. Your brain can redirect that freed capacity toward the experiment you are actually running right now or toward generating even better ideas, confident that nothing will be lost.
What belongs in the backlog
An experiment backlog is not a wish list or a brainstorm dump. Each entry needs enough structure to be actionable when its time comes, but not so much structure that capturing a new idea becomes burdensome. The sweet spot is five fields per entry.
The first field is the hypothesis — a single sentence stating what you predict will happen if you run this experiment. "I predict that writing my three most important tasks the night before, rather than the morning of, will reduce my morning decision fatigue and allow me to start deep work fifteen minutes earlier." The hypothesis matters because it transforms a vague intention ("I should plan my days better") into a testable claim. When you eventually pull this experiment from the backlog, the hypothesis gives you a clear starting point — you know what you are testing, what you expect, and what would count as evidence for or against.
The second field is expected impact — your honest estimate of how much this experiment could improve your life if the hypothesis turns out to be true. This is not a precise calculation. It is a rough judgment: high impact means the behavior change, if successful, would meaningfully improve a domain you care about. Medium impact means it would produce a noticeable but moderate improvement. Low impact means it would be interesting to know but would not significantly change your daily experience. Impact assessment matters because it prevents you from spending your limited experimental bandwidth on trivial curiosities when transformative experiments are waiting in the queue.
The third field is estimated effort — how much disruption, setup, or willpower the experiment would require. A low-effort experiment might involve adding a five-minute practice to an existing routine. A high-effort experiment might require restructuring your entire morning, purchasing equipment, or coordinating with other people. Effort matters because it determines when an experiment is feasible, not just whether it is valuable. A high-impact, high-effort experiment might be exactly the right thing to run during a vacation week when your schedule is flexible, and exactly the wrong thing to attempt during a high-pressure work sprint.
The fourth field is domain — which area of your life this experiment targets. Work, health, relationships, cognition, finances, creativity, habits, environment. Domain tagging matters for two reasons. First, it lets you balance your experimental portfolio. If you notice that your last six experiments have all been about work productivity, your backlog might reveal high-value experiments in relationships or health that you have been unconsciously deprioritizing. Second, it helps you identify experiments with dependencies — a sleep experiment and an exercise experiment might interact in ways that make running them simultaneously problematic.
The fifth field is dependencies — other experiments or conditions that need to be in place before this one makes sense. "Test whether a standing desk reduces afternoon fatigue" probably depends on actually having a standing desk. "Test whether accountability partnerships improve exercise consistency" depends on having an accountability partner. Dependencies prevent you from pulling an experiment that you cannot actually execute, which would waste a slot in your active experiments and demoralize you in the process.
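The five fields above can be sketched as a small data structure. This is a minimal illustration, not a prescribed tool — the `BacklogEntry` class, the `Level` enum, and the field names are all hypothetical choices; a notebook page or spreadsheet with the same five columns works just as well.

```python
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class BacklogEntry:
    """One experiment idea, captured with just enough structure to act on later."""
    hypothesis: str    # a single testable prediction
    impact: Level      # expected improvement if the hypothesis holds
    effort: Level      # disruption, setup, or willpower required
    domain: str        # life area: work, health, relationships, ...
    dependencies: list[str] = field(default_factory=list)  # prerequisites

# Example entry using the hypothesis from the text
entry = BacklogEntry(
    hypothesis=("Writing my three most important tasks the night before "
                "lets me start deep work fifteen minutes earlier."),
    impact=Level.HIGH,
    effort=Level.LOW,
    domain="work",
)
```

The point of keeping the structure this light is that capture stays cheap: a new idea costs one sentence and three quick judgments, which is the difference between a backlog you actually use and one you abandon.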
Teresa Amabile's research on creative productivity found that the most innovative teams maintain what she calls a "creative pipeline" — a reservoir of ideas at various stages of development. Teams that relied on generating ideas on demand, when they needed them, consistently produced lower-quality innovations than teams that drew from an existing pool of captured and refined ideas. The same principle applies to your personal experiments. Drawing from a backlog of well-specified ideas produces better experiments than generating them ad hoc when you happen to have a free slot.
How to prioritize: learning value over expected outcome
The most natural way to prioritize a backlog is by expected outcome — which experiment do you think will work best? This is almost always the wrong criterion. You do not know which experiments will work. If you did, you would not need to run them. Prioritizing by expected outcome biases your queue toward experiments that confirm your existing beliefs (you expect them to work because they align with what you already think) and away from experiments that could genuinely surprise you.
The better criterion is expected learning value — how much useful information will this experiment generate regardless of whether the hypothesis is confirmed or rejected? Eric Ries formalized this concept in the Lean Startup methodology as "validated learning," arguing that the purpose of an experiment is not to succeed but to learn. An experiment that fails cleanly — producing an unambiguous rejection of a specific hypothesis — is often more valuable than an experiment that vaguely succeeds, because it permanently eliminates a hypothesis from your consideration and sharpens your understanding of what actually drives the behavior you are investigating.
Consider two experiments in your backlog. Experiment A tests whether drinking an extra glass of water in the afternoon improves your energy. You are fairly confident this will help because you have read about it many times and it aligns with common health advice. Experiment B tests whether your afternoon energy dip is actually caused by a specific food you eat at lunch, not by dehydration at all. Experiment A has a higher expected success rate, but Experiment B has a higher expected learning value — it could reveal a causal mechanism you have never considered and redirect your entire approach to afternoon energy management.
When learning value is equal, break ties with an impact-to-effort ratio. High-impact, low-effort experiments should generally be run before low-impact, high-effort ones. This is the same two-by-two matrix used in product management to prioritize features — the upper-left quadrant (high value, low cost) is the obvious starting point, while the lower-right quadrant (low value, high cost) is the graveyard of experiments you will probably never run, and that is fine. Not everything in the backlog needs to be executed. The backlog's value includes the ideas you deliberately choose not to pursue, because that choice frees your capacity for the ones that matter most.
Information urgency is a third prioritization factor that sometimes overrides the others. Some experiments have a time window. If you are considering whether a particular conference is worth attending, you need to run that experiment before registration closes. If you are wondering whether cold-weather running works for you, that experiment has a seasonal deadline. Time-sensitive experiments get priority bumps regardless of their learning value, because their information becomes worthless if you wait too long.
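The three prioritization factors — urgency first, then learning value, then the impact-to-effort tiebreak — compose naturally into a single sort key. A minimal sketch, assuming each entry is a simple tuple and that learning value, impact, and effort are rough 1-to-3 judgments (all names and numbers here are illustrative):

```python
# Each entry: (name, learning_value 1-3, impact 1-3, effort 1-3, time_sensitive)
backlog = [
    ("extra afternoon water",    1, 2, 1, False),
    ("lunch food vs energy dip", 3, 2, 2, False),
    ("cold-weather running",     2, 2, 2, True),   # seasonal window closing
]

def priority_key(entry):
    name, learning, impact, effort, urgent = entry
    # Python compares tuples left to right: urgency dominates,
    # then learning value, then the impact-to-effort ratio breaks ties.
    return (urgent, learning, impact / effort)

queue = sorted(backlog, key=priority_key, reverse=True)
```

With the example values above, the seasonal experiment jumps the queue despite its middling learning value, and the mechanism-probing lunch experiment outranks the safe-but-uninformative water experiment — exactly the ordering the text argues for.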
Backlog grooming: keeping the queue alive
A backlog that is created once and never revisited is not a backlog. It is a fossil. The practice that keeps a backlog useful is regular grooming — a periodic review where you add new ideas, remove stale ones, update priorities, and refine entries that have become more or less relevant since you last looked at them.
Monthly grooming works well for most people. Set aside fifteen to twenty minutes once a month to walk through your entire backlog. For each entry, ask three questions. First, is this still interesting? Your priorities, circumstances, and knowledge evolve. An experiment that seemed fascinating three months ago might now feel irrelevant because you have learned something that makes the hypothesis moot, or because the life domain it targeted is no longer a priority. Remove or archive stale entries without guilt — they served their purpose by occupying a slot that prevented you from impulsively running them when a better experiment was available.
Second, has this entry changed priority? New information, completed experiments, and shifts in your life circumstances can all move an experiment up or down the queue. The sleep experiment that was medium priority last month might become urgent after you notice your energy declining. The relationship experiment that was high priority might drop after a conversation resolves the issue it was designed to address.
Third, does this entry need refinement? Ideas captured quickly often need sharpening before they are ready to execute. A backlog entry that says "test some kind of morning routine change" is too vague to act on. During grooming, refine it: "Test whether a specific sequence of movement, journaling, and planning — in that order — produces a more focused first work hour than my current unstructured morning." The grooming session is where rough ideas get polished into executable experiments.
Grooming also protects against a subtler failure: backlog bloat. Without periodic pruning, backlogs grow indefinitely. An overloaded backlog is psychologically indistinguishable from no backlog at all — you stop consulting it because scrolling through eighty entries feels more overwhelming than just picking whatever experiment comes to mind. Keep your active backlog between ten and twenty-five entries. Archive anything beyond that in a separate "someday" list that you review quarterly rather than monthly.
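The monthly grooming pass — drop stale entries, re-rank, and spill the overflow into a "someday" list — can be summarized in a few lines. This is a sketch of the procedure, not a real tool; `still_interesting` and `priority` stand in for the judgment calls you make while reviewing each entry.

```python
ACTIVE_CAP = 25  # keep the active backlog between roughly ten and twenty-five entries

def groom(backlog, still_interesting, priority):
    """Monthly pass: drop stale entries, re-rank, archive the overflow."""
    kept = [e for e in backlog if still_interesting(e)]   # question 1: still interesting?
    kept.sort(key=priority, reverse=True)                  # question 2: changed priority?
    active, someday = kept[:ACTIVE_CAP], kept[ACTIVE_CAP:]
    return active, someday  # 'someday' is reviewed quarterly, not monthly
```

Question three — refinement — happens as you touch each surviving entry; it is editorial work no function can do for you. The cap is the important part: it is what keeps the active list short enough that you keep consulting it.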
The backlog as motivational infrastructure
One of the less obvious benefits of maintaining an experiment backlog is what it does for your motivation between experiments. Without a backlog, the end of one experiment creates a void. You have to generate a new idea, design it, build enthusiasm for it, and overcome the inertia of starting something new — all at once. This gap is where many experimental practices stall. The transition cost between experiments is high enough that people take a break, the break extends, and the experimental habit dies.
A backlog eliminates this transition cost. When you finish an experiment — or when an experiment needs to be paused or abandoned — you do not face a blank page. You face a curated list of ideas you have already generated and prioritized, each one interesting enough to have earned its place in the queue. The next experiment is not something you need to invent. It is something you need to select. Selection is psychologically easier than invention because the creative work has already been done.
This is the same principle behind what Mihaly Csikszentmihalyi described in his research on flow states and creative work: creators who maintain ongoing lists of problems, questions, and project ideas report less creative drought and faster re-engagement after breaks than those who rely on spontaneous inspiration. The list functions as a bridge across the inevitable gaps in creative energy. Your experiment backlog serves the same bridging function, ensuring that a temporary dip in motivation does not permanently end your experimental practice.
The backlog also creates a specific kind of positive anticipation. When you are in the middle of a difficult or tedious experiment, knowing that several interesting alternatives are queued and waiting can be genuinely energizing. It reframes patience as strategic — you are not just enduring this experiment, you are completing it so you can move to the next one that excites you. This is different from the impulsive pull of a shiny new idea. It is the structured patience of someone who knows exactly what comes next and has deliberately chosen to finish what they started before beginning the next thing.
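The journal-to-backlog feedback loop described in the next section is mechanical enough to sketch. Assuming a journal entry is a plain dict with a "next_steps" field (the dict keys and the function name are illustrative, not a prescribed format):

```python
def promote_next_steps(journal_entry, backlog):
    """Append a finished experiment's next-step ideas to the backlog."""
    for step in journal_entry.get("next_steps", []):
        backlog.append({
            "hypothesis": step,
            "domain": journal_entry.get("domain", "unspecified"),
            "source": journal_entry.get("title", ""),  # provenance for later grooming
        })
    return backlog

# The walking example from the text: one completed experiment seeds three new entries
backlog = promote_next_steps({
    "title": "afternoon walking",
    "domain": "health",
    "next_steps": [
        "Outdoor light exposure alone captures the benefit of outdoor walks.",
        "A ten-minute walk delivers most of the benefit of a twenty-minute walk.",
        "Afternoon outdoor walking improves sleep quality as a secondary effect.",
    ],
}, [])
```

Tagging each promoted entry with its source experiment pays off during grooming: you can see which completed experiments are generating follow-ups and which lines of inquiry have gone quiet.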
Connecting the backlog to the experiment journal
Your experiment backlog and the experiment journal you built in Record experimental results form a natural feedback loop. The journal records what you tried and what happened. The backlog stores what you want to try next. But the connection goes deeper than simple sequencing.
Every completed experiment generates new backlog entries. The "next steps" field in your experiment journal — the sixth field of the recording format — is a direct input to your backlog. When an experiment on afternoon walking revealed that outdoor walks helped but indoor walks did not, the next step entry might have generated three new backlog items: "Test whether outdoor light exposure alone (standing outside without walking) captures the benefit," "Test whether a ten-minute outdoor walk delivers eighty percent of the benefit of a twenty-minute walk," and "Test whether afternoon outdoor walking improves sleep quality as a secondary effect." These entries go into the backlog, get prioritized alongside everything else, and wait their turn.
The backlog also enriches the journal. When you start an experiment that has been sitting in your backlog for two months, you bring more context to it than you would if you had invented it on the spot. The backlog entry has been groomed, refined, and compared against alternatives. You know why this experiment outranked the others. That clarity makes your hypothesis sharper, your experimental design cleaner, and your journal entries more precise.
Over time, the backlog and the journal together create a complete record of your experimental life — not just what you tested and what happened, but also what you considered testing and why you chose what you chose. That meta-record is invaluable for understanding your own experimental instincts. You might discover that you consistently avoid experiments in a particular domain, revealing a blind spot. You might notice that your highest-rated backlog items never get selected because something "urgent" always jumps the queue, revealing a pattern of reactive rather than strategic experimentation. The backlog makes these patterns visible in a way that ad hoc experimentation never can.
The Third Brain
Your experiment backlog becomes dramatically more powerful when you use AI as a backlog manager.
Feed your current backlog to an AI and ask it to help you prioritize. Describe your current circumstances — which life domains feel most pressing, how much experimental capacity you have, what experiments you have recently completed — and let the AI suggest a priority ordering. The AI will weigh factors that you might not consider simultaneously: that your last three experiments were all in the work domain and your health experiments have been deprioritized for months, that a low-effort experiment on your list could be stacked with a high-effort one if they target different domains, that two of your backlog entries are actually testing the same underlying mechanism and could be combined into a single more efficient experiment.
Use the AI as a brainstorming partner for backlog population. Describe a domain you want to experiment in and a general direction you are curious about, and let the AI generate ten possible experiments with well-formed hypotheses. You will likely reject seven, refine two, and add one directly — but that single experiment is one you would not have generated on your own because the AI explored possibilities outside your usual thinking patterns.
Perhaps most powerfully, give the AI both your experiment journal and your backlog and ask it to identify gaps — domains, question types, or experimental approaches that are missing from both. The AI can spot the absence of experiments about social behavior when all your entries are about individual productivity, or the absence of subtraction experiments (removing behaviors) when all your entries involve adding new ones. These gaps represent your experimental blind spots, and the AI is uniquely positioned to see them because it does not share your cognitive biases about what is and is not worth testing.
But maintain ownership of the final prioritization. The AI is an advisor, not a decision-maker. It does not know what you value most deeply, what you have the emotional energy to tackle right now, or what your intuition is whispering about a particular experiment. Let the AI inform your priority ordering. Do not let it dictate it.
From reactive to strategic experimentation
Without a backlog, your experimental practice is reactive. You run whatever experiment catches your attention, whenever the impulse strikes, for however long your initial enthusiasm lasts. This produces a scattered collection of half-finished experiments with no coherent thread connecting them — the behavioral equivalent of channel surfing.
With a backlog, your experimental practice becomes strategic. You generate ideas freely, store them reliably, prioritize them thoughtfully, and execute them deliberately. Each experiment is chosen from a curated queue based on learning value, feasibility, and current circumstances. Each completed experiment feeds new entries back into the backlog. The backlog grows, gets groomed, and evolves alongside you — a living document of your experimental intentions that ensures you are always testing the most important thing, not merely the most recent thing.
The previous lesson, Experimental ethics with yourself, established the ethical boundaries within which your experiments operate — ensuring that your experimental practice does not expose you to unnecessary harm. This lesson gives that ethically bounded practice a management layer: a system for generating more experiments than you can run and systematically choosing the best ones to pursue.
The next lesson, Sequential versus parallel experiments, addresses a decision that your backlog will immediately force you to confront. When you look at your prioritized queue and see three high-value experiments waiting, you will need to decide: should you run them sequentially, one at a time for cleaner results, or in parallel for faster throughput? That choice has significant implications for experimental validity, and your backlog — by making all your pending experiments visible at once — is what makes it a deliberate choice rather than an accidental one.
Build your backlog. Fill it with more ideas than you can execute in a year. Then choose, deliberately, which one comes next.
Frequently Asked Questions