The optimization that never ends
You have already learned that optimization produces diminishing returns — each round of improvement yields less than the last. But knowing that the curve flattens is not the same as knowing where to step off it. Diminishing returns describes the shape of the territory. This lesson is about building the compass that tells you when to stop walking.
The absence of a stopping rule is one of the most expensive cognitive failures in personal and professional life. It does not look like failure. It looks like diligence, craftsmanship, high standards. The person who spends six hours perfecting a slide deck appears dedicated. The engineer who rewrites a function for the fourth time appears thorough. The writer who edits a paragraph for the twentieth time appears committed to quality. But dedication without a stopping criterion is just expensive indecision wearing the mask of excellence.
The question is not whether you should optimize. You should. The question is: how do you know when you have optimized enough?
The mathematics of when to stop: the 37 percent rule
Mathematics has a precise answer to at least one version of this question, and its structure illuminates the general principle even when exact calculation is impossible.
The secretary problem, first formalized in the 1960s and explored extensively by mathematicians including Merrill Flood and Herbert Robbins, poses a clean scenario. You are interviewing candidates for a position. You see them one at a time, in random order. After each interview, you must immediately accept or reject — no callbacks. How do you maximize your chance of selecting the best candidate?
The optimal solution, proven mathematically, is the look-then-leap rule: spend the first 37 percent of your candidate pool gathering information without selecting anyone, then immediately select the next candidate who is better than everyone you have seen so far. This strategy, as Brian Christian and Tom Griffiths explain in Algorithms to Live By (2016), selects the single best candidate approximately 37 percent of the time — which is the best any strategy can achieve under these constraints. The number 37 comes from 1/e ≈ 0.368, where e is the base of the natural logarithm.
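The look-then-leap rule is simple enough to verify by simulation. A minimal sketch, where the pool size and trial count are arbitrary choices rather than part of the problem statement:

```python
import random

def look_then_leap(candidates, look_fraction=0.37):
    """Reject the first `look_fraction` of candidates, then accept the
    first one better than everything seen during the look phase."""
    n = len(candidates)
    look = int(n * look_fraction)
    best_seen = max(candidates[:look]) if look else float("-inf")
    for score in candidates[look:]:
        if score > best_seen:
            return score
    return candidates[-1]  # forced to accept the last candidate

def success_rate(n=100, trials=20_000, seed=0):
    """Fraction of trials where the rule picks the single best candidate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        pool = rng.sample(range(n), n)  # candidates in random order, distinct scores
        if look_then_leap(pool) == n - 1:  # n - 1 is the best possible score
            wins += 1
    return wins / trials

print(success_rate())  # roughly 0.37 for n = 100
```

The simulation also makes the tradeoff tangible: the rule misses the best candidate about 63 percent of the time, and no rule under these constraints can do better.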
The specific percentage matters less than the structural insight: the optimal strategy has a built-in stopping point. It does not say "keep looking until you find the perfect candidate." It says "gather information for a defined period, then commit to the first option that clears a threshold." The strategy accepts that you will sometimes miss the absolute best option. That acceptance is not a flaw — it is the mechanism that makes the strategy optimal. Refusing to accept less-than-perfect outcomes is what makes a strategy worse, not better.
This is the deep lesson of optimal stopping theory: the mathematically best approach to finding the best outcome involves deliberately choosing to stop before you have examined all possibilities. Exhaustive search is not optimal. It is a failure mode.
Satisficing: the psychology of good enough
Herbert Simon, the Nobel Prize-winning economist and cognitive scientist, arrived at the same conclusion from a completely different direction. In the 1950s, Simon introduced the concept of bounded rationality — the observation that human beings do not have infinite time, infinite information, or infinite cognitive capacity for making decisions. Classical economics assumed rational agents would optimize: evaluate all options, compute expected utilities, select the maximum. Simon demonstrated this was both empirically false and theoretically unnecessary.
His alternative was satisficing, a term he coined by combining "satisfy" and "suffice." A satisficer does not search for the best possible option. They define a threshold — a set of criteria that constitute "good enough" — and select the first option that meets it. The satisficer searching for an apartment does not visit every listing in the city. They define their requirements (location, price, size, condition), visit apartments until one meets all requirements, and sign the lease.
Simon's insight was that satisficing is not laziness or settling. It is the rational response to the reality of bounded resources. The time and energy spent searching for the optimal apartment could have been spent on work, relationships, or rest — all of which have value. The satisficer recognizes that the marginal improvement from continued searching is smaller than the marginal cost of the search itself.
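The contrast between the two strategies can be sketched in a few lines. The option values and the 0.8 threshold below are purely illustrative:

```python
def satisfice(options, threshold):
    """Return the first option meeting the threshold, and how many were examined."""
    for examined, value in enumerate(options, start=1):
        if value >= threshold:
            return value, examined
    return None, len(options)  # nothing cleared the bar

def maximize(options):
    """Return the best option -- but only after examining every single one."""
    return max(options), len(options)

options = [0.61, 0.74, 0.83, 0.79, 0.91, 0.70, 0.88]
print(satisfice(options, threshold=0.8))  # (0.83, 3): stopped at the third option
print(maximize(options))                  # (0.91, 7): examined all seven
```

The satisficer accepts a slightly lower value in exchange for examining less than half the options; whether that trade is worth it depends entirely on what the unexamined time is worth, which is exactly Simon's point.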
Barry Schwartz, in The Paradox of Choice (2004), extended Simon's framework with empirical research on maximizers versus satisficers. His findings, published across seven studies with colleagues including Andrew Ward and Sonja Lyubomirsky, revealed a consistent and striking pattern: maximizers — people who habitually seek the best possible option — reported lower happiness, lower life satisfaction, lower optimism, and higher rates of depression and regret compared to satisficers. The maximizer who exhaustively compares every available option and finally selects the objectively best one ends up less satisfied with their choice than the satisficer who picked the first good-enough option.
This is not a paradox. It is a cost-accounting problem. The maximizer's strategy imposes cognitive and emotional costs — the anxiety of comparison, the regret of imagined alternatives, the exhaustion of extended deliberation — that exceed the marginal value of the superior option. Schwartz demonstrated that maximizing is not just inefficient. It is anti-correlated with wellbeing. The person who refuses to stop optimizing ends up worse off than the person who stops at good enough.
Exploration versus exploitation: the structural tradeoff
Computer science formalizes the optimization stopping problem through the exploration-exploitation tradeoff — one of the foundational dilemmas in reinforcement learning, operations research, and decision theory.
The dilemma is this: at any point, you can either exploit what you already know (use your current best option) or explore for something potentially better (invest time searching for new options). Every moment spent exploring is a moment not spent exploiting, and vice versa.
The multi-armed bandit problem, named after a gambler facing a row of slot machines with different unknown payoff rates, captures this tension precisely. The gambler must decide when to stop trying new machines (exploring) and start pulling the lever on the best machine found so far (exploiting). The critical factor is the time horizon. If a thousand pulls remain, exploration is valuable — information gained could improve hundreds of future decisions. If only ten pulls remain, exploration is expensive — there is not enough time to recoup the cost of new information.
This maps directly to optimization decisions. Early in a project, exploring different approaches has high value because you will live with the chosen approach for a long time. Close to a deadline, exploring alternatives has low value because you cannot recoup the switching cost. The practical rule: as your remaining time horizon shrinks, shift from exploring to exploiting. The engineer who keeps optimizing architecture two weeks before launch is exploring when they should be exploiting. The transition from exploration to exploitation is the structural definition of "knowing when to stop."
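One way to make the horizon-dependence concrete is an epsilon-greedy sketch in which the exploration probability shrinks with the time remaining. This illustrates the principle rather than implementing a provably optimal bandit policy, and the payoff rates and horizon are made up:

```python
import random

def bandit_run(payoffs, horizon, seed=0):
    """Play a multi-armed bandit, exploring with probability proportional
    to the fraction of the horizon remaining. `payoffs` are the true win
    probabilities of each arm, unknown to the player."""
    rng = random.Random(seed)
    arms = len(payoffs)
    pulls = [0] * arms
    wins = [0] * arms
    total = 0
    for t in range(horizon):
        remaining = (horizon - t) / horizon  # 1.0 at the start, near 0 at the end
        if rng.random() < remaining:
            arm = rng.randrange(arms)  # explore: try a random arm
        else:
            # exploit: pull the arm with the best observed win rate so far
            arm = max(range(arms),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
        reward = 1 if rng.random() < payoffs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls

reward, pulls = bandit_run([0.2, 0.5, 0.8], horizon=1000)
# Most pulls should concentrate on the best (0.8) arm by the end of the run.
```

Early pulls are spread across all arms; late pulls concentrate on the best observed arm, because with little time left, new information can no longer pay for itself.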
Voltaire's ancient warning, Simon's modern proof
The aphorism "the perfect is the enemy of the good" predates modern decision theory by centuries. Voltaire wrote it in 1770, quoting an Italian proverb — "il meglio è l'inimico del bene" — that had been documented as early as 1603. Shakespeare expressed the same idea in King Lear in 1606: "Striving to better, oft we mar what's well." The observation is old because the failure mode is old. Humans have been over-optimizing for as long as they have been optimizing.
But the proverb, repeated often enough, becomes wallpaper. People nod at it and keep polishing something that was finished three iterations ago. What modern research adds to Voltaire's observation is not the insight itself but the mechanism behind it. Simon explained why the perfect is the enemy of the good: bounded rationality means the cost of pursuing perfection always exceeds the value of attaining it. Schwartz demonstrated how it operates psychologically: through the hedonic penalty of maximizing. Optimal stopping theory provides when to stop: after a calculable threshold that depends on the option space and the cost of continued search.
The ancient wisdom told you to stop. The modern research tells you how to build a stopping rule you can actually follow.
Early stopping: the machine learning metaphor made literal
Machine learning provides the most concrete and instructive model of optimization stopping. When training a neural network, you optimize the model's parameters to minimize error on training data. But there is a critical problem: if you optimize for too long, the model begins to overfit. It memorizes the noise and idiosyncrasies of the training data instead of learning the underlying patterns. Performance on training data keeps improving while performance on new, unseen data starts to degrade.
The solution is early stopping — a regularization technique where you monitor the model's performance on a separate validation dataset during training. When validation performance stops improving and begins to deteriorate, you stop training, even though training performance is still getting better. You accept a model that is not maximally optimized on its training data because that over-optimized model would perform worse on the data that actually matters.
The parallel to personal and professional optimization is exact. Your current project is the training data. Your life — the other projects, relationships, and opportunities waiting for your attention — is the validation set. Over-optimizing on the current project (training data) degrades your performance on the rest of your life (validation data). The model that memorizes its training data looks impressive in isolation but fails in deployment. The person who perfects one deliverable while everything else deteriorates looks dedicated in isolation but is failing at the system level.
Early stopping works because it operationalizes a stopping criterion that is external to the optimization process itself. You do not decide to stop based on whether the training loss is still decreasing — it always is. You decide to stop based on whether a separate metric, one that represents your actual goal, has plateaued. This is the template for personal stopping rules: define a metric that represents the actual purpose of the optimization, not the optimization itself, and stop when that metric stabilizes.
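The mechanism can be sketched with synthetic loss curves. The curves below are invented for illustration (training loss falls monotonically; validation loss is U-shaped), and the patience value is an arbitrary choice:

```python
def train_loss(epoch):
    """Training loss: always decreasing, so it can never tell you to stop."""
    return 1.0 / (epoch + 1)

def val_loss(epoch):
    """Validation loss: U-shaped, bottoming out around epoch 10 (overfitting after)."""
    return (epoch - 10) ** 2 / 100 + 0.2

def early_stop(max_epochs=50, patience=3):
    """Stop once validation loss has not improved for `patience` epochs;
    return the epoch with the best validation loss."""
    best_epoch, best_val = 0, float("inf")
    for epoch in range(max_epochs):
        v = val_loss(epoch)
        if v < best_val:
            best_epoch, best_val = epoch, v
        elif epoch - best_epoch >= patience:
            break  # training loss is still falling, but we stop anyway
    return best_epoch

print(early_stop())  # finds epoch 10, long before max_epochs
```

Note that the stopping decision never consults `train_loss` at all: the criterion lives entirely in the external metric, which is the whole point.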
Building your stopping rules
Theory is necessary but not sufficient. You need concrete stopping rules you can apply in practice. A stopping rule has three components: a threshold, a metric, and a trigger.
The threshold answers: what does good enough look like for this specific context? Not good enough in the abstract — good enough for the actual purpose this optimization serves. A presentation to your team has a different threshold than a keynote at a conference. A prototype has a different threshold than a production system. A journal entry has a different threshold than a published article. Defining the threshold requires you to be honest about the actual stakes, not the stakes your perfectionism invents.
The metric answers: how will I measure whether I have crossed the threshold? The metric must be external to the optimization process. "Does this feel polished enough?" is not a metric — it is an emotion, and it will never stabilize because your standards rise to meet your output. "Does this communicate the three key points clearly to someone unfamiliar with the topic?" is a metric. "Will this run under the five-minute window the analysts need?" is a metric. Metrics are concrete, verifiable, and connected to the purpose the optimization serves.
The trigger answers: what happens when the metric is met? This is where most stopping rules fail. People define thresholds and metrics but do not define what occurs at the boundary. A functional trigger is a specific next action: "When the deck communicates the three key points clearly, I close the file and move to code review." "When the pipeline runs under five minutes, I commit the code and open the next ticket." The trigger converts the stopping rule from an intention into a behavior.
Without all three components, the stopping rule is incomplete and will not hold under the gravitational pull of continued optimization. Without a threshold, you optimize indefinitely. Without a metric, you cannot tell when the threshold is met. Without a trigger, you acknowledge the threshold is met and keep optimizing anyway.
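The three components can be expressed directly in code. A minimal sketch, using a hypothetical slide-deck example; all names and values are illustrative:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StoppingRule:
    metric: Callable[[], float]   # an external measurement, not a feeling
    threshold: float              # what "good enough" means in this context
    trigger: Callable[[], None]   # the concrete next action at the boundary

    def check(self) -> bool:
        """Evaluate the metric; if the threshold is met, fire the trigger."""
        if self.metric() >= self.threshold:
            self.trigger()
            return True
        return False

# Hypothetical usage: a reviewer counts how many of three key points land.
points_landed = 3
rule = StoppingRule(
    metric=lambda: points_landed,
    threshold=3,
    trigger=lambda: print("Close the file; move to code review."),
)
rule.check()  # threshold met, so the trigger fires
```

The trigger being an action rather than a flag is deliberate: the rule does not merely report that you may stop, it performs the transition to the next piece of work.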
The opportunity cost that optimization hides
The deepest reason to stop optimizing is not about the thing you are optimizing. It is about everything else.
Every hour spent on the seventh draft of a report is an hour not spent on the first draft of the next report. Every day spent shaving seconds off a pipeline is a day not spent building the feature that opens a new market. Optimization has an obvious cost — the time and energy it consumes. But it also has a hidden cost that is almost always larger: the foreclosed alternatives.
Economists call this opportunity cost — the value of the best alternative you did not pursue. The insidious property of optimization is that it makes opportunity cost invisible. When you are deep in refinement, you can see the improvement you are making. You cannot see the improvements you are not making elsewhere because you are not looking at elsewhere. The optimization narrows your attention precisely when broad attention would serve you better.
This is why stopping rules must be set before you begin optimizing, not in the moment of optimization itself. In the moment, the local gradient always points toward more refinement. Only the global view reveals that the gradient you should follow leads to a different project entirely.
From stopping to testing
You now have a framework for recognizing when to stop: the mathematical structure of optimal stopping, the psychological research on satisficing versus maximizing, the computational metaphor of early stopping, and a practical template for building stopping rules with thresholds, metrics, and triggers. You understand that stopping at good enough is not a compromise — it is an optimization of your total system, not just the component you happen to be touching.
But optimization decisions are not always about when to stop. Sometimes they are about how to choose between two options that both seem good enough. When you have refined an agent to a satisfactory level and identified a possible alternative approach, you face a different question: not "should I keep optimizing?" but "which version should I keep?" That is the domain of A/B testing — running controlled comparisons to determine which of two options performs better — and it is what L-0566 addresses.
Sources:
- Christian, B. & Griffiths, T. (2016). Algorithms to Live By: The Computer Science of Human Decisions. Henry Holt and Company. (Secretary problem, 37% rule, look-then-leap)
- Simon, H. A. (1956). "Rational choice and the structure of the environment." Psychological Review, 63(2), 129-138. (Bounded rationality, satisficing)
- Schwartz, B. (2004). The Paradox of Choice: Why More Is Less. Harper Perennial. (Maximizers vs. satisficers)
- Schwartz, B., Ward, A., Monterosso, J., Lyubomirsky, S., White, K., & Lehman, D. R. (2002). "Maximizing versus satisficing: Happiness is a matter of choice." Journal of Personality and Social Psychology, 83(5), 1178-1197. (Seven-study empirical findings)
- Voltaire (1770). La Bégueule. ("Le mieux est l'ennemi du bien")
- Prechelt, L. (1998). "Early Stopping — But When?" Neural Networks: Tricks of the Trade, Springer. (Early stopping regularization)
- Mehlhorn, K., et al. (2015). "Unpacking the Exploration-Exploitation Tradeoff: A Synthesis of Human and Animal Literatures." Decision, 2(3), 191-215. (Exploration-exploitation framework)