When a pattern reverses across subgroups, disaggregate before concluding (Simpson's paradox)
When a pattern appears to reverse across subgroups in your data, disaggregate by relevant context variables (sleep, stress, social setting) before drawing conclusions from the aggregate pattern.
Why This Is a Rule
Simpson's paradox, where a trend that appears in aggregate data disappears or reverses when you split by subgroup, is surprisingly common in personal analytics. Your aggregate data shows "caffeine improves productivity," but disaggregating by sleep quality reveals: on well-slept days, caffeine improves productivity; on poorly-slept days, caffeine makes it worse. The aggregate blends the two, and it tilts positive because well-slept days are more common and you drink more caffeine on them, so part of the benefit of good sleep gets credited to caffeine.
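To make the caffeine example concrete, here is a minimal sketch using an invented 14-day log. The day counts and productivity scores are hypothetical, chosen only so that the aggregate and the poorly-slept subgroup point in opposite directions; `caffeine_effect` is an illustrative helper, not part of any library.

```python
from statistics import mean

# Hypothetical daily log: (sleep quality, had caffeine, productivity score 1-10).
# The numbers are invented for illustration only.
days = (
    [("good", True, 8)] * 6
    + [("good", False, 7)] * 2
    + [("poor", True, 3)] * 2
    + [("poor", False, 5)] * 4
)


def caffeine_effect(rows):
    """Mean productivity on caffeine days minus mean productivity on other days."""
    with_caffeine = [score for _, caffeine, score in rows if caffeine]
    without_caffeine = [score for _, caffeine, score in rows if not caffeine]
    return mean(with_caffeine) - mean(without_caffeine)


# The aggregate pools all days; by_sleep recomputes the same difference per subgroup.
aggregate = caffeine_effect(days)
by_sleep = {
    sleep: caffeine_effect([row for row in days if row[0] == sleep])
    for sleep in ("good", "poor")
}

print(f"aggregate: {aggregate:+.2f}")
for sleep, effect in by_sleep.items():
    print(f"{sleep} sleep: {effect:+.2f}")
```

With these invented numbers the aggregate difference is about +1.08, the well-slept difference is +1.00, and the poorly-slept difference is -2.00. The pooled figure is positive mostly because caffeine days are concentrated on well-slept days, which score higher regardless of caffeine.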
In personal data, the relevant subgroups are context variables: sleep quality, stress level, social setting, time of day, energy state, recent exercise. These variables can moderate many personal correlations, so the aggregate pattern may not hold in any specific context: it is an average across contexts that individually behave differently.
The rule prescribes disaggregation before interpretation: when you notice a pattern, split the data by the most likely moderating variables before concluding that the aggregate pattern is real and actionable.
When This Fires
- When personal data analysis shows a clear correlation and you're about to act on it
- When a productivity or health pattern seems inconsistent — sometimes it holds, sometimes it doesn't
- When aggregate data contradicts your lived experience in specific situations
- Any personal analytics context where subgroup effects could reverse the aggregate trend
Common Failure Mode
Acting on aggregate patterns without checking subgroups. "Coffee helps my productivity" becomes a blanket rule, but on sleep-deprived days, coffee increases jitters and reduces focus. The pattern is true on average and false in that context. You're optimizing for the average case while your actual days are rarely average.
The Protocol
When you identify an aggregate pattern:
1. List 3-4 context variables that could moderate the pattern (sleep, stress, time of day, energy level).
2. Disaggregate: split your data by each variable and check whether the pattern holds in every subgroup.
3. If it reverses in a subgroup → the pattern is context-dependent, not universal. Apply it only in the contexts where it holds.
4. If it holds across all subgroups → the pattern is robust. Act on it with higher confidence.
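The splitting and checking steps of the protocol can be sketched in code. This is one possible shape under stated assumptions, not a definitive implementation: it assumes each day is a dict, the pattern is a difference in means between days with and without some yes/no habit, and a "reversal" means the subgroup difference has the opposite sign to the aggregate. The names `effect`, `check_pattern`, `day`, and the log fields are all hypothetical, and the data is invented.

```python
from collections import defaultdict
from statistics import mean


def effect(rows, treatment, outcome):
    """Mean outcome on days with the habit minus days without; None if either side is empty."""
    on = [r[outcome] for r in rows if r[treatment]]
    off = [r[outcome] for r in rows if not r[treatment]]
    if not on or not off:
        return None
    return mean(on) - mean(off)


def check_pattern(rows, treatment, outcome, context_vars):
    """Split by each context variable and list the subgroups where the sign flips."""
    overall = effect(rows, treatment, outcome)
    if overall is None:
        raise ValueError("need days both with and without the habit")
    report = {}
    for var in context_vars:
        groups = defaultdict(list)
        for r in rows:
            groups[r[var]].append(r)
        effects = {level: effect(g, treatment, outcome) for level, g in groups.items()}
        # A subgroup counts as reversed when its effect has the opposite sign to the aggregate.
        reversed_in = [
            level for level, e in effects.items()
            if e is not None and e * overall < 0
        ]
        report[var] = {"effects": effects, "reversed_in": reversed_in}
    return overall, report


def day(sleep, stress, caffeine, productivity):
    """Build one hypothetical log entry."""
    return {"sleep": sleep, "stress": stress, "caffeine": caffeine, "productivity": productivity}


# Invented 14-day log; step 1 (choosing context variables) is done by hand.
log = (
    [day("good", "low", True, 8)] * 3 + [day("good", "high", True, 8)] * 3
    + [day("good", "low", False, 7)] + [day("good", "high", False, 7)]
    + [day("poor", "low", True, 3)] + [day("poor", "high", True, 3)]
    + [day("poor", "low", False, 5)] * 2 + [day("poor", "high", False, 5)] * 2
)

overall, report = check_pattern(log, "caffeine", "productivity", ["sleep", "stress"])
for var, result in report.items():
    verdict = f"reverses in {result['reversed_in']}" if result["reversed_in"] else "holds in every subgroup"
    print(f"{var}: {verdict}")
```

On this invented log the pattern reverses in the poor-sleep subgroup and holds across both stress levels, so the caffeine rule would apply only on well-slept days. Subgroups with only a handful of days give noisy estimates, so a sign flip based on very few entries is weak evidence in either direction.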