Test for structural identity, not vocabulary overlap, before merging similar notes
When considering whether to merge two similar notes, test whether the underlying structure is identical (same entities, same relationships, same claims) rather than whether the vocabulary overlaps, because structural identity warrants abstraction while surface similarity does not.
Why This Is a Rule
Two notes can share key vocabulary while making completely different claims, and two notes can use completely different vocabulary while making the same structural claim. "Cognitive load limits working memory capacity" and "Bandwidth constraints limit network throughput" use different words, but once the domain terms are stripped, both reduce to the same structure: [resource limitation] constrains [processing capacity]. These are candidates for abstraction. "Cognitive load limits working memory" and "Cognitive load affects learning strategies" share vocabulary but make different structural claims. Merging these would destroy a genuine distinction.
Surface similarity (keyword overlap) is what both humans and AI tools tend to default to when identifying duplicates. It fails in both directions: it conflates distinct notes that happen to use the same words, and it misses genuinely identical notes that happen to use different words. The structural test (same entities, same relationships, same claims) catches actual duplication regardless of vocabulary.
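To make the inversion concrete, here is a minimal Python sketch that scores the two example pairs from the previous paragraph by keyword overlap. It assumes "keyword overlap" means Jaccard similarity over lowercased word sets, a deliberately crude stand-in; the helper names `keywords` and `jaccard` are illustrative, not part of any tool.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercased word set (assumed stand-in for 'keywords')."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words in either note."""
    ka, kb = keywords(a), keywords(b)
    return len(ka & kb) / len(ka | kb)


# Same structure, different words: an abstraction candidate.
same_structure = (
    "Cognitive load limits working memory capacity",
    "Bandwidth constraints limit network throughput",
)

# Shared words, different claims: should stay separate.
shared_vocabulary = (
    "Cognitive load limits working memory",
    "Cognitive load affects learning strategies",
)

print(jaccard(*same_structure))     # 0.0
print(jaccard(*shared_vocabulary))  # 0.25
```

Any keyword-overlap threshold that flags the second pair ranks it above the first, which is exactly backwards. Stemming "limits" and "limit" to one token would only raise the first score to 0.1, still below the second.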
This distinction prevents two opposite knowledge base errors: merging distinct ideas that happen to share vocabulary (destroying nuance) and keeping identical ideas apart because they happen to use different words (accumulating redundancy).
When This Fires
- Semantic search surfaces two notes as "similar" and you're deciding whether to merge
- During knowledge base maintenance when reducing duplication
- When AI flags potential duplicates in your note system
- Any time two notes feel related and you're wondering if they're "the same thing"
Common Failure Mode
Merging notes because they use the same keywords. "Both notes mention cognitive load, so they must be about the same thing." But one note claims that cognitive load impairs decision-making and the other claims that cognitive load can be managed through chunking. These are complementary notes about the same topic, not duplicates — merging them would fuse two distinct claims into one confused note.
The Protocol
When considering whether to merge two similar notes:
1. Strip the domain-specific vocabulary from each.
2. Write the underlying structure of each: what entities are involved, what relationship is claimed between them, and what the conclusion is.
3. Compare the structures.
   - If the structures are identical (same entities, same relationships, same claims in different words), merge them into a canonical abstraction.
   - If the structures differ (different claims, different relationships, or different entities despite shared vocabulary), keep them separate and link them instead.
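The protocol can be sketched in Python under two simplifying assumptions. First, steps 1 and 2 are judgment calls in practice; here they are simulated with small hand-written lexicons (`ROLES` and `RELATIONS`, both hypothetical) that cover only this note's example sentences. Second, each note's structure is reduced to a single subject-relation-target triple, whereas real notes may carry several claims. The names `Structure`, `abstract`, and `decide` are illustrative.

```python
from dataclasses import dataclass

# Hypothetical lexicon for step 1: domain terms -> domain-neutral roles.
# Hard-coded for the example sentences only; real mappings are judgment calls.
ROLES = {
    "cognitive load": "resource limitation",
    "bandwidth constraints": "resource limitation",
    "working memory capacity": "processing capacity",
    "working memory": "processing capacity",
    "network throughput": "processing capacity",
    "learning strategies": "behavioral choice",
}

# Hypothetical relation lexicon, so "limits" and "limit" compare equal.
RELATIONS = {
    "limits": "constrains",
    "limit": "constrains",
    "affects": "influences",
}


@dataclass(frozen=True)
class Structure:
    """Step 2 output: the roles involved and the relationship claimed."""
    subject: str
    relation: str
    target: str


def abstract(subject: str, relation: str, target: str) -> Structure:
    """Steps 1-2: replace domain vocabulary with roles and a normalized relation."""
    return Structure(ROLES[subject], RELATIONS[relation], ROLES[target])


def decide(a: Structure, b: Structure) -> str:
    """Step 3: merge only on identical structure; otherwise keep separate and link."""
    if a == b:
        return f"merge into canonical abstraction: [{a.subject}] {a.relation} [{a.target}]"
    differing = [f for f in ("subject", "relation", "target") if getattr(a, f) != getattr(b, f)]
    return f"keep separate and link (differs in: {', '.join(differing)})"


print(decide(
    abstract("cognitive load", "limits", "working memory capacity"),
    abstract("bandwidth constraints", "limit", "network throughput"),
))
# merge into canonical abstraction: [resource limitation] constrains [processing capacity]

print(decide(
    abstract("cognitive load", "limits", "working memory"),
    abstract("cognitive load", "affects", "learning strategies"),
))
# keep separate and link (differs in: relation, target)
```

Because the dataclass is frozen and compared with `==`, step 3 is an exact field-by-field check; reporting which fields differ gives the resulting link a recorded reason.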