docs: improve README, MkDocs content, branding assets, and site polish

2026-04-19 00:18:42 +02:00
parent db79dd2d4f
commit 0b674a39a8
19 changed files with 1836 additions and 1698 deletions
--- a/docs/reduction-semantics.md
+++ b/docs/reduction-semantics.md
@@ -0,0 +1,208 @@
+# Reduction Semantics
+
+This document explains how **Radixor** decides that two subtrees are equivalent, how the different reduction modes work, and how those choices affect observable runtime behavior.
+
+## Why reduction exists
+
+Without reduction, the trie would still work, but many subtrees that mean the same thing would remain duplicated. The result would be a much larger runtime artifact than necessary.
+
+Reduction solves that by merging semantically equivalent subtrees into one canonical representative.
+
+The key idea is simple:
+
+> if two subtrees behave the same way under the semantic contract chosen for compilation, only one physical copy is needed.
+
+## Reduction is semantic, not merely structural
+
+Radixor does not reduce nodes merely because they look similar locally. It reduces subtrees only when their **meaning** matches according to the selected mode.
+
+That is why reduction is based on a **signature** that captures both:
+
+1. the local semantics of the current node,
+2. the structure and semantics of all descendant edges.
+
+Conceptually:
+
+```text
+Signature = (LocalDescriptor, SortedChildDescriptors)
+```
+
+Two subtrees are merged only if their signatures are equal.
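+
+A minimal sketch of that idea in Java, assuming a signature can be modeled as an immutable value with structural equality. The names `Signature`, `ChildDescriptor`, and `SignatureDemo` are hypothetical and are not taken from Radixor's source:
+
+```java
+import java.util.List;
+
+/** Hypothetical child entry: the edge character plus the child's own signature. */
+record ChildDescriptor(char edge, Signature child) {}
+
+/** Hypothetical signature: local semantics plus child descriptors sorted by edge. */
+record Signature(Object localDescriptor, List<ChildDescriptor> sortedChildren) {
+    Signature {
+        // Defensive copy so that equality cannot change after construction.
+        sortedChildren = List.copyOf(sortedChildren);
+    }
+}
+
+final class SignatureDemo {
+    /** Builds a childless signature whose local descriptor is a ranked value list. */
+    static Signature leaf(String... rankedValues) {
+        return new Signature(List.of(rankedValues), List.of());
+    }
+
+    public static void main(String[] args) {
+        Signature a = new Signature(List.of(), List.of(new ChildDescriptor('s', leaf("x"))));
+        Signature b = new Signature(List.of(), List.of(new ChildDescriptor('s', leaf("x"))));
+        Signature c = new Signature(List.of(), List.of(new ChildDescriptor('d', leaf("x"))));
+        System.out.println(a.equals(b)); // true: same local descriptor, same children
+        System.out.println(a.equals(c)); // false: the edge character differs
+    }
+}
+```
+
+Because records compare component by component, equality of two such signatures recursively covers the whole subtree.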
+
+## Local descriptors
+
+The local descriptor defines what “equivalent” means for the values stored at one node.
+
+Radixor supports three semantic views.
+
+### Ranked descriptor
+
+The ranked descriptor preserves the full ordered result semantics of `getAll()`.
+
+That means:
+
+- candidate membership is preserved,
+- local ordering is preserved,
+- observable ranked multi-result behavior remains stable.
+
+This is the most semantically faithful mode when ambiguity handling matters.
+
+### Unordered descriptor
+
+The unordered descriptor preserves the set of reachable results, but not their local ordering.
+
+That means:
+
+- candidate membership is preserved,
+- ordering differences may be ignored,
+- more subtrees can be merged than in ranked mode.
+
+This mode is useful when alternative candidates matter but exact ranking does not.
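+
+The difference between the ranked and unordered views can be illustrated with plain collections. This is only a sketch under the assumption that a ranked descriptor behaves like an ordered list and an unordered descriptor behaves like a set; `DescriptorDemo` and its methods are hypothetical names:
+
+```java
+import java.util.List;
+import java.util.Set;
+
+final class DescriptorDemo {
+    /** Ranked view (assumed shape): candidates in their local order. */
+    static Object ranked(List<String> candidates) {
+        return List.copyOf(candidates);
+    }
+
+    /** Unordered view (assumed shape): candidate membership only. */
+    static Object unordered(List<String> candidates) {
+        return Set.copyOf(candidates);
+    }
+
+    public static void main(String[] args) {
+        List<String> first = List.of("x", "y");
+        List<String> second = List.of("y", "x");
+        System.out.println(ranked(first).equals(ranked(second)));       // false: order differs
+        System.out.println(unordered(first).equals(unordered(second))); // true: same members
+    }
+}
+```
+
+Two nodes that differ only in candidate order therefore stay separate under the ranked view and become mergeable under the unordered one.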
+
+### Dominant descriptor
+
+The dominant descriptor focuses on the preferred result returned by `get()`.
+
+This descriptor is applied at a node only when that node's dominant local candidate is strong enough according to two configured thresholds:
+
+- minimum winner percentage,
+- winner-over-second ratio.
+
+If local dominance falls short of those thresholds, Radixor does not force dominant semantics. It falls back to ranked semantics for that node to avoid unsafe over-reduction.
+
+That fallback is one of the most important safeguards in the design.
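+
+The threshold check can be sketched as follows. The parameter names, the integer percentage, the integer ratio, and the exact comparison operators are assumptions made for illustration; Radixor's actual configuration may differ:
+
+```java
+final class DominanceDemo {
+    /**
+     * Returns true when the top candidate is strong enough for the dominant descriptor.
+     * Counts must be sorted in descending order; both thresholds are hypothetical parameters.
+     */
+    static boolean isDominant(int[] countsDescending, int minWinnerPercent, int minWinnerOverSecondRatio) {
+        long total = 0;
+        for (int count : countsDescending) {
+            total += count;
+        }
+        if (countsDescending.length == 0 || total == 0) {
+            return false; // nothing observed, so nothing can dominate
+        }
+        long winner = countsDescending[0];
+        long second = countsDescending.length > 1 ? countsDescending[1] : 0;
+        boolean enoughShare = winner * 100 >= (long) minWinnerPercent * total;
+        boolean enoughLead = winner >= (long) minWinnerOverSecondRatio * second;
+        return enoughShare && enoughLead;
+    }
+
+    public static void main(String[] args) {
+        // 90 of 100 observations and 9x the runner-up: dominant under 75% / 3x thresholds.
+        System.out.println(isDominant(new int[] {90, 10}, 75, 3)); // true
+        // 60 of 100 observations: below the 75% share, so the node would fall back to ranked.
+        System.out.println(isDominant(new int[] {60, 40}, 75, 3)); // false
+    }
+}
+```
+
+A node for which this check fails would keep its ranked descriptor, which is the fallback described above.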
+
+## Child descriptors
+
+A subtree is not defined only by the values stored at the current node. It is also defined by what behavior is reachable through its children.
+
+Each child contributes:
+
+```text
+(edge character, child signature)
+```
+
+Children are sorted by edge character so that signatures remain deterministic and stable.
+
+This matters because reduction must not depend on incidental map iteration order or other non-semantic implementation details.
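+
+A small sketch of why sorting matters, assuming children are held in a map keyed by edge character and that child signatures can be stood in for by labels such as `S1`. `ChildOrderDemo` is a hypothetical name:
+
+```java
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+final class ChildOrderDemo {
+    /** Renders child entries in ascending edge order, whatever the input map's iteration order. */
+    static List<String> childDescriptors(Map<Character, String> childSignatures) {
+        List<String> descriptors = new ArrayList<>();
+        // TreeMap iterates in ascending key order, so insertion order has no effect.
+        for (Map.Entry<Character, String> entry : new TreeMap<>(childSignatures).entrySet()) {
+            descriptors.add(entry.getKey() + "->" + entry.getValue());
+        }
+        return descriptors;
+    }
+
+    public static void main(String[] args) {
+        Map<Character, String> first = new LinkedHashMap<>();
+        first.put('b', "S2");
+        first.put('a', "S1");
+        Map<Character, String> second = new LinkedHashMap<>();
+        second.put('a', "S1");
+        second.put('b', "S2");
+        System.out.println(childDescriptors(first));  // [a->S1, b->S2]
+        System.out.println(childDescriptors(second)); // [a->S1, b->S2]
+    }
+}
+```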
+
+## Canonicalization
+
+Once a subtree signature is computed, the reduction process checks whether an equivalent canonical subtree already exists.
+
+If one exists, the existing reduced node is reused.
+
+If none exists, a new canonical reduced node is created and registered.
+
+This turns reduction into a canonicalization process:
+
+- compute semantic identity,
+- find canonical representative,
+- reuse or create,
+- continue bottom-up.
+
+That is how Radixor eliminates duplicated equivalent subtrees.
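+
+The steps above resemble hash-consing. The following sketch assumes ranked local values and a registry keyed by signature; `BuildNode`, `ReducedNode`, and `Canonicalizer` are hypothetical types, not Radixor's own classes:
+
+```java
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/** Hypothetical build-time node: local values plus children keyed by edge character. */
+final class BuildNode {
+    final List<String> values;
+    final Map<Character, BuildNode> children = new TreeMap<>();
+
+    BuildNode(String... values) {
+        this.values = List.of(values);
+    }
+}
+
+/** Hypothetical reduced node; one instance is shared by all equivalent subtrees. */
+final class ReducedNode {
+    final List<String> values;
+    final Map<Character, ReducedNode> children;
+
+    ReducedNode(List<String> values, Map<Character, ReducedNode> children) {
+        this.values = values;
+        this.children = children;
+    }
+}
+
+final class Canonicalizer {
+    /** Signature: ranked local values plus sorted children that are already canonical. */
+    private record Signature(List<String> values, Map<Character, ReducedNode> children) {}
+
+    private final Map<Signature, ReducedNode> canonical = new HashMap<>();
+
+    ReducedNode reduce(BuildNode node) {
+        // Bottom-up: children are canonicalized before their parent.
+        Map<Character, ReducedNode> reducedChildren = new TreeMap<>();
+        for (Map.Entry<Character, BuildNode> entry : node.children.entrySet()) {
+            reducedChildren.put(entry.getKey(), reduce(entry.getValue()));
+        }
+        // Children are canonical, so identity comparison of child nodes is sufficient here.
+        Signature signature = new Signature(node.values, reducedChildren);
+        // Reuse the registered representative, or create and register a new one.
+        return canonical.computeIfAbsent(signature, s -> new ReducedNode(s.values(), s.children()));
+    }
+
+    int size() {
+        return canonical.size();
+    }
+
+    public static void main(String[] args) {
+        BuildNode root = new BuildNode();
+        BuildNode left = new BuildNode();
+        left.children.put('s', new BuildNode("x"));
+        BuildNode right = new BuildNode();
+        right.children.put('s', new BuildNode("x"));
+        root.children.put('a', left);
+        root.children.put('b', right);
+
+        Canonicalizer canonicalizer = new Canonicalizer();
+        ReducedNode reduced = canonicalizer.reduce(root);
+        System.out.println(reduced.children.get('a') == reduced.children.get('b')); // true
+        System.out.println(canonicalizer.size()); // 3: leaf, middle node, root
+    }
+}
+```
+
+Five build-time nodes collapse into three canonical nodes in this example because the two branches under `a` and `b` are equivalent.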
+
+## Count aggregation and compiled state
+
+When multiple original build-time subtrees collapse into one canonical reduced node, local counts may be aggregated.
+
+This is an important point for understanding compiled artifacts.
+
+A compiled trie is not always a verbatim replay of original insertion history. It is a canonical runtime structure that preserves the semantics guaranteed by the chosen reduction mode.
+
+This explains two things:
+
+- why compiled artifacts can become dramatically smaller,
+- why reconstructing a builder from a compiled trie reflects the compiled state rather than the full original unreduced history.
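+
+As a hedged illustration of why the original history is not recoverable, the sketch below assumes that aggregation means summing per-value counts. The text above does not specify the aggregation rule, so both the summation and the name `CountAggregationDemo` are assumptions:
+
+```java
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+final class CountAggregationDemo {
+    /** Sums per-value counts from two merged nodes; summation is an assumption of this sketch. */
+    static Map<String, Integer> aggregate(Map<String, Integer> first, Map<String, Integer> second) {
+        Map<String, Integer> combined = new LinkedHashMap<>(first);
+        second.forEach((value, count) -> combined.merge(value, count, Integer::sum));
+        return combined;
+    }
+
+    public static void main(String[] args) {
+        Map<String, Integer> combined = aggregate(Map.of("x", 3), Map.of("x", 2));
+        System.out.println(combined); // {x=5}
+    }
+}
+```
+
+After such a merge only the combined count remains, so a builder reconstructed from the compiled trie cannot tell that the total came from two separate subtrees.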
+
+## Reduction modes
+
+### `MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS`
+
+This mode merges subtrees only when their `getAll()` results are equivalent for every reachable key suffix and when local ordering is preserved.
+
+Use this mode when:
+
+- ambiguity handling matters,
+- `getAll()` ordering should remain meaningful,
+- behavioral fidelity is more important than maximum compression.
+
+This is the safest and most generally recommended mode.
+
+### `MERGE_SUBTREES_WITH_EQUIVALENT_UNORDERED_GET_ALL_RESULTS`
+
+This mode also preserves `getAll()`-level membership equivalence for every reachable key suffix, but it ignores local ordering differences.
+
+Use this mode when:
+
+- alternative candidates still matter,
+- exact ordering is less important,
+- stronger reduction is acceptable.
+
+This mode is more aggressive than ranked mode, but less semantically rich.
+
+### `MERGE_SUBTREES_WITH_EQUIVALENT_DOMINANT_GET_RESULTS`
+
+This mode focuses on preserving dominant `get()` semantics for every reachable key suffix, subject to dominance thresholds.
+
+Use this mode when:
+
+- the main operational concern is the preferred result,
+- richer alternative-result behavior is less important,
+- stronger reduction is desirable.
+
+Because non-dominant nodes fall back to ranked semantics, this mode is not simply “discard everything except the winner”. It is a controlled reduction strategy with a built-in safety condition.
+
+## Practical effect on runtime behavior
+
+Reduction mode is not just a storage optimization setting. It affects what distinctions remain visible after compilation.
+
+### When ranked mode is used
+
+You can rely on full ranked `getAll()` semantics being preserved.
+
+### When unordered mode is used
+
+You can rely on candidate membership being preserved, but not on local ranking distinctions surviving compilation.
+
+### When dominant mode is used
+
+You optimize primarily for preferred-result semantics. Alternative-result behavior may still exist, but it is no longer the primary semantic contract of the reduction.
+
+## Choosing a mode
+
+A practical rule of thumb is:
+
+- choose **ranked** if you are unsure,
+- choose **unordered** if alternative membership matters but ranking does not,
+- choose **dominant** only when your application is fundamentally driven by `get()` and you understand the trade-off.
+
+## Why this design works well
+
+The reduction model succeeds because it does not confuse “smaller” with “acceptable”.
+
+Instead, it makes the semantic contract explicit:
+
+- what exactly must be preserved,
+- what differences may be ignored,
+- when a more aggressive mode is safe,
+- when the system must fall back to a stricter interpretation.
+
+That explicitness is what makes the compression trustworthy.
+
+## Mental model to keep
+
+If you want one concise mental model for reduction, use this one:
+
+- build-time insertion collects examples,
+- reduction asks which subtrees mean the same thing,
+- the answer depends on the chosen semantic contract,
+- canonical representatives are shared,
+- the compiled trie preserves the behavior promised by that contract.
+
+## Continue with
+
+- [Architecture](architecture.md)
+- [Programmatic usage](programmatic-usage.md)
+- [CLI compilation](cli-compilation.md)