docs: improve README, MkDocs content, branding assets, and site polish

2026-04-19 00:18:42 +02:00
parent db79dd2d4f
commit 0b674a39a8
19 changed files with 1836 additions and 1698 deletions

docs/reduction-semantics.md (new file, 208 lines)

@@ -0,0 +1,208 @@
# Reduction Semantics
This document explains how **Radixor** decides that two subtrees are equivalent, how the different reduction modes work, and how those choices affect observable runtime behavior.
## Why reduction exists
Without reduction, the trie would still work, but many subtrees that mean the same thing would remain duplicated. The result would be a much larger runtime artifact than necessary.
Reduction solves that by merging semantically equivalent subtrees into one canonical representative.
The key idea is simple:
> if two subtrees behave the same way under the semantic contract chosen for compilation, only one physical copy is needed.
## Reduction is semantic, not merely structural
Radixor does not reduce nodes merely because they look similar locally. It reduces subtrees only when their **meaning** matches according to the selected mode.
That is why reduction is based on a **signature** that captures both:
1. the local semantics of the current node,
2. the structure and semantics of all descendant edges.
Conceptually:
```text
Signature = (LocalDescriptor, SortedChildDescriptors)
```
Two subtrees are merged only if their signatures are equal.
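As a minimal sketch of this idea, the signature can be modeled as an immutable value with structural equality. The type names (`SubtreeSignature`, `ChildEdge`, `SignatureDemo`) and the use of plain strings as stored values are assumptions made for illustration, not Radixor's actual classes.

```java
import java.util.List;

// Hypothetical types for illustration; Radixor's internal classes may differ.
record ChildEdge(char edge, SubtreeSignature child) {}

record SubtreeSignature(List<String> localDescriptor, List<ChildEdge> sortedChildren) {
    SubtreeSignature {
        // Immutable copies keep equals() and hashCode() stable after construction.
        localDescriptor = List.copyOf(localDescriptor);
        sortedChildren = List.copyOf(sortedChildren);
    }
}

final class SignatureDemo {
    // Builds the signature of a node that has local values but no children.
    static SubtreeSignature leaf(String... values) {
        return new SubtreeSignature(List.of(values), List.of());
    }

    // Two subtrees may be merged only when their signatures are equal.
    static boolean mergeable(SubtreeSignature a, SubtreeSignature b) {
        return a.equals(b);
    }
}
```

Because records compare component by component, equality here is recursive: two signatures match only if the local descriptors match and every child edge leads to an equal child signature.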
## Local descriptors
The local descriptor defines what “equivalent” means for the values stored at one node.
Radixor supports three semantic views.
### Ranked descriptor
The ranked descriptor preserves the full ordered result semantics of `getAll()`.
That means:
- candidate membership is preserved,
- local ordering is preserved,
- observable ranked multi-result behavior remains stable.
This is the most semantically faithful mode when ambiguity handling matters.
### Unordered descriptor
The unordered descriptor preserves the set of reachable results, but not their local ordering.
That means:
- candidate membership is preserved,
- ordering differences may be ignored,
- more subtrees can be merged than in ranked mode.
This mode is useful when alternative candidates matter but exact ranking does not.
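The difference between the ranked and unordered views can be sketched as a list versus a set built from the same local data. The helper below is hypothetical: it assumes values are strings with integer counts and that ranking means descending count with ties broken by value, which may not match Radixor's actual ordering rule.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers; each map links a candidate value to its local count.
final class LocalDescriptors {
    // Ranked view: order is part of the descriptor.
    // Assumed ordering: descending count, ties broken by value.
    static List<String> ranked(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .toList();
    }

    // Unordered view: only candidate membership is part of the descriptor.
    static Set<String> unordered(Map<String, Integer> counts) {
        return Set.copyOf(counts.keySet());
    }
}
```

Two nodes holding the same candidates with different winners produce different ranked descriptors but the same unordered descriptor, which is why unordered mode can merge more subtrees.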
### Dominant descriptor
The dominant descriptor focuses on the preferred result returned by `get()`.
This descriptor is used only when the dominant local candidate is strong enough according to two configured thresholds:
- minimum winner percentage,
- winner-over-second ratio.
If local dominance is not strong enough, Radixor does not force dominant semantics. It falls back to ranked semantics for that node to avoid unsafe over-reduction.
That fallback is one of the most important safeguards in the design.
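The threshold check and the fallback can be sketched as follows. This is an illustration under stated assumptions: the parameter names, the inclusive comparisons, and the treatment of a single-candidate node as dominant are all guesses, not Radixor's documented behavior.

```java
import java.util.List;

// Hypothetical helper; threshold names and exact comparisons are assumptions.
final class DominanceCheck {
    /**
     * @param sortedCounts local candidate counts in descending order
     * @param minWinnerPercent minimum share of the total the winner must hold (0-100)
     * @param minWinnerOverSecondRatio minimum ratio of winner count to runner-up count
     */
    static boolean isDominant(List<Integer> sortedCounts, int minWinnerPercent,
                              double minWinnerOverSecondRatio) {
        long total = sortedCounts.stream().mapToLong(Integer::longValue).sum();
        if (total == 0) {
            return false; // nothing stored, so nothing can dominate
        }
        long winner = sortedCounts.get(0);
        if (winner * 100 < (long) minWinnerPercent * total) {
            return false; // winner's share of all observations is too small
        }
        if (sortedCounts.size() == 1) {
            return true; // no runner-up to compare against
        }
        long second = sortedCounts.get(1);
        return winner >= minWinnerOverSecondRatio * second;
    }

    // Picks the descriptor kind for one node, falling back to ranked when unsure.
    static String descriptorFor(List<Integer> sortedCounts, int minWinnerPercent,
                                double minWinnerOverSecondRatio) {
        return isDominant(sortedCounts, minWinnerPercent, minWinnerOverSecondRatio)
                ? "dominant" : "ranked";
    }
}
```

With a 75 percent minimum and a 3.0 ratio, counts of 90, 5, 5 would qualify as dominant, while 50, 45, 5 would fall back to the ranked descriptor because the winner holds only half of the observations.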
## Child descriptors
A subtree is not defined only by the values stored at the current node. It is also defined by what behavior is reachable through its children.
Each child contributes:
```text
(edge character, child signature)
```
Children are sorted by edge character so that signatures remain deterministic and stable.
This matters because reduction must not depend on incidental map iteration order or other non-semantic implementation details.
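A short sketch of why sorting matters: copying the children into a `TreeMap` before rendering them gives the same descriptor list regardless of insertion order. The `ChildOrdering` helper and the string form of a child signature are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; child signatures are simplified to opaque strings.
final class ChildOrdering {
    // Renders each child as "edge:signature", ordered by edge character.
    static List<String> childDescriptors(Map<Character, String> childSignatures) {
        List<String> descriptors = new ArrayList<>();
        // TreeMap iterates in key order, independent of the source map's order.
        for (Map.Entry<Character, String> e : new TreeMap<>(childSignatures).entrySet()) {
            descriptors.add(e.getKey() + ":" + e.getValue());
        }
        return descriptors;
    }
}
```

Two maps that hold the same edges, filled in a different order, yield equal descriptor lists, so the resulting signature depends only on content.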
## Canonicalization
Once a subtree signature is computed, the reduction process checks whether an equivalent canonical subtree already exists.
If yes, the existing reduced node is reused.
If no, a new canonical reduced node is created and registered.
This turns reduction into a canonicalization process:
- compute semantic identity,
- find canonical representative,
- reuse or create,
- continue bottom-up.
That is how Radixor eliminates duplicated equivalent subtrees.
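The steps above can be sketched as a bottom-up pass with a registry of canonical nodes. All names here (`BuildNode`, `ReducedNode`, `Canonicalizer`) are hypothetical, and the reduced node doubles as its own signature through record equality, which is a simplification of whatever Radixor does internally.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical build-time node: local values plus children keyed by edge character.
final class BuildNode {
    final List<String> values;
    final Map<Character, BuildNode> children = new TreeMap<>();

    BuildNode(String... values) {
        this.values = List.of(values);
    }
}

// Hypothetical reduced node; record equality serves as the subtree signature.
record ReducedNode(List<String> values, Map<Character, ReducedNode> children) {}

final class Canonicalizer {
    private final Map<ReducedNode, ReducedNode> canonical = new HashMap<>();

    ReducedNode reduce(BuildNode node) {
        // Reduce children first so that equal subtrees are already shared.
        Map<Character, ReducedNode> reducedChildren = new TreeMap<>();
        for (Map.Entry<Character, BuildNode> e : node.children.entrySet()) {
            reducedChildren.put(e.getKey(), reduce(e.getValue()));
        }
        ReducedNode candidate = new ReducedNode(node.values, reducedChildren);
        // Reuse the registered representative, or register this one.
        return canonical.computeIfAbsent(candidate, key -> key);
    }

    int canonicalCount() {
        return canonical.size();
    }
}
```

If a root has two children whose subtrees are identical, both edges end up pointing at the same `ReducedNode` instance, and the registry holds one entry per distinct subtree rather than one per original node.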
## Count aggregation and compiled state
When multiple original build-time subtrees collapse into one canonical reduced node, local counts may be aggregated.
This is an important point for understanding compiled artifacts.
A compiled trie is not always a verbatim replay of original insertion history. It is a canonical runtime structure that preserves the semantics guaranteed by the chosen reduction mode.
This explains two things:
- why compiled artifacts can become dramatically smaller,
- why reconstructing a builder from a compiled trie reflects the compiled state rather than the full original unreduced history.
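One plausible form of aggregation is a per-value sum of counts when two local tables collapse into one node. The sketch below assumes exactly that; the `CountAggregation` helper is hypothetical, and Radixor's actual aggregation rule may differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; assumes aggregation means summing counts per value.
final class CountAggregation {
    // Combines the local counts of two subtrees that share a canonical node.
    static Map<String, Integer> aggregate(Map<String, Integer> first,
                                          Map<String, Integer> second) {
        Map<String, Integer> merged = new TreeMap<>(first);
        second.forEach((value, count) -> merged.merge(value, count, Integer::sum));
        return merged;
    }
}
```

After such a merge only the combined counts remain, so the individual contributions of the original subtrees can no longer be recovered from the compiled node.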
## Reduction modes
### `MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS`
This mode merges subtrees only when their `getAll()` results are equivalent for every reachable key suffix, including the local ordering of those results.
Use this mode when:
- ambiguity handling matters,
- `getAll()` ordering should remain meaningful,
- behavioral fidelity is more important than maximum compression.
This is the safest and most generally recommended mode.
### `MERGE_SUBTREES_WITH_EQUIVALENT_UNORDERED_GET_ALL_RESULTS`
This mode also preserves `getAll()`-level membership equivalence for every reachable key suffix, but it ignores local ordering differences.
Use this mode when:
- alternative candidates still matter,
- exact ordering is less important,
- stronger reduction is acceptable.
This mode is more aggressive than ranked mode, but less semantically rich.
### `MERGE_SUBTREES_WITH_EQUIVALENT_DOMINANT_GET_RESULTS`
This mode focuses on preserving dominant `get()` semantics for every reachable key suffix, subject to dominance thresholds.
Use this mode when:
- the main operational concern is the preferred result,
- richer alternative-result behavior is less important,
- stronger reduction is desirable.
Because non-dominant nodes fall back to ranked semantics, this mode is not simply “discard everything except the winner”. It is a controlled reduction strategy with a built-in safety condition.
## Practical effect on runtime behavior
Reduction mode is not just a storage optimization setting. It affects what distinctions remain visible after compilation.
### When ranked mode is used
You can rely on full ranked `getAll()` semantics being preserved.
### When unordered mode is used
You can rely on candidate membership, but not on the original local ranking being preserved.
### When dominant mode is used
You optimize primarily for preferred-result semantics. Alternative-result behavior may still exist, but it is no longer the primary semantic contract of the reduction.
## Choosing a mode
A practical rule of thumb is:
- choose **ranked** if you are unsure,
- choose **unordered** if alternative membership matters but ranking does not,
- choose **dominant** only when your application is fundamentally driven by `get()` and you understand the trade-off.
## Why this design works well
The reduction model succeeds because it does not confuse “smaller” with “acceptable”.
Instead, it makes the semantic contract explicit:
- what exactly must be preserved,
- what differences may be ignored,
- when a more aggressive mode is safe,
- when the system must fall back to a stricter interpretation.
That explicitness is what makes the compression trustworthy.
## Mental model to keep
If you want one concise mental model for reduction, use this one:
- build-time insertion collects examples,
- reduction asks which subtrees mean the same thing,
- the answer depends on the chosen semantic contract,
- canonical representatives are shared,
- the compiled trie preserves the behavior promised by that contract.
## Continue with
- [Architecture](architecture.md)
- [Programmatic usage](programmatic-usage.md)
- [CLI compilation](cli-compilation.md)