feat: Prepare TrieMetadata and new stemmer data integration

2026-04-23 20:21:46 +02:00
parent a9d15fa3ae
commit 4d939f5b6e
77 changed files with 3024 additions and 179778 deletions


@@ -15,7 +15,7 @@ A compiled stemmer can be obtained in three common ways.
### Use a bundled language dictionary
Radixor ships with bundled dictionaries for a set of supported languages. These resources are line-oriented dictionaries stored with the library and compiled into a `FrequencyTrie<String>` when loaded. The loader can also store the canonical stem itself as a no-op patch command. Compiled trie artifacts now persist self-describing metadata, including the word traversal direction and the reduction settings applied when the artifact was compiled.
```java
import java.io.IOException;
@@ -202,3 +202,8 @@ Dictionary compilation is usually a one-time preparation step and is generally f
- [CLI compilation](cli-compilation.md)
- [Built-in languages](built-in-languages.md)
- [Architecture and reduction](architecture-and-reduction.md)
## Persisted trie metadata
Every compiled trie artifact stores a `TrieMetadata` descriptor together with the immutable trie payload. That metadata currently records the binary format version, the `WordTraversalDirection`, the `ReductionSettings` used during compilation, and the declared `DiacriticProcessingMode`. Even if a given release does not yet consult every field at query time, persisting the full descriptor keeps artifacts self-describing: future matching strategies can read these settings from the artifact itself instead of relying on side-channel configuration.
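The sketch below illustrates the general idea of a self-describing header: a small record written ahead of the trie payload and validated when the artifact is loaded. It does not reproduce Radixor's actual `TrieMetadata` class or its binary layout. `MetadataHeader`, `TraversalDirection`, `Reduction`, `DiacriticMode`, their constants and fields, and the magic number are all hypothetical stand-ins for the four fields listed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for WordTraversalDirection; the real constants may differ.
enum TraversalDirection { FORWARD, BACKWARD }

// Hypothetical stand-in for DiacriticProcessingMode; the real constants may differ.
enum DiacriticMode { PRESERVE, STRIP }

// Hypothetical stand-in for ReductionSettings; these fields are illustrative only.
record Reduction(boolean mergeEquivalentSubtrees, int minFrequency) { }

// Self-describing header that would precede the trie payload (sketch, not Radixor's format).
record MetadataHeader(int formatVersion, TraversalDirection direction,
                      Reduction reduction, DiacriticMode diacritics) {

    static final int MAGIC = 0x54524945;      // "TRIE" in ASCII; arbitrary choice for this sketch
    static final int CURRENT_VERSION = 1;     // the only version this sketch can decode

    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(MAGIC);
        out.writeInt(formatVersion);
        // Enums are stored by name, so reordering constants cannot silently change their meaning.
        out.writeUTF(direction.name());
        out.writeBoolean(reduction.mergeEquivalentSubtrees());
        out.writeInt(reduction.minFrequency());
        out.writeUTF(diacritics.name());
    }

    static MetadataHeader readFrom(DataInputStream in) throws IOException {
        if (in.readInt() != MAGIC) {
            throw new IOException("Not a trie artifact: bad magic number");
        }
        int version = in.readInt();
        // Reject unknown versions before interpreting any version-specific fields.
        if (version != CURRENT_VERSION) {
            throw new IOException("Unsupported format version: " + version);
        }
        TraversalDirection direction = TraversalDirection.valueOf(in.readUTF());
        Reduction reduction = new Reduction(in.readBoolean(), in.readInt());
        DiacriticMode diacritics = DiacriticMode.valueOf(in.readUTF());
        return new MetadataHeader(version, direction, reduction, diacritics);
    }

    // Convenience helpers for an in-memory round trip.
    static byte[] toBytes(MetadataHeader header) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            header.writeTo(out);
        }
        return buffer.toByteArray();
    }

    static MetadataHeader fromBytes(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readFrom(in);
        }
    }
}
```

Storing enum constants by name rather than by ordinal keeps older artifacts readable if constants are later reordered. Checking the magic number and version before decoding anything else lets a loader reject incompatible artifacts early. In a real artifact the trie payload would follow the header in the same stream; that part is omitted here.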