diff --git a/README.md b/README.md
index 1a634e3..1291d0f 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ Radixor is especially attractive when you want something more adaptable than sim
 
 Radixor includes a JMH benchmark suite for both its own algorithmic core and a side-by-side English comparison against the Snowball Porter stemmer family.
 
-On the current English comparison workload, Radixor with bundled `US_UK_PROFI` reaches approximately **31 to 32 million tokens per second**. Snowball original Porter reaches approximately **8 million tokens per second**, and Snowball English (Porter2) approximately **5 to 5.5 million tokens per second**.
+On the current English comparison workload, Radixor with bundled `US_UK` reaches approximately **31 to 32 million tokens per second**. Snowball original Porter reaches approximately **8 million tokens per second**, and Snowball English (Porter2) approximately **5 to 5.5 million tokens per second**.
 
 That places Radixor at approximately:
 
@@ -137,7 +137,7 @@ The repository keeps the front page concise and places detailed documentation un
   A practical first guide to loading, compiling, and using Radixor.
 
 - [Built-in Languages](docs/built-in-languages.md)
-  Overview of bundled language resources such as `US_UK` and `US_UK_PROFI`.
+  Overview of bundled language resources such as `US_UK`.
 
 - [Dictionary Format](docs/dictionary-format.md)
   How to write and normalize stemming dictionaries.
diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index 8ee3b34..91bb8a9 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -13,7 +13,7 @@ The benchmark suite currently covers two categories:
 
 The comparison benchmark processes the same deterministic English token stream through:
 
-- Radixor with bundled `US_UK_PROFI`,
+- Radixor with bundled `US_UK` (older benchmark snapshots used the now-retired `US_UK_PROFI` resource),
 - Snowball original Porter,
 - Snowball English, commonly referred to as Porter2.
 
@@ -37,7 +37,7 @@ For that reason, the published badge values should be treated primarily as a com
 
 A recent JMH run on JDK 21.0.10 with JMH 1.37, one thread, three warmup iterations, and five measurement iterations produced the following approximate throughput ranges:
 
-| Workload | Radixor `US_UK_PROFI` | Snowball Porter | Snowball English |
+| Workload | Radixor `US_UK` *(historical runs: `US_UK_PROFI`)* | Snowball Porter | Snowball English |
 | --- | ---: | ---: | ---: |
 | About 12,000 generated tokens | 30.99 M tokens/s | 8.21 M tokens/s | 5.46 M tokens/s |
 | About 60,000 generated tokens | 32.25 M tokens/s | 8.02 M tokens/s | 5.11 M tokens/s |
@@ -83,7 +83,7 @@ The workload intentionally mixes:
 - simple inflections,
 - common derivational forms,
 - US and UK spelling families,
-- lexical forms appropriate for `US_UK_PROFI`.
+- lexical forms appropriate for the current bundled `US_UK` resource (with historical continuity from earlier `US_UK_PROFI` runs).
 
 This design keeps runs reproducible across environments and avoids accidental drift caused by changing external corpora.
 
diff --git a/docs/programmatic-loading-and-building.md b/docs/programmatic-loading-and-building.md
index 210da89..1cf6c5d 100644
--- a/docs/programmatic-loading-and-building.md
+++ b/docs/programmatic-loading-and-building.md
@@ -21,7 +21,7 @@ public final class BundledLanguageExample {
 
     public static void main(final String[] arguments) throws IOException {
         final FrequencyTrie trie = StemmerPatchTrieLoader.load(
-                StemmerPatchTrieLoader.Language.US_UK_PROFI,
+                StemmerPatchTrieLoader.Language.US_UK,
                 true,
                 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
     }
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 1970365..a8fac02 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -32,7 +32,7 @@ public final class BundledStemmerExample {
 
     public static void main(final String[] arguments) throws IOException {
         final FrequencyTrie trie = StemmerPatchTrieLoader.load(
-                StemmerPatchTrieLoader.Language.US_UK_PROFI,
+                StemmerPatchTrieLoader.Language.US_UK,
                 true,
                 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
 
@@ -104,7 +104,7 @@ public final class SingleStemExample {
 
     public static void main(final String[] arguments) throws IOException {
         final FrequencyTrie trie = StemmerPatchTrieLoader.load(
-                StemmerPatchTrieLoader.Language.US_UK_PROFI,
+                StemmerPatchTrieLoader.Language.US_UK,
                 true,
                 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
 