docs: replace retired US_UK_PROFI with US_UK outside benchmarking history

2026-04-26 12:32:13 +02:00
parent 1f5decd6ea
commit 128fa919f2
4 changed files with 8 additions and 8 deletions


@@ -54,7 +54,7 @@ Radixor is especially attractive when you want something more adaptable than sim
 Radixor includes a JMH benchmark suite for both its own algorithmic core and a side-by-side English comparison against the Snowball Porter stemmer family.
-On the current English comparison workload, Radixor with bundled `US_UK_PROFI` reaches approximately **31 to 32 million tokens per second**. Snowball original Porter reaches approximately **8 million tokens per second**, and Snowball English (Porter2) approximately **5 to 5.5 million tokens per second**.
+On the current English comparison workload, Radixor with bundled `US_UK` reaches approximately **31 to 32 million tokens per second**. Snowball original Porter reaches approximately **8 million tokens per second**, and Snowball English (Porter2) approximately **5 to 5.5 million tokens per second**.
 That places Radixor at approximately:
@@ -137,7 +137,7 @@ The repository keeps the front page concise and places detailed documentation un
 A practical first guide to loading, compiling, and using Radixor.
 - [Built-in Languages](docs/built-in-languages.md)
-Overview of bundled language resources such as `US_UK` and `US_UK_PROFI`.
+Overview of bundled language resources such as `US_UK`.
 - [Dictionary Format](docs/dictionary-format.md)
 How to write and normalize stemming dictionaries.


@@ -13,7 +13,7 @@ The benchmark suite currently covers two categories:
 The comparison benchmark processes the same deterministic English token stream through:
-- Radixor with bundled `US_UK_PROFI`,
+- Radixor with bundled `US_UK` (older benchmark snapshots used the now-retired `US_UK_PROFI` resource),
 - Snowball original Porter,
 - Snowball English, commonly referred to as Porter2.
@@ -37,7 +37,7 @@ For that reason, the published badge values should be treated primarily as a com
 A recent JMH run on JDK 21.0.10 with JMH 1.37, one thread, three warmup iterations, and five measurement iterations produced the following approximate throughput ranges:
-| Workload | Radixor `US_UK_PROFI` | Snowball Porter | Snowball English |
+| Workload | Radixor `US_UK` *(historical runs: `US_UK_PROFI`)* | Snowball Porter | Snowball English |
 | --- | ---: | ---: | ---: |
 | About 12,000 generated tokens | 30.99 M tokens/s | 8.21 M tokens/s | 5.46 M tokens/s |
 | About 60,000 generated tokens | 32.25 M tokens/s | 8.02 M tokens/s | 5.11 M tokens/s |
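
Reviewer note: the relative speedups implied by the table rows can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper, not part of Radixor or its benchmark suite; the class name `SpeedupSketch` and its methods are assumptions made for illustration, and the input numbers are copied from the table.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Radixor: derives speedup ratios from the table rows. */
public final class SpeedupSketch {

    private SpeedupSketch() {
    }

    /** Returns how many times larger {@code throughput} is than {@code baseline}. */
    public static double ratio(final double throughput, final double baseline) {
        if (baseline <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return throughput / baseline;
    }

    /** Formats a ratio with two decimals, independent of the default locale. */
    public static String format(final double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(final String[] arguments) {
        // Throughput in million tokens per second: Radixor, Snowball Porter, Snowball English.
        final double[][] rows = {
            {30.99, 8.21, 5.46},
            {32.25, 8.02, 5.11},
        };
        for (final double[] row : rows) {
            System.out.println("vs Porter: " + format(ratio(row[0], row[1]))
                    + "x, vs English: " + format(ratio(row[0], row[2])) + "x");
        }
    }
}
```

On the figures in the table this works out to roughly 3.8x to 4.0x over Snowball Porter and 5.7x to 6.3x over Snowball English, with the same numbers before and after the rename because the table rows themselves are unchanged by this commit.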
@@ -83,7 +83,7 @@ The workload intentionally mixes:
 - simple inflections,
 - common derivational forms,
 - US and UK spelling families,
-- lexical forms appropriate for `US_UK_PROFI`.
+- lexical forms appropriate for the current bundled `US_UK` resource (with historical continuity from earlier `US_UK_PROFI` runs).
 This design keeps runs reproducible across environments and avoids accidental drift caused by changing external corpora.


@@ -21,7 +21,7 @@ public final class BundledLanguageExample {
     public static void main(final String[] arguments) throws IOException {
         final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
-                StemmerPatchTrieLoader.Language.US_UK_PROFI,
+                StemmerPatchTrieLoader.Language.US_UK,
                 true,
                 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
     }


@@ -32,7 +32,7 @@ public final class BundledStemmerExample {
     public static void main(final String[] arguments) throws IOException {
         final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
-                StemmerPatchTrieLoader.Language.US_UK_PROFI,
+                StemmerPatchTrieLoader.Language.US_UK,
                 true,
                 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
@@ -104,7 +104,7 @@ public final class SingleStemExample {
     public static void main(final String[] arguments) throws IOException {
         final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
-                StemmerPatchTrieLoader.Language.US_UK_PROFI,
+                StemmerPatchTrieLoader.Language.US_UK,
                 true,
                 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);