docs: replace retired US_UK_PROFI with US_UK outside benchmarking history
@@ -54,7 +54,7 @@ Radixor is especially attractive when you want something more adaptable than sim
 
 Radixor includes a JMH benchmark suite for both its own algorithmic core and a side-by-side English comparison against the Snowball Porter stemmer family.
 
-On the current English comparison workload, Radixor with bundled `US_UK_PROFI` reaches approximately **31 to 32 million tokens per second**. Snowball original Porter reaches approximately **8 million tokens per second**, and Snowball English (Porter2) approximately **5 to 5.5 million tokens per second**.
+On the current English comparison workload, Radixor with bundled `US_UK` reaches approximately **31 to 32 million tokens per second**. Snowball original Porter reaches approximately **8 million tokens per second**, and Snowball English (Porter2) approximately **5 to 5.5 million tokens per second**.
 
 That places Radixor at approximately:
 
@@ -137,7 +137,7 @@ The repository keeps the front page concise and places detailed documentation un
 A practical first guide to loading, compiling, and using Radixor.
 
 - [Built-in Languages](docs/built-in-languages.md)
-Overview of bundled language resources such as `US_UK` and `US_UK_PROFI`.
+Overview of bundled language resources such as `US_UK`.
 
 - [Dictionary Format](docs/dictionary-format.md)
 How to write and normalize stemming dictionaries.
@@ -13,7 +13,7 @@ The benchmark suite currently covers two categories:
 
 The comparison benchmark processes the same deterministic English token stream through:
 
-- Radixor with bundled `US_UK_PROFI`,
+- Radixor with bundled `US_UK` (older benchmark snapshots used the now-retired `US_UK_PROFI` resource),
 - Snowball original Porter,
 - Snowball English, commonly referred to as Porter2.
 
@@ -37,7 +37,7 @@ For that reason, the published badge values should be treated primarily as a com
 
 A recent JMH run on JDK 21.0.10 with JMH 1.37, one thread, three warmup iterations, and five measurement iterations produced the following approximate throughput ranges:
 
-| Workload | Radixor `US_UK_PROFI` | Snowball Porter | Snowball English |
+| Workload | Radixor `US_UK` *(historical runs: `US_UK_PROFI`)* | Snowball Porter | Snowball English |
 | --- | ---: | ---: | ---: |
 | About 12,000 generated tokens | 30.99 M tokens/s | 8.21 M tokens/s | 5.46 M tokens/s |
 | About 60,000 generated tokens | 32.25 M tokens/s | 8.02 M tokens/s | 5.11 M tokens/s |
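The relative speedups implied by the JMH table can be computed directly from its rows. This is a minimal editorial sketch and is not part of the commit or the benchmark suite. It hard-codes the published M tokens/s figures. The class name `ThroughputRatios` is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: derives Radixor-vs-Snowball speedups from the
// published JMH throughput figures (M tokens/s). Not part of Radixor.
public final class ThroughputRatios {

    /** Ratio of Radixor throughput to a baseline stemmer's throughput. */
    static double speedup(final double radixor, final double baseline) {
        return radixor / baseline;
    }

    public static void main(final String[] arguments) {
        final String[] workloads = {"~12,000 tokens", "~60,000 tokens"};
        // Each row: {Radixor, Snowball Porter, Snowball English}, copied from the table.
        final double[][] rows = {
            {30.99, 8.21, 5.46},
            {32.25, 8.02, 5.11},
        };
        for (int i = 0; i < rows.length; i++) {
            final double[] r = rows[i];
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
            System.out.println(String.format(Locale.ROOT,
                "%s: %.2fx vs Porter, %.2fx vs Porter2",
                workloads[i], speedup(r[0], r[1]), speedup(r[0], r[2])));
        }
    }
}
```

Run as written, this yields roughly 3.8x to 4.0x over original Porter and 5.7x to 6.3x over Porter2. Those ratios are only as reliable as the single-thread JMH run they are derived from.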
@@ -83,7 +83,7 @@ The workload intentionally mixes:
 - simple inflections,
 - common derivational forms,
 - US and UK spelling families,
-- lexical forms appropriate for `US_UK_PROFI`.
+- lexical forms appropriate for the current bundled `US_UK` resource (with historical continuity from earlier `US_UK_PROFI` runs).
 
 This design keeps runs reproducible across environments and avoids accidental drift caused by changing external corpora.
 
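A generated workload stays reproducible because a fixed seed always drives the same pseudo-random sequence. The sketch below illustrates that idea only. It is not Radixor's actual benchmark generator, and the class name `DeterministicTokenStream` and its word list are hypothetical. It relies on the documented behavior of `java.util.Random`, which yields an identical sequence for an identical seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator, NOT Radixor's benchmark code: shows how a fixed
// seed produces the same token stream on every run and every machine.
public final class DeterministicTokenStream {

    // Illustrative forms mixing inflections, derivations, and US/UK spellings.
    private static final String[] FORMS = {
        "connect", "connects", "connected", "connection", "connectivity",
        "organize", "organise", "organized", "organised", "organization", "organisation",
        "color", "colour", "colors", "colours", "colorful", "colourful",
    };

    static List<String> generate(final long seed, final int count) {
        final Random random = new Random(seed); // same seed => same sequence
        final List<String> tokens = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            tokens.add(FORMS[random.nextInt(FORMS.length)]);
        }
        return tokens;
    }

    public static void main(final String[] arguments) {
        final List<String> first = generate(42L, 12_000);
        final List<String> second = generate(42L, 12_000);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Because no external corpus is read, the only inputs are the seed and the token count. This is what prevents drift between environments.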
@@ -21,7 +21,7 @@ public final class BundledLanguageExample {
 
 public static void main(final String[] arguments) throws IOException {
 final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
-StemmerPatchTrieLoader.Language.US_UK_PROFI,
+StemmerPatchTrieLoader.Language.US_UK,
 true,
 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
 }
@@ -32,7 +32,7 @@ public final class BundledStemmerExample {
 
 public static void main(final String[] arguments) throws IOException {
 final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
-StemmerPatchTrieLoader.Language.US_UK_PROFI,
+StemmerPatchTrieLoader.Language.US_UK,
 true,
 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
 
@@ -104,7 +104,7 @@ public final class SingleStemExample {
 
 public static void main(final String[] arguments) throws IOException {
 final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
-StemmerPatchTrieLoader.Language.US_UK_PROFI,
+StemmerPatchTrieLoader.Language.US_UK,
 true,
 ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);
 