feat: add JMH comparison benchmarks for Radixor vs Snowball Porter stemmers

build: isolate Snowball benchmark integration into dedicated Gradle script
docs: highlight benchmarked throughput advantage in README
docs: add detailed benchmarking guide and execution notes
134
docs/benchmarking.md
Normal file
@@ -0,0 +1,134 @@
# Benchmarking

> ← Back to [README.md](../README.md)

Radixor includes a JMH benchmark suite for both the internal algorithmic core and a side-by-side English comparison against the Snowball Porter stemmer family.

This document explains what is benchmarked, how to run it, and how to interpret the results responsibly.

## Scope

The benchmark suite currently covers two categories:

- Radixor core operations
- English stemmer comparison on the same token workload

The comparison benchmark processes the same deterministic English token stream through:

- Radixor with bundled `US_UK_PROFI`
- Snowball original Porter
- Snowball English, commonly referred to as Porter2

The purpose of the comparison is throughput measurement on identical input. It is not intended to prove linguistic equivalence between the compared stemmers.

## Current snapshot

A recent JMH run on JDK 21.0.10 with JMH 1.37, one thread, three warmup iterations, and five measurement iterations produced the following approximate throughput figures:

| Workload | Radixor `US_UK_PROFI` | Snowball Porter | Snowball English |
| --- | ---: | ---: | ---: |
| About 12,000 generated tokens | 30.99 M tokens/s | 8.21 M tokens/s | 5.46 M tokens/s |
| About 60,000 generated tokens | 32.25 M tokens/s | 8.02 M tokens/s | 5.11 M tokens/s |

On that workload, Radixor delivers approximately:

- 4 times the throughput of Snowball original Porter
- 6 times the throughput of Snowball English

These values are workload- and environment-dependent. Treat them as measured results for the documented benchmark setup, not as universal constants.
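The rounded factors above follow directly from the table. As a minimal sketch, the arithmetic can be reproduced as shown here; the class name `SpeedupFromSnapshot` is hypothetical and not part of the Radixor sources, and the inputs are simply the table values in million tokens per second.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes the speedup factors from the snapshot table. */
final class SpeedupFromSnapshot {

    /** Ratio of two throughputs expressed in the same unit. */
    static double speedup(double radixorThroughput, double baselineThroughput) {
        return radixorThroughput / baselineThroughput;
    }

    public static void main(String[] args) {
        // Million tokens per second, copied from the table:
        // { Radixor, Snowball Porter, Snowball English }
        double[][] rows = {
            {30.99, 8.21, 5.46}, // about 12,000 generated tokens
            {32.25, 8.02, 5.11}, // about 60,000 generated tokens
        };
        for (double[] row : rows) {
            System.out.printf(Locale.ROOT, "vs Porter: %.2fx, vs English: %.2fx%n",
                    speedup(row[0], row[1]), speedup(row[0], row[2]));
        }
        // prints:
        // vs Porter: 3.77x, vs English: 5.68x
        // vs Porter: 4.02x, vs English: 6.31x
    }
}
```

The ratios range from about 3.8 to 4.0 against Snowball Porter and from about 5.7 to 6.3 against Snowball English, which is where the rounded figures of 4 and 6 come from.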

## Benchmark classes

The main benchmark classes are under `src/jmh/java/org/egothor/stemmer/benchmark`.

Relevant classes include:

- `FrequencyTrieLookupBenchmark`
- `FrequencyTrieCompilationBenchmark`
- `EnglishStemmerComparisonBenchmark`

The English comparison benchmark uses the bundled Radixor English resource and the official Snowball Java distribution integrated into the JMH source set.

## Workload design

The English comparison benchmark uses a deterministic generated corpus rather than an uncontrolled ad hoc text sample.

The workload intentionally mixes:

- simple inflections
- common derivational forms
- US and UK spelling families
- lexical forms appropriate for `US_UK_PROFI`

This design keeps runs reproducible across environments and avoids accidental drift caused by changing external corpora.
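To illustrate what "deterministic generated corpus" can mean in practice, the following is a minimal sketch of a seeded token generator. It is not the generator used by `EnglishStemmerComparisonBenchmark`; the class name, the word families, and the seed are all assumptions chosen for illustration. It relies only on the fact that `java.util.Random` produces the same sequence for the same seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a reproducible token stream; not the project's generator. */
final class DeterministicCorpusSketch {

    // Illustrative word families only: inflections, derivations, US and UK spellings.
    private static final String[][] FAMILIES = {
        {"walk", "walks", "walked", "walking"},
        {"organize", "organizes", "organized", "organization"},
        {"organise", "organises", "organised", "organisation"},
        {"color", "colors", "colored", "coloring"},
        {"colour", "colours", "coloured", "colouring"},
    };

    /** Returns the same token list for the same (tokenCount, seed) pair on every run. */
    static List<String> generate(int tokenCount, long seed) {
        Random random = new Random(seed); // fixed seed, so the sequence is repeatable
        List<String> tokens = new ArrayList<>(tokenCount);
        for (int i = 0; i < tokenCount; i++) {
            String[] family = FAMILIES[random.nextInt(FAMILIES.length)];
            tokens.add(family[random.nextInt(family.length)]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> first = generate(12_000, 42L);
        List<String> second = generate(12_000, 42L);
        System.out.println("identical streams: " + first.equals(second));
    }
}
```

Because every stemmer under comparison consumes the same list, any throughput difference comes from the stemmers rather than from the input.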

## Running benchmarks

Run the full benchmark suite:

```bash
./gradlew jmh
```

Run only the English comparison benchmark:

```bash
./gradlew jmh -Pjmh.includes=EnglishStemmerComparisonBenchmark
```

## Generated reports

JMH reports are written to:

- `build/reports/jmh/jmh-results.txt`
- `build/reports/jmh/jmh-results.csv`

The text report is convenient for human review. The CSV report is more useful for CI archiving, historical tracking, and external processing.

## Interpreting results

Benchmark numbers should be read with care.

Important factors include:

- CPU model and frequency behavior
- thermal throttling
- JVM vendor and version
- system background load
- operating-system scheduling noise
- benchmark parameter changes

For meaningful comparison, keep these stable:

- hardware or VM class
- JDK version
- benchmark parameters
- thread count
- benchmark source revision

If a regression is suspected, repeat the run and compare against the previous CSV output rather than relying on a single measurement.
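One simple way to avoid over-reading a single measurement is to check whether two results differ by more than their reported error. The sketch below treats each result as a `score ± error` interval, as JMH reports it, and flags a difference only when the intervals do not overlap. This is a rough heuristic rather than a statistical test; the class name is hypothetical and the numbers in `main` are made up for illustration, not measured.

```java
/** Hypothetical noise check: compares two score ± error intervals. */
final class NoiseAwareComparison {

    /** One result row: mean score and the half-width of its reported error. */
    record Measurement(double score, double error) {
        double low() { return score - error; }
        double high() { return score + error; }
    }

    /** True only when the two intervals share no common point. */
    static boolean clearlyDifferent(Measurement a, Measurement b) {
        return a.high() < b.low() || b.high() < a.low();
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from any real run.
        Measurement baseline = new Measurement(31.0, 0.8);
        Measurement noisyRerun = new Measurement(30.4, 0.9);
        Measurement slowerRun = new Measurement(27.5, 0.6);

        System.out.println(clearlyDifferent(baseline, noisyRerun)); // false: intervals overlap
        System.out.println(clearlyDifferent(baseline, slowerRun));  // true: 28.1 < 30.2
    }
}
```

Overlapping intervals do not prove that nothing changed; they only indicate that one more run is cheaper than an investigation.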

## Regression tracking

The recommended regression workflow is:

1. archive `jmh-results.csv`
2. compare the same benchmark names across runs
3. compare only like-for-like environments
4. investigate sustained regressions rather than one-off noise
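Step 2 of the workflow above can be sketched as a small comparison of two archived CSV files. The code assumes the CSV has a quoted header row containing `Benchmark` and `Score` columns plus optional `Param: ...` columns, and that scores use a dot as the decimal separator; verify both against an actual `jmh-results.csv` before relying on it. The class name `JmhCsvComparison` is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: compares scores of equally named benchmarks in two CSV reports. */
final class JmhCsvComparison {

    /** Splits one CSV line, honoring double-quoted fields (escaped quotes are not handled). */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                quoted = !quoted;
            } else if (c == ',' && !quoted) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Maps "benchmark[param]..." to its score, locating columns by header name. */
    static Map<String, Double> scores(List<String> csvLines) {
        List<String> header = splitCsvLine(csvLines.get(0));
        int nameCol = header.indexOf("Benchmark"); // assumed column name
        int scoreCol = header.indexOf("Score");    // assumed column name
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.isBlank()) {
                continue;
            }
            List<String> row = splitCsvLine(line);
            StringBuilder key = new StringBuilder(row.get(nameCol));
            for (int i = 0; i < header.size() && i < row.size(); i++) {
                if (header.get(i).startsWith("Param:")) {
                    key.append('[').append(row.get(i)).append(']');
                }
            }
            // Assumes a dot decimal separator in the score column.
            result.put(key.toString(), Double.parseDouble(row.get(scoreCol)));
        }
        return result;
    }

    /** Relative change of candidate against baseline, for example -0.05 for a 5% drop. */
    static double relativeChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> baseline = scores(Files.readAllLines(Path.of(args[0])));
        Map<String, Double> candidate = scores(Files.readAllLines(Path.of(args[1])));
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            Double newScore = candidate.get(entry.getKey());
            if (newScore != null) { // compare only benchmarks present in both runs
                System.out.printf("%s: %+.1f%%%n", entry.getKey(),
                        100.0 * relativeChange(entry.getValue(), newScore));
            }
        }
    }
}
```

Run it with the archived baseline file as the first argument and the new report as the second. Keying on the benchmark name plus parameter values keeps the 12,000-token and 60,000-token variants of the same benchmark apart.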

For public reporting, the README should keep only the condensed benchmark summary, while detailed benchmark methodology and interpretation should remain in this document.

## Notes on comparison fairness

Radixor, Snowball Porter, and Snowball English are not the same kind of stemmer.

Radixor uses a compiled patch-command trie driven by dictionary data. Snowball Porter and Snowball English are rule-based English stemmers.

Because of that, the comparison should be understood as:

- equal input workload
- different stemming strategies
- measured throughput, not semantic identity

That distinction matters whenever performance claims are discussed in documentation or release notes.