feat: add JMH comparison benchmarks for Radixor vs Snowball Porter stemmers

build: isolate Snowball benchmark integration into dedicated Gradle script
docs: highlight benchmarked throughput advantage in README
docs: add detailed benchmarking guide and execution notes

2026-04-14 18:25:41 +02:00

Benchmarking

Radixor includes a JMH benchmark suite for both the internal algorithmic core and a side-by-side English comparison against the Snowball Porter stemmer family.

This document explains what is benchmarked, how to run it, and how to interpret the results responsibly.

Scope

The benchmark suite currently covers two categories:

Radixor core operations
English stemmer comparison on the same token workload

The comparison benchmark processes the same deterministic English token stream through:

Radixor with bundled US_UK_PROFI
Snowball original Porter
Snowball English, commonly referred to as Porter2

The purpose of the comparison is throughput measurement on identical input. It is not intended to prove linguistic equivalence between the compared stemmers.

Current snapshot

A recent JMH run (JDK 21.0.10, JMH 1.37, one thread, three warmup iterations, five measurement iterations) produced the following approximate throughput figures:

| Workload | Radixor `US_UK_PROFI` | Snowball Porter | Snowball English |
|---|---|---|---|
| About 12,000 generated tokens | 30.99 M tokens/s | 8.21 M tokens/s | 5.46 M tokens/s |
| About 60,000 generated tokens | 32.25 M tokens/s | 8.02 M tokens/s | 5.11 M tokens/s |

On those two workloads, Radixor delivers approximately:

3.8 to 4.0 times the throughput of Snowball original Porter
5.7 to 6.3 times the throughput of Snowball English

These values are workload- and environment-dependent. Treat them as measured results for the documented benchmark setup, not as universal constants.
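
The ratios can be rechecked directly from the throughput table. The following is a minimal sketch of that arithmetic; the class and method names are illustrative and are not part of the Radixor code base.

```java
import java.util.Locale;

/** Recomputes speedup ratios from the throughput table (values in M tokens/s). */
final class SpeedupRatios {

    /** Returns how many times the baseline throughput the candidate achieves. */
    static double speedup(double candidateThroughput, double baselineThroughput) {
        if (baselineThroughput <= 0.0) {
            throw new IllegalArgumentException("baseline throughput must be positive");
        }
        return candidateThroughput / baselineThroughput;
    }

    public static void main(String[] args) {
        // Columns: Radixor, Snowball Porter, Snowball English (copied from the table).
        double[][] rows = {
            {30.99, 8.21, 5.46}, // about 12,000 generated tokens
            {32.25, 8.02, 5.11}, // about 60,000 generated tokens
        };
        for (double[] row : rows) {
            System.out.printf(Locale.ROOT, "vs Porter: %.2fx, vs English: %.2fx%n",
                    speedup(row[0], row[1]), speedup(row[0], row[2]));
        }
        // Prints:
        // vs Porter: 3.77x, vs English: 5.68x
        // vs Porter: 4.02x, vs English: 6.31x
    }
}
```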

Benchmark classes

The main benchmark classes are under src/jmh/java/org/egothor/stemmer/benchmark.

Relevant classes include:

FrequencyTrieLookupBenchmark
FrequencyTrieCompilationBenchmark
EnglishStemmerComparisonBenchmark

The English comparison benchmark uses the bundled Radixor English resource and the official Snowball Java distribution integrated into the JMH source set.

Workload design

The English comparison benchmark uses a deterministic generated corpus rather than an uncontrolled ad hoc text sample.

The workload intentionally mixes:

simple inflections
common derivational forms
US and UK spelling families
lexical forms appropriate for US_UK_PROFI

This design keeps runs reproducible across environments and avoids accidental drift caused by changing external corpora.
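
The generator used by the benchmark is not reproduced in this document. As a rough illustration of the idea only, a seeded generator could look like the sketch below. The class name, the word list, and the seed are assumptions made for this example; they do not describe the actual benchmark corpus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative deterministic token generator; not the generator used by the benchmark. */
final class DeterministicTokenWorkload {

    // Hypothetical sample forms: simple inflections, derivations, US and UK spellings.
    private static final String[] WORD_FORMS = {
        "walks", "walked", "walking",
        "connection", "connected", "organization", "organisation",
        "colors", "colours", "analyzing", "analysing",
    };

    /** Builds a token list that is identical for the same tokenCount and seed. */
    static List<String> generate(int tokenCount, long seed) {
        // java.util.Random has a specified algorithm, so a fixed seed yields
        // the same sequence on every conforming JVM.
        Random random = new Random(seed);
        List<String> tokens = new ArrayList<>(tokenCount);
        for (int i = 0; i < tokenCount; i++) {
            tokens.add(WORD_FORMS[random.nextInt(WORD_FORMS.length)]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> first = generate(12_000, 42L);
        List<String> second = generate(12_000, 42L);
        System.out.println("Same stream on repeat: " + first.equals(second));
    }
}
```

Because the stream depends only on the token count and the seed, every stemmer under comparison can be fed exactly the same input without shipping a corpus file.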

Running benchmarks

Run the full benchmark suite:

./gradlew jmh

Run only the English comparison benchmark:

./gradlew jmh -Pjmh.includes=EnglishStemmerComparisonBenchmark

Generated reports

JMH reports are written to:

build/reports/jmh/jmh-results.txt
build/reports/jmh/jmh-results.csv

The text report is convenient for human review. The CSV report is more useful for CI archiving, historical tracking, and external processing.

Interpreting results

Benchmark numbers should be read with care.

Important factors include:

CPU model and frequency behavior
thermal throttling
JVM vendor and version
system background load
operating-system scheduling noise
benchmark parameter changes

For meaningful comparison, keep these stable:

hardware or VM class
JDK version
benchmark parameters
thread count
benchmark source revision

If a regression is suspected, repeat the run and compare against the previous CSV output rather than relying on a single measurement.
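
JMH reports an error margin next to each score (by default the half-width of a 99.9% confidence interval). One rough way to decide whether two runs differ by more than noise is to check whether their score intervals overlap. The sketch below assumes that approach; the class name and the sample numbers are hypothetical and are not measured values.

```java
/** Rough noise check: do two "score ± error" intervals overlap? */
final class RunNoiseCheck {

    /** True when [scoreA - errorA, scoreA + errorA] and [scoreB - errorB, scoreB + errorB] intersect. */
    static boolean intervalsOverlap(double scoreA, double errorA, double scoreB, double errorB) {
        double lowA = scoreA - errorA;
        double highA = scoreA + errorA;
        double lowB = scoreB - errorB;
        double highB = scoreB + errorB;
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // Hypothetical scores and error margins in M tokens/s.
        boolean withinNoise = intervalsOverlap(31.0, 0.6, 30.2, 0.5);
        System.out.println(withinNoise
                ? "Difference is within the reported error margins; rerun before concluding."
                : "Intervals are disjoint; repeat the run to confirm the change.");
    }
}
```

Overlapping intervals do not prove that nothing changed, and disjoint intervals from a single pair of runs do not prove a regression. The check only helps decide which differences deserve a repeated run.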

Regression tracking

The recommended regression workflow is:

archive jmh-results.csv
compare the same benchmark names across runs
compare only like-for-like environments
investigate sustained regressions rather than one-off noise
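
The workflow above can be partly automated. The following sketch compares two archived CSV reports by benchmark name. It assumes the usual JMH CSV layout with a header row containing `Benchmark` and `Score` columns, a `.` decimal separator, and no commas inside cell values. The class name is illustrative. Parameterized benchmarks produce several rows with the same name, and this simplified version keeps only the last one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Compares the scores of two JMH CSV reports, benchmark by benchmark. */
final class JmhCsvComparison {

    /** Maps each benchmark name to its score, given the lines of one CSV report. */
    static Map<String, Double> readScores(List<String> csvLines) {
        List<String> header = splitRow(csvLines.get(0));
        int nameColumn = header.indexOf("Benchmark");
        int scoreColumn = header.indexOf("Score");
        if (nameColumn < 0 || scoreColumn < 0) {
            throw new IllegalArgumentException("missing Benchmark or Score column");
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.isBlank()) {
                continue;
            }
            List<String> cells = splitRow(line);
            scores.put(cells.get(nameColumn), Double.parseDouble(cells.get(scoreColumn)));
        }
        return scores;
    }

    /** Naive CSV split: assumes no commas inside quoted cells. */
    private static List<String> splitRow(String line) {
        String[] cells = line.split(",");
        for (int i = 0; i < cells.length; i++) {
            cells[i] = cells[i].trim().replace("\"", "");
        }
        return Arrays.asList(cells);
    }

    /** Relative change in percent; negative means the current score is lower. */
    static double relativeChangePercent(double baseline, double current) {
        return (current - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JmhCsvComparison <baseline.csv> <current.csv>");
            return;
        }
        Map<String, Double> baseline = readScores(Files.readAllLines(Path.of(args[0])));
        Map<String, Double> current = readScores(Files.readAllLines(Path.of(args[1])));
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            Double now = current.get(entry.getKey());
            if (now == null) {
                continue; // benchmark absent from the current run
            }
            System.out.printf(Locale.ROOT, "%s: %+.1f%%%n",
                    entry.getKey(), relativeChangePercent(entry.getValue(), now));
        }
    }
}
```

For throughput benchmarks a negative percentage indicates a slowdown. A single negative value is not a regression by itself; it becomes interesting when it persists across repeated like-for-like runs.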

For public reporting, the README should keep only the condensed benchmark summary, while detailed benchmark methodology and interpretation should remain in this document.

Notes on comparison fairness

Radixor, Snowball Porter, and Snowball English are not the same kind of stemmer.

Radixor uses a compiled patch-command trie driven by dictionary data. Snowball Porter and Snowball English are rule-based English stemmers.

Because of that, the comparison should be understood as:

equal input workload
different stemming strategies
measured throughput, not semantic identity

That distinction matters whenever performance claims are discussed in documentation or release notes.
