Radixor/docs/benchmarking.md
Leo Galambos 6b3559097a feat: add JMH comparison benchmarks for Radixor vs Snowball Porter stemmers
build: isolate Snowball benchmark integration into dedicated Gradle script
docs: highlight benchmarked throughput advantage in README
docs: add detailed benchmarking guide and execution notes
2026-04-14 18:25:41 +02:00

Benchmarking

← Back to README.md

Radixor includes a JMH benchmark suite that covers the internal algorithmic core and a side-by-side English comparison against the Snowball Porter stemmer family.

This document explains what is benchmarked, how to run it, and how to interpret the results responsibly.

Scope

The benchmark suite currently covers two categories:

  • Radixor core operations
  • English stemmer comparison on the same token workload

The comparison benchmark processes the same deterministic English token stream through:

  • Radixor with bundled US_UK_PROFI
  • Snowball original Porter
  • Snowball English, commonly referred to as Porter2

The purpose of the comparison is throughput measurement on identical input. It is not intended to prove linguistic equivalence between the compared stemmers.

Current snapshot

A recent JMH run on JDK 21.0.10 with JMH 1.37, one thread, three warmup iterations, and five measurement iterations produced the following approximate throughput figures:

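| Workload                      | Radixor US_UK_PROFI | Snowball Porter | Snowball English |
|-------------------------------|---------------------|-----------------|------------------|
| About 12,000 generated tokens | 30.99 M tokens/s    | 8.21 M tokens/s | 5.46 M tokens/s  |
| About 60,000 generated tokens | 32.25 M tokens/s    | 8.02 M tokens/s | 5.11 M tokens/s  |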

On those two workloads, Radixor reaches approximately:

  • 3.8 to 4.0 times the throughput of Snowball original Porter
  • 5.7 to 6.3 times the throughput of Snowball English

These values are workload- and environment-dependent. Treat them as measured results for the documented benchmark setup, not as universal constants.
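
The ratios can be rechecked directly from the reported throughput values. The following is a minimal sketch of that arithmetic; the class and method names are hypothetical and are not part of the Radixor benchmark suite.

```java
import java.util.Locale;

/** Recomputes speedup ratios from the throughput snapshot in this document. */
final class SpeedupRatios {

    /** Ratio of two throughputs expressed in the same unit (here: M tokens/s). */
    static double speedup(double radixor, double baseline) {
        if (baseline <= 0.0) {
            throw new IllegalArgumentException("baseline throughput must be positive");
        }
        return radixor / baseline;
    }

    public static void main(String[] args) {
        // Rows: about 12,000 and about 60,000 generated tokens.
        // Columns: Radixor US_UK_PROFI, Snowball Porter, Snowball English.
        double[][] rows = {
            {30.99, 8.21, 5.46},
            {32.25, 8.02, 5.11},
        };
        for (double[] row : rows) {
            System.out.printf(Locale.ROOT, "vs Porter: %.2fx, vs English: %.2fx%n",
                    speedup(row[0], row[1]), speedup(row[0], row[2]));
        }
    }
}
```

Because the inputs are rounded snapshot values, the resulting ratios are only as precise as the table itself.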

Benchmark classes

The main benchmark classes are under src/jmh/java/org/egothor/stemmer/benchmark.

Relevant classes include:

  • FrequencyTrieLookupBenchmark
  • FrequencyTrieCompilationBenchmark
  • EnglishStemmerComparisonBenchmark

The English comparison benchmark uses the bundled Radixor English resource and the official Snowball Java distribution integrated into the JMH source set.

Workload design

The English comparison benchmark uses a deterministic generated corpus rather than an uncontrolled ad hoc text sample.

The workload intentionally mixes:

  • simple inflections
  • common derivational forms
  • US and UK spelling families
  • lexical forms appropriate for US_UK_PROFI

This design keeps runs reproducible across environments and avoids accidental drift caused by changing external corpora.
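
To illustrate why a generated corpus stays reproducible, here is a minimal sketch of a seeded token generator. It is not the generator used by EnglishStemmerComparisonBenchmark: the class name, the word lists, and the seed are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative deterministic token workload; not the project's actual generator. */
final class DeterministicTokenWorkload {

    // Hypothetical stems, including US and UK spelling pairs.
    private static final String[] STEMS = {
        "walk", "connect", "color", "colour", "harbor", "harbour", "favor", "favour"
    };

    // Hypothetical suffixes covering simple inflections and one derivational form.
    private static final String[] SUFFIXES = {"", "s", "ed", "ing", "er"};

    /** Returns the same token sequence for the same size and seed on every run. */
    static List<String> generate(int size, long seed) {
        // java.util.Random with an explicit seed yields a specified, repeatable sequence.
        Random random = new Random(seed);
        List<String> tokens = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            String stem = STEMS[random.nextInt(STEMS.length)];
            String suffix = SUFFIXES[random.nextInt(SUFFIXES.length)];
            tokens.add(stem + suffix);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> first = generate(12_000, 42L);
        List<String> second = generate(12_000, 42L);
        System.out.println("identical runs: " + first.equals(second));
    }
}
```

The key design choice is the fixed seed: every stemmer under comparison sees exactly the same tokens in the same order, on any machine.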

Running benchmarks

Run the full benchmark suite:

./gradlew jmh

Run only the English comparison benchmark:

./gradlew jmh -Pjmh.includes=EnglishStemmerComparisonBenchmark

Generated reports

JMH reports are written to:

  • build/reports/jmh/jmh-results.txt
  • build/reports/jmh/jmh-results.csv

The text report is convenient for human review. The CSV report is more useful for CI archiving, historical tracking, and external processing.
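
For external processing, the CSV report can be reduced to a map from benchmark name to score. The sketch below assumes the usual JMH CSV layout, with a header row containing "Benchmark" and "Score" columns and optional "Param: ..." columns, and assumes that no cell contains an embedded comma. The class name is hypothetical; verify the header of your own report before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal reader for JMH CSV results; assumes simple cells without embedded commas. */
final class JmhCsvScores {

    /** Maps "benchmark name plus parameters" to the reported score. */
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> scores = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return scores;
        }
        String[] header = split(lines.get(0));
        int benchmarkCol = indexOf(header, "Benchmark");
        int scoreCol = indexOf(header, "Score");
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue;
            }
            String[] cells = split(line);
            StringBuilder key = new StringBuilder(cells[benchmarkCol]);
            // Append parameter columns so differently sized workloads stay distinct.
            for (int i = 0; i < header.length && i < cells.length; i++) {
                if (header[i].startsWith("Param:")) {
                    key.append(' ').append(header[i]).append('=').append(cells[i]);
                }
            }
            scores.put(key.toString(), Double.parseDouble(cells[scoreCol]));
        }
        return scores;
    }

    /** Splits on commas and strips surrounding quotes from each cell. */
    private static String[] split(String line) {
        String[] cells = line.split(",", -1);
        for (int i = 0; i < cells.length; i++) {
            cells[i] = cells[i].trim().replace("\"", "");
        }
        return cells;
    }

    private static int indexOf(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("missing column: " + name);
    }
}
```

A typical call would pass the result of `Files.readAllLines(Path.of("build/reports/jmh/jmh-results.csv"))` to `parse`.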

Interpreting results

Benchmark numbers should be read with care.

Factors that can shift the measured throughput include:

  • CPU model and frequency behavior
  • thermal throttling
  • JVM vendor and version
  • system background load
  • operating-system scheduling noise
  • benchmark parameter changes

For meaningful comparison, keep these stable:

  • hardware or VM class
  • JDK version
  • benchmark parameters
  • thread count
  • benchmark source revision

If a regression is suspected, repeat the run and compare against the previous CSV output rather than relying on a single measurement.

Regression tracking

The recommended regression workflow is:

  1. archive jmh-results.csv
  2. compare the same benchmark names across runs
  3. compare only like-for-like environments
  4. investigate sustained regressions rather than one-off noise
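
The comparison step in this workflow can be sketched as a simple threshold check over two sets of throughput scores keyed by benchmark name. This is an illustrative helper, not part of Radixor: the class name is hypothetical, the tolerance is an arbitrary assumption to be tuned per environment, and the check assumes throughput mode, where higher scores are better.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Flags benchmarks whose throughput dropped by more than a given tolerance. */
final class ThroughputRegressionCheck {

    /**
     * @param baseline  scores from the archived run, keyed by benchmark name
     * @param current   scores from the new run, keyed by benchmark name
     * @param tolerance allowed relative drop, for example 0.05 for 5 percent
     */
    static List<String> findRegressions(Map<String, Double> baseline,
                                        Map<String, Double> current,
                                        double tolerance) {
        List<String> regressed = new ArrayList<>();
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            Double now = current.get(entry.getKey());
            if (now == null) {
                continue; // compare only benchmarks present in both runs
            }
            double before = entry.getValue();
            if (before > 0.0 && now < before * (1.0 - tolerance)) {
                regressed.add(entry.getKey());
            }
        }
        return regressed;
    }

    public static void main(String[] args) {
        Map<String, Double> baseline = Map.of("lookup", 100.0, "compile", 100.0);
        Map<String, Double> current = Map.of("lookup", 98.0, "compile", 80.0);
        System.out.println(findRegressions(baseline, current, 0.05));
    }
}
```

A name flagged by one comparison is only a candidate. Repeat the run in the same environment before treating it as a sustained regression.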

For public reporting, the README should keep only the condensed benchmark summary, while detailed benchmark methodology and interpretation should remain in this document.

Notes on comparison fairness

Radixor, Snowball Porter, and Snowball English are not the same kind of stemmer.

Radixor uses a compiled patch-command trie driven by dictionary data. Snowball Porter and Snowball English are rule-based English stemmers.

Because of that, the comparison should be understood as:

  • equal input workload
  • different stemming strategies
  • measured throughput, not semantic identity

That distinction matters whenever performance claims are discussed in documentation or release notes.