Radixor/docs/quick-start.md
Leo Galambos 038514bad0 Refine stemmer core, compiled trie workflow, tests, and public documentation
2026-04-13 02:10:46 +02:00


Quick Start

This guide shows the fastest way to start using Radixor and the most common next steps.

Hello world

import java.io.IOException;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.PatchCommandEncoder;
import org.egothor.stemmer.ReductionMode;
import org.egothor.stemmer.StemmerPatchTrieLoader;

public final class HelloRadixor {

    private HelloRadixor() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
                StemmerPatchTrieLoader.Language.US_UK_PROFI,
                true,
                ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);

        final String word = "running";
        final String patch = trie.get(word);
        final String stem = PatchCommandEncoder.apply(word, patch);

        System.out.println(word + " -> " + stem);
    }
}

This example shows the core workflow:

  1. load a trie
  2. get a patch command for a word
  3. apply the patch
  4. obtain the stem
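
To make the idea of a patch command concrete, here is a minimal, self-contained sketch of the same lookup-then-patch workflow. The encoding used here (the number of trailing characters to drop, a colon, then the suffix to append) is hypothetical and chosen only for illustration. It is not Radixor's actual patch format, and `ToyPatch` is not part of the library.

```java
import java.util.Map;

/** Toy illustration of the lookup-then-patch workflow; not Radixor's real encoding. */
public final class ToyPatch {

    private ToyPatch() {
        throw new AssertionError("No instances.");
    }

    /**
     * Applies a hypothetical patch of the form "<n>:<suffix>": drop the last
     * n characters of the word, then append the suffix.
     */
    public static String apply(final String word, final String patch) {
        final int separator = patch.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("Malformed toy patch: " + patch);
        }
        final int dropCount = Integer.parseInt(patch.substring(0, separator));
        if (dropCount < 0 || dropCount > word.length()) {
            throw new IllegalArgumentException(
                    "Cannot drop " + dropCount + " characters from " + word);
        }
        return word.substring(0, word.length() - dropCount)
                + patch.substring(separator + 1);
    }

    public static void main(final String[] arguments) {
        // A plain map stands in for the trie: word -> patch.
        final Map<String, String> patches = Map.of(
                "running", "4:",   // drop "ning"
                "studies", "3:y",  // drop "ies", append "y"
                "cat", "0:");      // leave the word unchanged

        for (final String word : new String[] {"running", "studies", "cat"}) {
            System.out.println(word + " -> " + apply(word, patches.get(word)));
        }
    }
}
```

In this toy scheme, `"walking"` and `"running"` would both map to the patch `"4:"`. That kind of sharing is one plausible reason to store patches rather than stems, since different words can carry identical values.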

Retrieve multiple candidate stems

If you need more than one candidate result, use getAll(...) instead of get(...).

final String word = "axes";
final String[] patches = trie.getAll(word);

for (final String patch : patches) {
    final String stem = PatchCommandEncoder.apply(word, patch);
    System.out.println(word + " -> " + stem + " (" + patch + ")");
}
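
Different patches can lead to the same stem, so callers may want to collapse the candidates before using them. The helper below is a sketch under assumptions: `CandidateStems` and `distinctStems` are hypothetical names, the patch applier is passed in as a function so the example runs without the library, and the `null` check is purely defensive because this guide does not say what `getAll(...)` returns for unknown words.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Hypothetical helper that collapses candidate patches into distinct stems. */
public final class CandidateStems {

    private CandidateStems() {
        throw new AssertionError("No instances.");
    }

    /**
     * Applies every patch to the word and returns the distinct stems in the
     * order in which their patches were supplied.
     */
    public static Set<String> distinctStems(
            final String word,
            final String[] patches,
            final BinaryOperator<String> patchApplier) {
        final Set<String> stems = new LinkedHashSet<>();
        if (patches == null) {
            return stems; // defensive: treat a missing result as "no candidates"
        }
        for (final String patch : patches) {
            stems.add(patchApplier.apply(word, patch));
        }
        return stems;
    }

    public static void main(final String[] arguments) {
        // Stand-in applier for the demo: the "patch" is simply the number of
        // trailing characters to drop. This is not Radixor's patch format.
        final BinaryOperator<String> dropSuffix = (word, patch) ->
                word.substring(0, word.length() - Integer.parseInt(patch));

        System.out.println(distinctStems("axes", new String[] {"1", "2", "1"}, dropSuffix));
    }
}
```

With the library on the classpath, the call would presumably look like `CandidateStems.distinctStems(word, trie.getAll(word), PatchCommandEncoder::apply)`, assuming `apply` takes and returns `String` values as the examples in this guide suggest.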

Load a compiled binary stemmer

For production systems, the preferred approach is usually to precompile the dictionary and load the compressed binary artifact at runtime.

import java.io.IOException;
import java.nio.file.Path;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.PatchCommandEncoder;
import org.egothor.stemmer.StemmerPatchTrieLoader;

public final class BinaryStemmerExample {

    private BinaryStemmerExample() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        final Path path = Path.of("stemmers", "english.radixor.gz");
        final FrequencyTrie<String> trie = StemmerPatchTrieLoader.loadBinary(path);

        final String word = "connected";
        final String patch = trie.get(word);
        final String stem = PatchCommandEncoder.apply(word, patch);

        System.out.println(word + " -> " + stem);
    }
}
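
Before handing a path to the loader at startup, it can help to fail fast with a clear message when the artifact is missing or is not the expected file. The sketch below uses only the JDK and checks for the two GZip magic bytes. `ArtifactCheck` is a hypothetical helper, and the check assumes the artifact was written GZip-compressed, as the `.gz` file name in the example above suggests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-flight check for a compressed stemmer artifact. */
public final class ArtifactCheck {

    private ArtifactCheck() {
        throw new AssertionError("No instances.");
    }

    /** Returns true if the path is a regular file that starts with the GZip magic bytes 0x1f 0x8b. */
    public static boolean looksLikeGzip(final Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return false;
        }
        try (InputStream input = Files.newInputStream(path)) {
            final byte[] header = input.readNBytes(2);
            return header.length == 2
                    && (header[0] & 0xff) == 0x1f
                    && (header[1] & 0xff) == 0x8b;
        }
    }

    public static void main(final String[] arguments) throws IOException {
        final Path path = Path.of("stemmers", "english.radixor.gz");
        if (!looksLikeGzip(path)) {
            System.err.println("Missing or not GZip-compressed: " + path);
            return;
        }
        System.out.println("Artifact looks loadable: " + path);
    }
}
```

The check inspects only the header, so a truncated or corrupted file can still pass it. The loader remains the authority on whether the content is valid.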

Compile a dictionary from the command line

java org.egothor.stemmer.Compile \
    --input ./data/stemmer.txt \
    --output ./build/english.radixor.gz \
    --reduction-mode MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS \
    --store-original \
    --overwrite
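
The command above needs the Radixor classes on the classpath. If the compile step is driven from Java, for example from a build helper, the same invocation could be assembled as sketched below. `CompileCommand` is a hypothetical helper and `radixor.jar` is a placeholder classpath entry. The flags are the ones shown above, and treating a non-zero exit code as failure is an assumption based on common CLI convention.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper that assembles the Compile command line shown above. */
public final class CompileCommand {

    private CompileCommand() {
        throw new AssertionError("No instances.");
    }

    /** Builds the argument list; the classpath value is supplied by the caller. */
    public static List<String> build(final String classpath, final String input, final String output) {
        return List.of(
                "java", "-cp", classpath,
                "org.egothor.stemmer.Compile",
                "--input", input,
                "--output", output,
                "--reduction-mode", "MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS",
                "--store-original",
                "--overwrite");
    }

    public static void main(final String[] arguments) throws IOException, InterruptedException {
        final List<String> command = build(
                "radixor.jar", // placeholder: point this at the real jar or class directory
                "./data/stemmer.txt",
                "./build/english.radixor.gz");

        // Run the compiler as a child process and forward its output to this console.
        final int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
        System.out.println("Compile exited with code " + exitCode);
    }
}
```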

Modify an existing compiled stemmer

import java.io.IOException;
import java.nio.file.Path;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.FrequencyTrieBuilders;
import org.egothor.stemmer.PatchCommandEncoder;
import org.egothor.stemmer.ReductionMode;
import org.egothor.stemmer.ReductionSettings;
import org.egothor.stemmer.StemmerPatchTrieBinaryIO;

public final class ModifyCompiledExample {

    private ModifyCompiledExample() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        final Path input = Path.of("stemmers", "english.radixor.gz");
        final Path output = Path.of("stemmers", "english-custom.radixor.gz");

        final FrequencyTrie<String> compiledTrie = StemmerPatchTrieBinaryIO.read(input);

        final ReductionSettings settings = ReductionSettings.withDefaults(
                ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);

        final FrequencyTrie.Builder<String> builder = FrequencyTrieBuilders.copyOf(
                compiledTrie,
                String[]::new,
                settings);

        builder.put("microservices", PatchCommandEncoder.NOOP_PATCH);

        final FrequencyTrie<String> updatedTrie = builder.build();
        StemmerPatchTrieBinaryIO.write(updatedTrie, output);
    }
}

Where to continue