Refine stemmer core, compiled trie workflow, tests, and public documentation

feat: implement Compile CLI for building binary stemmer tables from source dictionaries
feat: add loading support for persisted compiled tries, including GZip-compressed binaries
feat: add a builder path for recreating a writable trie from a compiled trie
feat: expose read-only value/count access for compiled trie entries
feat: support deterministic NOOP patch encoding for identical source and target words
fix: make value selection deterministic for equal frequencies using length and lexical tie-breakers
fix: preserve valid alternative reductions during trie optimization and reduction
fix: correct patch command edge cases discovered in round-trip and malformed-input tests
fix: address persistence and compiled-trie handling defects found during implementation review
fix: resolve test failures and behavioral regressions uncovered by PMD and JUnit runs
refactor: reorganize trie-related support types into dedicated packages and classes
refactor: simplify the core FrequencyTrie design toward a cleaner practical architecture
refactor: improve compiled/read-only trie boundaries without restoring mutability
refactor: clean up internal reduction, serialization, and helper structure
test: add professional JUnit coverage for stemmer core classes
test: split trie tests into dedicated test classes per production type
test: improve parameterized tests for readability, diagnostics, and edge-case traceability
test: cover positive, negative, malformed, persistence, and round-trip scenarios
test: verify compiled dictionaries against source inputs using getAll semantics
docs: write public README and supplementary Markdown documentation for project publishing
docs: document architecture, reduction model, built-in languages, and operational guidance
docs: clarify reverse-word storage, mutable construction, and compiled-trie runtime behavior
docs: remove placeholders, vague buzzwords, and unexplained terminology from the documentation
docs: improve examples and wording for professional reader-facing project guidance
chore: align project materials with the practical Radix scope and Egothor/Stempel lineage
chore: raise overall project quality through documentation review and test hardening
148
docs/quick-start.md
Normal file
@@ -0,0 +1,148 @@
# Quick Start

> ← Back to [README.md](../README.md)

This guide shows the fastest way to start using **Radixor** and the most common next steps.

## Hello world

```java
import java.io.IOException;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.PatchCommandEncoder;
import org.egothor.stemmer.ReductionMode;
import org.egothor.stemmer.StemmerPatchTrieLoader;

public final class HelloRadixor {

    private HelloRadixor() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        // Load the trie for a built-in language.
        final FrequencyTrie<String> trie = StemmerPatchTrieLoader.load(
                StemmerPatchTrieLoader.Language.US_UK_PROFI,
                true,
                ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);

        final String word = "running";
        // Look up the patch command, then apply it to obtain the stem.
        final String patch = trie.get(word);
        final String stem = PatchCommandEncoder.apply(word, patch);

        System.out.println(word + " -> " + stem);
    }
}
```

This example shows the core workflow:

1. load a trie
2. get a patch command for a word
3. apply the patch
4. obtain the stem
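
To make the lookup-then-apply idea concrete without depending on the library, here is a minimal, self-contained sketch. It is only an illustration: the `ToyPatchWorkflow` class, its lookup table, and the `"<characters to drop>:<suffix to append>"` patch format are all invented for this example and are not Radixor's real patch command encoding, which this guide does not describe.

```java
import java.util.Map;

/**
 * Toy illustration of the "get a patch, apply it, obtain the stem" workflow.
 * The patch format used here is hypothetical and NOT Radixor's encoding.
 */
final class ToyPatchWorkflow {

    // Hypothetical stand-in for a loaded trie: word -> toy patch.
    static final Map<String, String> TOY_TABLE = Map.of(
            "running", "4:",
            "studies", "3:y");

    private ToyPatchWorkflow() {
        throw new AssertionError("No instances.");
    }

    /** Applies a toy patch: drop N trailing characters, then append a suffix. */
    static String apply(final String word, final String patch) {
        final int separator = patch.indexOf(':');
        final int drop = Integer.parseInt(patch.substring(0, separator));
        final String suffix = patch.substring(separator + 1);
        return word.substring(0, word.length() - drop) + suffix;
    }

    public static void main(final String[] arguments) {
        final String word = "studies";
        final String patch = TOY_TABLE.get(word); // get a patch for the word
        final String stem = apply(word, patch);   // apply the patch
        System.out.println(word + " -> " + stem); // prints "studies -> study"
    }
}
```

The point of the sketch is the division of labour: the table stores a compact transformation rather than the stem itself, and the stem only exists once the patch has been applied to the original word.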

## Retrieve multiple candidate stems

If you need more than one candidate result, use `getAll(...)` instead of `get(...)`.

```java
final String word = "axes";
final String[] patches = trie.getAll(word);

for (String patch : patches) {
    final String stem = PatchCommandEncoder.apply(word, patch);
    System.out.println(word + " -> " + stem + " (" + patch + ")");
}
```
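
When candidate stems feed a later step, such as index-time expansion, it can help to wrap the loop in a small helper. The sketch below is one possible shape, not part of Radixor: `CandidateStems` and `stemsOrOriginal` are hypothetical names, and the fallback to the original word for a `null` or empty patch array is an assumed convention, since this guide does not state what `getAll(...)` returns for an unknown word. The patch function is passed in so the helper compiles without the library.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Hypothetical helper that turns a patch array into distinct candidate stems. */
final class CandidateStems {

    private CandidateStems() {
        throw new AssertionError("No instances.");
    }

    /**
     * Applies every patch to the word and returns the distinct stems in the
     * order the patches were given. Falls back to the word itself when there
     * are no patches (null or empty array; an assumed convention).
     */
    static List<String> stemsOrOriginal(final String word, final String[] patches,
            final BinaryOperator<String> applyPatch) {
        if (patches == null || patches.length == 0) {
            return List.of(word);
        }
        final Set<String> stems = new LinkedHashSet<>();
        for (final String patch : patches) {
            stems.add(applyPatch.apply(word, patch));
        }
        return new ArrayList<>(stems);
    }

    public static void main(final String[] arguments) {
        // Toy patch function for the demo only: the patch is a count of
        // trailing characters to drop. This is not Radixor's patch format.
        final BinaryOperator<String> dropTail =
                (w, p) -> w.substring(0, w.length() - Integer.parseInt(p));
        System.out.println(stemsOrOriginal("axes", new String[] {"1", "2"}, dropTail));
    }
}
```

With the library on the classpath, a call could look like `CandidateStems.stemsOrOriginal(word, trie.getAll(word), PatchCommandEncoder::apply)`, which relies only on the `apply(word, patch)` signature used in the examples above.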

## Load a compiled binary stemmer

For production systems, the preferred approach is usually to precompile the dictionary and load the compressed binary artifact at runtime.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.PatchCommandEncoder;
import org.egothor.stemmer.StemmerPatchTrieLoader;

public final class BinaryStemmerExample {

    private BinaryStemmerExample() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        // Load the precompiled binary artifact instead of a source dictionary.
        final Path path = Path.of("stemmers", "english.radixor.gz");
        final FrequencyTrie<String> trie = StemmerPatchTrieLoader.loadBinary(path);

        final String word = "connected";
        final String patch = trie.get(word);
        final String stem = PatchCommandEncoder.apply(word, patch);

        System.out.println(word + " -> " + stem);
    }
}
```
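
If a deployment ships the artifact as a `.gz` file, a cheap pre-flight check can give a clearer diagnostic than a failed load. The sketch below only tests for the standard GZip magic bytes (`0x1f 0x8b`) at the start of the file. `GzipPreflight` and its methods are hypothetical names, the check is not something Radixor provides or requires, and it says nothing about whether the decompressed content is a valid compiled trie.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Hypothetical diagnostic helper; not part of Radixor. */
final class GzipPreflight {

    private GzipPreflight() {
        throw new AssertionError("No instances.");
    }

    /** Returns true when the header starts with the GZip magic bytes 0x1f 0x8b. */
    static boolean hasGzipMagic(final byte[] header) {
        return header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    /** Reads the first two bytes of the file and checks them for the GZip magic. */
    static boolean looksLikeGzip(final Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return hasGzipMagic(in.readNBytes(2));
        }
    }

    public static void main(final String[] arguments) throws IOException {
        // Demo only: write a small GZip file with placeholder content.
        final Path sample = Files.createTempFile("sample", ".radixor.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(sample))) {
            out.write("placeholder".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(looksLikeGzip(sample)); // prints "true"
        Files.delete(sample);
    }
}
```

A `false` result is best treated as a hint to inspect the build output, not as proof that the loader would reject the file.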

## Compile a dictionary from the command line

```bash
java org.egothor.stemmer.Compile \
  --input ./data/stemmer.txt \
  --output ./build/english.radixor.gz \
  --reduction-mode MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS \
  --store-original \
  --overwrite
```

## Modify an existing compiled stemmer

```java
import java.io.IOException;
import java.nio.file.Path;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.FrequencyTrieBuilders;
import org.egothor.stemmer.PatchCommandEncoder;
import org.egothor.stemmer.ReductionMode;
import org.egothor.stemmer.ReductionSettings;
import org.egothor.stemmer.StemmerPatchTrieBinaryIO;

public final class ModifyCompiledExample {

    private ModifyCompiledExample() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        final Path input = Path.of("stemmers", "english.radixor.gz");
        final Path output = Path.of("stemmers", "english-custom.radixor.gz");

        // Read the existing compiled trie.
        final FrequencyTrie<String> compiledTrie = StemmerPatchTrieBinaryIO.read(input);

        final ReductionSettings settings = ReductionSettings.withDefaults(
                ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);

        // Recreate a writable builder from the compiled trie.
        final FrequencyTrie.Builder<String> builder = FrequencyTrieBuilders.copyOf(
                compiledTrie,
                String[]::new,
                settings);

        // Add a word whose stem is the word itself (NOOP patch).
        builder.put("microservices", PatchCommandEncoder.NOOP_PATCH);

        // Build the updated trie and write it to a new file.
        final FrequencyTrie<String> updatedTrie = builder.build();
        StemmerPatchTrieBinaryIO.write(updatedTrie, output);
    }
}
```

## Where to continue

* [Dictionary format](dictionary-format.md)
* [CLI compilation](cli-compilation.md)
* [Programmatic usage](programmatic-usage.md)
* [Built-in languages](built-in-languages.md)
* [Architecture and reduction](architecture-and-reduction.md)