feat: implement dense-child optimized trie lookup and enterprise test/CI profile hardening
193
docs/lookup-edge-optimization.md
Normal file
@@ -0,0 +1,193 @@
# Lookup Edge Optimization

Compiled trie nodes (`CompiledNode`) use three lookup strategies when resolving child edges:

1. dense array direct lookup,
2. linear scan for very small child counts,
3. binary search over sorted edge labels.

This page explains the dense path, what `maxExpandedIndex` controls, and how to tune it.

## Runtime model of one node

For a node with sorted edge labels `char[] edges`, the implementation can materialize an
index-aligned dense table when labels occupy a small compact code-point interval:

```text
span = maxEdge - minEdge
use dense table iff (span <= maxExpandedIndex) and (maxExpandedIndex > 0)
```

When dense lookup is used, lookup is constant-time indexing:

```text
denseIndex = requestedEdge - minEdge
return denseChildren[denseIndex] // or null if outside interval
```

When dense lookup is not active (interval is too wide or the configured
`maxExpandedIndex` is `0`), `CompiledNode` still chooses between two fallback
strategies:

- **linear scan** for very small child counts (`4` or fewer children),
- **binary search** for larger child counts.

This means the fallback method is selected by child count, not by “distance” alone.
`linear scan` is therefore used when there are only a few edges even if those edges are
spread across very distant code points.
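
The selection rules above can be sketched in plain Java. This is a simplified stand-in, not the actual `CompiledNode`: the class name `EdgeLookupSketch`, the `String` child type, and the `strategy()` helper are invented for illustration, and sizing the dense table as `span + 1` slots is an assumption.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the real CompiledNode implementation. */
final class EdgeLookupSketch {

    /** Cutoff taken from the "4 or fewer children" rule described above. */
    static final int LINEAR_SCAN_MAX_CHILDREN = 4;

    private final char[] edges;           // sorted edge labels
    private final String[] children;      // children[i] belongs to edges[i]
    private final String[] denseChildren; // null when no dense table is materialized
    private final char minEdge;

    EdgeLookupSketch(final char[] sortedEdges, final String[] children, final int maxExpandedIndex) {
        this.edges = sortedEdges.clone();
        this.children = children.clone();
        this.minEdge = sortedEdges.length == 0 ? '\0' : sortedEdges[0];
        final int span = sortedEdges.length == 0 ? 0 : sortedEdges[sortedEdges.length - 1] - this.minEdge;
        if (sortedEdges.length > 0 && maxExpandedIndex > 0 && span <= maxExpandedIndex) {
            // Assumption: one slot per code unit in [minEdge, maxEdge].
            this.denseChildren = new String[span + 1];
            for (int i = 0; i < sortedEdges.length; i++) {
                this.denseChildren[sortedEdges[i] - this.minEdge] = children[i];
            }
        } else {
            this.denseChildren = null;
        }
    }

    /** Names the strategy this node would use: "dense", "linear", or "binary". */
    String strategy() {
        if (denseChildren != null) {
            return "dense";
        }
        return edges.length <= LINEAR_SCAN_MAX_CHILDREN ? "linear" : "binary";
    }

    /** Returns the child for the requested edge, or null when there is none. */
    String child(final char requestedEdge) {
        if (denseChildren != null) {
            final int denseIndex = requestedEdge - minEdge;
            return denseIndex >= 0 && denseIndex < denseChildren.length ? denseChildren[denseIndex] : null;
        }
        if (edges.length <= LINEAR_SCAN_MAX_CHILDREN) {
            for (int i = 0; i < edges.length; i++) {
                if (edges[i] == requestedEdge) {
                    return children[i];
                }
            }
            return null;
        }
        final int index = Arrays.binarySearch(edges, requestedEdge);
        return index >= 0 ? children[index] : null;
    }

    public static void main(final String[] arguments) {
        final EdgeLookupSketch node = new EdgeLookupSketch(
                new char[] {'a', 'b', 'e'}, new String[] {"A", "B", "E"}, 512);
        System.out.println(node.strategy() + " -> " + node.child('b'));
    }
}
```

Whatever strategy is chosen, `child(...)` returns the same answer for the same edge; only the access path differs, which is the property the rest of this page relies on.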

### Example: few edges, wide Unicode span

```text
edges = ['a', '中', '你']
edge count = 3
minEdge = 'a' (U+0061)
maxEdge = '你' (U+4F60)
span = 0x4F60 - 0x0061 = 20223
```

- If `maxExpandedIndex = 512`, dense indexing is not used because `span > maxExpandedIndex`.
- Because `edge count = 3` (<= 4), lookup falls back to a tiny linear scan of the
  three labels.
- This is exactly the case where the child-count threshold for linear scan pays off even though the interval is wide.

Dense lookup is not limited to Latin text: what matters is interval width in Unicode
code points, not script name. A compact Arabic-range block can still benefit from dense
lookups when keys stay in a tight code-point interval.
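
The arithmetic in this example, and the Arabic case mentioned above, can be checked with a few lines of Java. The class `SpanExample` and the particular Arabic letters are chosen only for illustration; the qualification rule is the one stated in the runtime model.

```java
/** Illustrative sketch: computes the label span and the dense-table decision. */
final class SpanExample {

    /** span = maxEdge - minEdge for a non-empty, sorted label array. */
    static int span(final char[] sortedEdges) {
        return sortedEdges[sortedEdges.length - 1] - sortedEdges[0];
    }

    /** Applies the documented rule: span <= maxExpandedIndex and maxExpandedIndex > 0. */
    static boolean qualifiesForDense(final char[] sortedEdges, final int maxExpandedIndex) {
        return maxExpandedIndex > 0 && span(sortedEdges) <= maxExpandedIndex;
    }

    public static void main(final String[] arguments) {
        // 'a', U+4E2D, U+4F60: few edges spread over a wide interval.
        final char[] wide = {'a', '\u4E2D', '\u4F60'};
        // Six Arabic letters between U+0627 and U+064A: a tight interval.
        final char[] arabic = {'\u0627', '\u0628', '\u0644', '\u0645', '\u0646', '\u064A'};

        System.out.println(span(wide));                     // 20223
        System.out.println(qualifiesForDense(wide, 512));   // false
        System.out.println(span(arabic));                   // 35
        System.out.println(qualifiesForDense(arabic, 512)); // true
    }
}
```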

## Why this is configurable

`maxExpandedIndex` is purely a speed/memory trade-off:

- higher value:
  - wider label intervals qualify for dense tables,
  - more child lookups run in constant time,
  - more memory for dense tables in qualifying nodes.
- lower value (or `0`):
  - less dense-table allocation,
  - fewer nodes take the constant-time path,
  - lower materialization memory.

The value never changes lookup semantics. It only changes the in-memory structure shape.
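
To get a feel for the memory side of the trade-off, the following rough sketch counts how many dense-table slots a handful of nodes would need under different thresholds. It assumes a dense table holds `span + 1` slots, which is not confirmed by this page, and `DenseTableCostSketch` is a hypothetical helper rather than part of the library.

```java
/** Illustrative sketch: estimates dense-table slots under an assumed span + 1 sizing. */
final class DenseTableCostSketch {

    /** Slots needed for one node, or 0 when the node does not qualify for a dense table. */
    static int denseSlots(final char[] sortedEdges, final int maxExpandedIndex) {
        if (sortedEdges.length == 0 || maxExpandedIndex <= 0) {
            return 0;
        }
        final int span = sortedEdges[sortedEdges.length - 1] - sortedEdges[0];
        return span <= maxExpandedIndex ? span + 1 : 0;
    }

    /** Sums the slots over all nodes for a given threshold. */
    static long totalSlots(final char[][] nodes, final int maxExpandedIndex) {
        long total = 0;
        for (final char[] edges : nodes) {
            total += denseSlots(edges, maxExpandedIndex);
        }
        return total;
    }

    public static void main(final String[] arguments) {
        final char[][] nodes = {
            {'a', 'e', 'z'},  // span 25
            {'a', '\u00E9'},  // span 136
            {'a', '\u4F60'},  // span 20223
        };
        for (final int threshold : new int[] {0, 128, 512, 32768}) {
            System.out.println(threshold + " -> " + totalSlots(nodes, threshold) + " slots");
        }
    }
}
```

Under these assumptions the wide two-edge node only becomes dense at very large thresholds, where it would cost thousands of mostly empty slots for two children. That is the kind of node a moderate threshold is meant to keep out of the dense path.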

## Persistence and loading model

This threshold is **not** stored in `TrieMetadata`.

- The binary format stores only trie payload and semantic metadata (`reduction`, `traversal`,
  case/diacritic settings, and stream version).
- `maxExpandedIndex` is chosen when materializing nodes in memory.
- You can therefore keep one persisted artifact and load it with different in-memory
  trade-offs depending on deployment constraints.

## Default

- `FrequencyTrie.DEFAULT_MAX_EXPANDED_INDEX == 512`
- `CompiledNode.DEFAULT_MAX_EXPANDED_INDEX == 512`

These are practical defaults for mixed-language text and Latin-like scripts where edge labels
often cluster.

## Tune during build (writable phase)

Use the full `FrequencyTrie.Builder` constructor when you are compiling from source data.
The builder threshold is applied while freezing reduced nodes into the immutable form.

```java
import org.egothor.stemmer.CaseProcessingMode;
import org.egothor.stemmer.DiacriticProcessingMode;
import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.ReductionMode;
import org.egothor.stemmer.ReductionSettings;
import org.egothor.stemmer.WordTraversalDirection;

final ReductionSettings settings = ReductionSettings.withDefaults(
        ReductionMode.MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS);

final FrequencyTrie.Builder<String> fastBuilder =
        new FrequencyTrie.Builder<>(String[]::new,
                settings,
                WordTraversalDirection.BACKWARD,
                CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT,
                DiacriticProcessingMode.AS_IS,
                1024); // prefer lookup speed

// ... put(...) ...
final FrequencyTrie<String> trie = fastBuilder.build();
```

Use a lower value such as `256`, or `0` to disable dense tables entirely, when memory matters more than lookup speed, for example while building larger tries.

```java
final FrequencyTrie.Builder<String> compactBuilder =
        new FrequencyTrie.Builder<>(String[]::new,
                settings,
                WordTraversalDirection.BACKWARD,
                CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT,
                DiacriticProcessingMode.AS_IS,
                256); // lower memory profile
```

## Tune when loading a binary artifact (runtime phase)

At artifact load time, you can tune the same trade-off independently of persisted metadata.

```java
import java.nio.file.Path;

import org.egothor.stemmer.StemmerPatchTrieLoader;

var defaultLookup = StemmerPatchTrieLoader.loadBinary(
        Path.of("stemmers", "english.radixor.gz"));

var fastLookup = StemmerPatchTrieLoader.loadBinary(
        Path.of("stemmers", "english.radixor.gz"), 1024);

var compactLookup = StemmerPatchTrieLoader.loadBinary(
        Path.of("stemmers", "english.radixor.gz"), 0);
```

You can also set the threshold directly with `FrequencyTrie.readFrom(...)` when reading streams:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

import org.egothor.stemmer.FrequencyTrie;

public final class StreamLoadExample {

    private StreamLoadExample() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        try (InputStream fileInput = Files.newInputStream(Path.of("stemmers", "english.radixor.gz"));
                GZIPInputStream gzip = new GZIPInputStream(fileInput);
                DataInputStream dataInput = new DataInputStream(gzip)) {
            final FrequencyTrie<String> compactOnLoad = FrequencyTrie.readFrom(
                    dataInput,
                    String[]::new,
                    input -> input.readUTF(),
                    256);
        }
    }
}
```

Note: the string codec is intentionally inline in this snippet to keep it self-contained.

## Practical guidance

- Start with the default (`512`) in production and profile before changing it.
- Use `0` when memory is the priority and query throughput is not the bottleneck.
- Use values around `1024` for workloads dominated by compact alphabets and very hot lookups.

Trade-off expectation:

- increasing `maxExpandedIndex` improves lookup speed when edges tend to occupy short spans,
- decreasing it reduces per-node auxiliary memory in dense-span nodes.
@@ -87,6 +87,43 @@ public final class LoadBinaryExample {

The binary format is the native `FrequencyTrie` serialization wrapped in GZip compression. It includes persisted `TrieMetadata`, so lookup after loading uses the traversal, case-processing, diacritic-processing, and reduction settings captured when the trie was compiled.

## Tune child lookup density when loading binaries

To optimize hot-path latency, you can tune direct child indexing by passing `maxExpandedIndex`
at load time. This does not change persisted metadata, only the materialized in-memory form.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.StemmerPatchTrieLoader;

public final class LoadBinaryWithDenseLookupExample {

    private LoadBinaryWithDenseLookupExample() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        final FrequencyTrie<String> balanced = StemmerPatchTrieLoader.loadBinary(
                Path.of("stemmers", "english.radixor.gz"));

        final FrequencyTrie<String> fast = StemmerPatchTrieLoader.loadBinary(
                Path.of("stemmers", "english.radixor.gz"),
                1024);

        final FrequencyTrie<String> compact = StemmerPatchTrieLoader.loadBinary(
                Path.of("stemmers", "english.radixor.gz"),
                0);
    }
}
```

Negative values still use `FrequencyTrie.DEFAULT_MAX_EXPANDED_INDEX`.
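
The handling of negative values can be pictured as a small normalization step. The helper below is hypothetical (the library's internal code is not shown here) and simply mirrors the documented behavior, with the default of `512` copied from the Lookup Edge Optimization page.

```java
/** Illustrative sketch: how a requested threshold might map to the effective one. */
final class ThresholdNormalizationSketch {

    /** Mirrors the documented FrequencyTrie.DEFAULT_MAX_EXPANDED_INDEX value. */
    static final int DEFAULT_MAX_EXPANDED_INDEX = 512;

    /** Negative requests fall back to the default; 0 and positive values are kept as-is. */
    static int effectiveMaxExpandedIndex(final int requested) {
        return requested < 0 ? DEFAULT_MAX_EXPANDED_INDEX : requested;
    }

    public static void main(final String[] arguments) {
        System.out.println(effectiveMaxExpandedIndex(-1));   // 512
        System.out.println(effectiveMaxExpandedIndex(0));    // 0
        System.out.println(effectiveMaxExpandedIndex(1024)); // 1024
    }
}
```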

[Lookup Edge Optimization](lookup-edge-optimization.md) describes the trade-off in detail and also gives examples for build-time tuning.

## Build directly with a mutable builder

A `FrequencyTrie.Builder<V>` accepts repeated `put(key, value)` calls and compiles the final read-only trie through `build()`. Compilation performs bottom-up reduction and produces the compact immutable runtime representation.

@@ -25,6 +25,7 @@ This is why Radixor can generalize beyond explicitly listed forms and why compil
The programmatic API is easier to understand when split by developer task:

- [Loading and Building Stemmers](programmatic-loading-and-building.md) explains how to acquire a compiled stemmer from bundled resources, textual dictionaries, binary artifacts, or direct builder usage.
- [Lookup Edge Optimization](lookup-edge-optimization.md) explains dense child lookup tuning and the speed/memory trade-off when materializing compiled tries.
- [Querying and Ambiguity Handling](programmatic-querying-and-ambiguity.md) explains `get(...)`, `getAll(...)`, `getEntries(...)`, patch application, and the practical meaning of reduction modes.
- [Extending and Persisting Compiled Tries](programmatic-extending-and-persistence.md) explains how to reopen compiled tries, add new lexical data, rebuild them, and store them as binary artifacts.

@@ -58,6 +58,27 @@ A deterministic system is easier to test, easier to reason about, and safer to i

The project is intended to maintain very high confidence in both core correctness and behavioral stability.

The recommended execution strategy is defined by the tagged test profiles in [Test taxonomy and execution filtering](test-taxonomy-and-filtering.md). In practice, teams can execute profile tasks directly:

- `./gradlew ciSmoke`: fast local/PR safety checks (`unit`, excluding `slow`; additionally excludes
  `CompileIntegrationTest` as a defensive safeguard).
- `./gradlew ciSlow`: enterprise heavy gate for all tests marked with `slow` (typically
  production dictionary and large corpus verification). This should be used for scheduled/manual
  hardening gates and not in the standard release build.
- `./gradlew ciCore`: behavioral coverage of trie and frequency-trie paths (`unit` + `property` where applicable)
- `./gradlew ciIntegration`: pipeline and CLI integration path checks
- `./gradlew ciCompat`: compatibility and regression verification for persisted artifacts
- `./gradlew ciRelease`: full non-slow suite for release-confidence runs (all test tags except `slow`,
  plus explicit name-based exclusion of `CompileIntegrationTest*` and
  `StemmerPatchTrieLoaderTest$BundledDictionaryTests*` as additional guardrails)
- `./gradlew ciNightly`: extended fuzz profile for robustness hardening
- `./gradlew ci`: umbrella profile depending on smoke/core/integration/compat

## Test taxonomy and execution filtering

The full tag taxonomy and executable filter examples are documented in
[Test taxonomy and execution filtering](test-taxonomy-and-filtering.md).

### Structural coverage

High code coverage is treated as a useful signal, but not as a sufficient goal on its own. Coverage is valuable only when the covered scenarios actually pressure the implementation in meaningful ways.

@@ -67,6 +67,36 @@ public final class LoadBinaryStemmerExample {
}
```

You can tune in-memory child lookup density at load time without changing the artifact:

```java
import java.io.IOException;
import java.nio.file.Path;

import org.egothor.stemmer.FrequencyTrie;
import org.egothor.stemmer.StemmerPatchTrieLoader;

public final class LoadBinaryStemmerExampleTuned {

    private LoadBinaryStemmerExampleTuned() {
        throw new AssertionError("No instances.");
    }

    public static void main(final String[] arguments) throws IOException {
        final FrequencyTrie<String> fast = StemmerPatchTrieLoader.loadBinary(
                Path.of("stemmers", "english.radixor.gz"),
                1024);
        final FrequencyTrie<String> compact = StemmerPatchTrieLoader.loadBinary(
                Path.of("stemmers", "english.radixor.gz"),
                128);

        System.out.println("fast=" + fast.size() + ", compact=" + compact.size());
    }
}
```

For the trade-off details, see [Lookup Edge Optimization](lookup-edge-optimization.md).

### Build or extend a stemmer from dictionary data

Radixor can also build a compiled trie from a custom dictionary. Dictionary lines consist of a canonical stem followed by zero or more variants. The input may be plain UTF-8 text or GZip-compressed UTF-8 text when loaded from a filesystem path. The parser applies `CaseProcessingMode` (default: `LOWERCASE_WITH_LOCALE_ROOT`), ignores leading and trailing whitespace around columns, supports line remarks introduced by `#` or `//`, and skips dictionary items that contain embedded whitespace.

@@ -23,7 +23,7 @@ These reports are primarily useful when reviewing the published API surface and

These reports describe the outcome of core verification and static-analysis stages for the latest published build:

- [Unit test report](https://leogalambos.github.io/Radixor/builds/latest/test/)
- [Release verification test report (ciRelease)](https://leogalambos.github.io/Radixor/builds/latest/test/)
- [PMD report](https://leogalambos.github.io/Radixor/builds/latest/pmd/main.html)
- [JaCoCo coverage report](https://leogalambos.github.io/Radixor/builds/latest/coverage/)
- [PIT mutation testing report](https://leogalambos.github.io/Radixor/builds/latest/pitest/)

216
docs/test-taxonomy-and-filtering.md
Normal file
@@ -0,0 +1,216 @@
# Test Tag Taxonomy and Execution Guide

Radixor uses JUnit tags as an explicit execution policy for its test suite.

The project uses three orthogonal axes:

1. **Scope** (how the test is executed in the pipeline)
2. **Domain** (where in the system it belongs)
3. **Intent** (what behavior it verifies)

## Canonical scope tags

| Tag | Description | Typical usage |
| --- | --- | --- |
| `unit` | Fast, deterministic tests that exercise a specific class or behavior without external processes. | Default developer feedback; should stay near-zero flakiness and low run time. |
| `integration` | Tests that span multiple components or end-to-end flows of the public pipeline. | Parser/loader/CLI/IO integration checks and multi-step compile-then-load validations. |
| `property` | Property-based tests with generator-driven coverage for invariants. | Semantics-preserving laws and edge-case exploration beyond curated fixtures. |
| `fuzz` | Randomized stress checks with bounded runtime. | Heavier probabilistic verification of robustness and reduction invariants. |
| `compat` | Backward/forward compatibility and reproducibility checks for persisted artifacts. | Artifact fingerprints, deterministic rebuild, and regression fixtures. |
| `slow` | Long-running or expensive tests that should not execute in every fast gate. | Heavy fuzz/property budgets or high-duration integration checks. |

## Canonical domain tags

| Tag | Description | Typical usage |
| --- | --- | --- |
| `core` | Core algorithm and foundational platform behavior. | Traversal direction, base data structures, low-level helpers. |
| `trie` | All mutable/compiled trie behaviors and traversal internals. | Lookup path selection, node shape, child representation, subtree behavior. |
| `frequency-trie` | Algorithms and corner cases specific to frequency-aware trie logic. | Ranking, weighted reductions, persistence of weighted nodes. |
| `stemmer` | End-user stemming pipeline semantics. | Parse-encode-apply flows and output invariants. |
| `patch` | Patch encoding, decoding, and application semantics. | `PatchCommandEncoder` behavior and related compatibility contracts. |
| `io` | Input/output and resource loading boundaries. | Filesystem readers, streams, and stream lifecycle handling. |
| `serialization` | Binary persistence contract of compiled artifacts. | Versioned format reads/writes and checksum/consistency checks. |
| `parser` | Dictionary and metadata parsing concerns. | Dictionary input parsing and malformed-source rejection. |
| `cli` | Command-line entrypoint and command orchestration behavior. | Compile CLI integration and CLI argument validation. |
| `metadata` | Trie metadata semantics, compatibility fields, and schema expectations. | Version flags, structural properties, and metadata round-trips. |
| `compile` | Compile-time pipeline and build-oriented behavior. | Building, reduction-mode behavior, and compiled artifact generation. |
| `diacritic` | Unicode diacritic normalization and stripping behavior. | Accent-removal correctness and locale-safe normalization checks. |

## Canonical intent tags

| Tag | Description | Typical usage |
| --- | --- | --- |
| `construction` | Tests around construction and assembly of runtime structures. | Builders, loaders, and compile-time object construction contracts. |
| `lookup` | Read behavior and retrieval semantics. | `get()`, `getAll()`, traversal and missing-key behavior. |
| `persistence` | Storage lifecycle semantics. | Serialization/deserialization and round-trip correctness. |
| `reduction` | Reduction algorithm correctness and corner cases. | Dominance threshold, subtree deduplication, rank-preservation invariants. |
| `encoding` | Encoding transformation direction. | `PatchCommandEncoder.encode` and serialized command form generation. |
| `decoding` | Decoding/interpretation of persisted or runtime commands. | Optional consumers that parse and apply encoded command payloads. |
| `apply` | Patch application and transformation behavior. | Verifies that applied patches produce expected derived forms. |
| `normalization` | Canonicalization and cleanup behavior. | String normalization around case/shape and mirrored input paths. |
| `validation` | Input rejection and defensive checks. | Null/empty/invalid contracts and explicit failure conditions. |
| `regression` | Guard tests for behavior changes over time. | Known historical bugs and behavioral drift prevention. |
| `determinism` | Repeatable results under fixed input and settings. | Compile determinism, stable ordering, and artifact reproducibility. |
| `error-handling` | Exception surface and robustness expectations. | Recovery/failure modes and diagnostics quality. |

## Class-level rules

1. Every test class has **exactly one** scope tag.
2. Every test class has at least one domain tag.
3. Additional tags describe intent and may be used on classes or nested tests.
4. For each test class, intent tags should reflect the primary behavior under test, not historical naming conventions.
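
Rules 1 and 2 are mechanical enough to check automatically. The sketch below is a hypothetical validator, not part of the project: `TagRulesSketch` and its method are invented for illustration, and it takes the scope table literally by counting `slow` as a scope tag.

```java
import java.util.Set;

/** Illustrative sketch: checks the class-level tag rules against a set of tag names. */
final class TagRulesSketch {

    /** Scope tags as listed in the scope table (assumption: `slow` counts as a scope tag). */
    static final Set<String> SCOPE_TAGS =
            Set.of("unit", "integration", "property", "fuzz", "compat", "slow");

    /** Domain tags as listed in the domain table. */
    static final Set<String> DOMAIN_TAGS = Set.of(
            "core", "trie", "frequency-trie", "stemmer", "patch", "io",
            "serialization", "parser", "cli", "metadata", "compile", "diacritic");

    /** True when the tags contain exactly one scope tag and at least one domain tag. */
    static boolean satisfiesClassRules(final Set<String> classTags) {
        final long scopeCount = classTags.stream().filter(SCOPE_TAGS::contains).count();
        final long domainCount = classTags.stream().filter(DOMAIN_TAGS::contains).count();
        return scopeCount == 1 && domainCount >= 1;
    }

    public static void main(final String[] arguments) {
        System.out.println(satisfiesClassRules(Set.of("unit", "trie", "lookup")));      // true
        System.out.println(satisfiesClassRules(Set.of("unit", "integration", "trie"))); // false
        System.out.println(satisfiesClassRules(Set.of("unit", "lookup")));              // false
    }
}
```

Intent tags are deliberately ignored here, since rules 3 and 4 describe judgment calls rather than countable constraints.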

## Governance and execution policy

The following rules are used to keep the suite auditable and stable:

| Rule | Required state | Why |
| --- | --- | --- |
| Scope discipline | Exactly one scope tag per class. | Prevents accidental promotion of integration-only behavior into fast unit runs. |
| Coverage breadth | At least one domain tag per class. | Ensures tests can be grouped by subsystem for targeted review. |
| Intent specificity | Use at least one intent tag when behavior is non-trivial. | Makes failure triage faster and profile composition explicit. |
| Runtime policy | Never run `slow` tests in the default `unit` profile unless explicitly required. | Preserves turnaround for PR feedback while preserving deep checks. |
| Change risk | Any persistence or compatibility-affecting change must include `compat` in validation. | Protects long-lived binary artifact contracts. |
| Mutation resistance | `fuzz`/`property` sets should be gated to dedicated profiles. | Limits flakiness exposure and controls CI resource cost. |

## Suggested CI profiles

These are recommended launch profiles for local and CI usage and are also exposed as Gradle tasks:

- **Profile: `ci-smoke` (fast feedback):**

  ```
  ./gradlew test -DincludeTags=unit -DexcludeTags=slow
  ./gradlew ciSmoke
  ```

  `ciSmoke` also excludes `org.egothor.stemmer.CompileIntegrationTest*` at test-name filter level as a
  defensive fallback in case of future tag drift.
  `ciRelease` also excludes
  `org.egothor.stemmer.StemmerPatchTrieLoaderTest$BundledDictionaryTests*` at filter level.

- **Profile: `ci-core` (core behavioral coverage):**

  ```
  ./gradlew test -DincludeTags=unit,trie,frequency-trie,property
  ./gradlew ciCore
  ```

- **Profile: `ci-integration` (pipeline correctness):**

  ```
  ./gradlew test -DincludeTags=integration
  ./gradlew ciIntegration
  ```

- **Profile: `ci-slow` (explicit heavy validation):**

  ```
  ./gradlew ciSlow
  ```

- **Profile: `ci-compat` (artifact stability):**

  ```
  ./gradlew test -DincludeTags=compat,regression
  ./gradlew ciCompat
  ```

- **Profile: `ci-release` (strong confidence before release):**

  ```
  ./gradlew test -DexcludeTags=slow
  ./gradlew ciRelease
  ```
  `ciRelease` is non-slow by policy and uses the same defensive name-based exclusion for
  `org.egothor.stemmer.CompileIntegrationTest*` and
  `org.egothor.stemmer.StemmerPatchTrieLoaderTest$BundledDictionaryTests*` in addition to tag filtering.

- **Profile: `ci-nightly` (extended hardening):**

  ```
  ./gradlew test -DincludeTags=fuzz
  ./gradlew ciNightly
  ```

- **Profile: `ci` (enterprise umbrella):**

  ```
  ./gradlew ci
  ```

`ci` and `ciRelease` intentionally do **not** include `slow` paths. Run `ciSlow` explicitly for production-dictionary stress and long-running corpus checks.

## Practical examples

All examples use Gradle with JUnit Platform integration:

- Only unit tests:

  ```
  ./gradlew test -DincludeTags=unit
  ```

- Integration tests only:

  ```
  ./gradlew test -DincludeTags=integration
  ```

- Only trie subsystem tests:

  ```
  ./gradlew test -DincludeTags=trie
  ```

- Deterministic fuzz checks:

  ```
  ./gradlew test -DincludeTags=fuzz
  ```

- Property tests:

  ```
  ./gradlew test -DincludeTags=property
  ```

- Stemmer + patch command behavior:

  ```
  ./gradlew test -DincludeTags=stemmer,patch
  ```

- Compatibility artifacts and regression checks:

  ```
  ./gradlew test -DincludeTags=compat
  ```

- Keep regression suite and remove long-running cases:

  ```
  ./gradlew test -DincludeTags=regression -DexcludeTags=slow
  ```

- Core + patch behavior:

  ```
  ./gradlew test -DincludeTags=trie,patch
  ```

- Deterministic compatibility and persistence checks:

  ```
  ./gradlew test -DincludeTags=compat,determinism,serialization
  ```

## Notes

- `-DincludeTags` and `-DexcludeTags` are interpreted by Gradle task filtering and forwarded into
  JUnit tag filtering.
- Class-name filtering is also available via Gradle test selectors where needed
  (for example, `--tests *CompileTest`), but tag filtering remains the default
  execution strategy.
- `-DincludeTags` supports comma-separated literal tags. When you need a single exact tag with special
  characters, quote the argument for the shell.
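
As a mental model for how an include/exclude pair selects tests, consider the following sketch. It assumes any-of semantics for included tags and that exclusion always wins, which is how JUnit Platform tag filtering normally behaves. How the project's Gradle build actually forwards the two properties is not shown here, and `TagFilterSketch` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

/** Illustrative sketch: any-of include filtering combined with exclude filtering. */
final class TagFilterSketch {

    /** Splits a comma-separated property value into a set of literal tags. */
    static Set<String> parse(final String commaSeparated) {
        if (commaSeparated == null || commaSeparated.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(tag -> !tag.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
    }

    /** A test runs when it matches any included tag (or none are given) and no excluded tag. */
    static boolean selected(final Set<String> testTags, final String includeTags, final String excludeTags) {
        final Set<String> include = parse(includeTags);
        final Set<String> exclude = parse(excludeTags);
        final boolean included = include.isEmpty() || !Collections.disjoint(testTags, include);
        final boolean excluded = !Collections.disjoint(testTags, exclude);
        return included && !excluded;
    }

    public static void main(final String[] arguments) {
        System.out.println(selected(Set.of("unit", "trie"), "unit", "slow"));         // true
        System.out.println(selected(Set.of("unit", "slow"), "unit", "slow"));         // false
        System.out.println(selected(Set.of("integration", "cli"), "unit,trie", null)); // false
    }
}
```

Under this model, `-DincludeTags=trie,patch` selects tests tagged with either `trie` or `patch`, not only tests carrying both.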