From 344d24dec97e2d22e18fa2a073614761b4b9eef4 Mon Sep 17 00:00:00 2001 From: Leo Galambos Date: Mon, 9 Mar 2026 22:21:22 +0100 Subject: [PATCH] docs: expand README with full CLI argument documentation and AI usage example --- README.md | 338 ++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 301 insertions(+), 37 deletions(-) diff --git a/README.md b/README.md index 5b33890..b69e15c 100644 --- a/README.md +++ b/README.md @@ -1,44 +1,74 @@ # MethodAtlasApp - +MethodAtlas logo -`MethodAtlasApp` is a small standalone CLI that scans Java source trees for JUnit test methods and prints per-method -statistics: +MethodAtlas is a small standalone CLI that scans Java source trees for JUnit 5 test methods and emits one record per discovered method. -- **FQCN** (fully-qualified class name) -- **method name** -- **LOC** (lines of code, based on the AST source range) -- **@Tag values** attached to the method (supports repeated `@Tag` and `@Tags({...})`) +It combines source-derived metadata with optional AI-assisted security classification so that a programmer can quickly understand what a test suite contains, which tests appear security-relevant, and which methods may benefit from consistent `@Tag` and `@DisplayName` annotations. -It supports two output modes: +For each discovered test method, MethodAtlas reports: +- `fqcn` Fully qualified class name +- `method` Test method name +- `loc` Inclusive lines of code for the method declaration +- `tags` Existing JUnit `@Tag` values declared on the method -- **CSV** (default) -- **Plain text** (`-plain` as the first CLI argument) +When AI enrichment is enabled, it also reports: +- `ai_security_relevant` Whether the model classified the test as security-relevant +- `ai_display_name` Suggested security-oriented display name +- `ai_tags` Suggested security taxonomy tags +- `ai_reason` Short rationale for the classification -## Build & run +Method discovery is AST-based via JavaParser rather than regex-based parsing. The CLI scans files ending in `*Test.java`, recognizes JUnit Jupiter methods annotated with `@Test`, `@ParameterizedTest`, or `@RepeatedTest`, and extracts tags from both repeated `@Tag` usage and `@Tags({...})`. -Assuming you have a runnable JAR (e.g. `methodatlas.jar`): +## Distribution layout + +After building and packaging, the distribution archive has this structure: + +```text +methodatlas-/ +├── bin/ +│ ├── methodatlas +│ └── methodatlas.bat +└── lib/ + └── methodatlas-.jar +``` + +Run the CLI from the `bin` directory, for example: ```bash -java -jar methodatlas.jar [ -plain ] [ ...] -```` +cd methodatlas-/bin +./methodatlas /path/to/project +``` -* If **no paths** are provided, the current directory (i.e., `.`) is scanned. -* Multiple root paths are supported. +## Usage + +```bash +./methodatlas [options] [path1] [path2] ... +``` + +If no scan path is provided, the current directory is scanned. Multiple root paths are supported. ## Output modes -### CSV (default) +### CSV mode (default) -* Prints a **header line** -* Each record contains **values only** -* Tags are **semicolon-separated** in the `tags` column (empty if no tags) +CSV mode prints a header followed by one record per discovered test method. + +Without AI: + +```text +fqcn,method,loc,tags +``` + +With AI: + +```text +fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason +``` Example: ```text -Feb 11, 2026 1:33:35 AM org.egothor.methodatlas.MethodAtlasApp scanRoot -INFO: Scanning /tmp/junit-15560885133010516491 for JUnit files fqcn,method,loc,tags com.acme.tests.SampleOneTest,alpha,8,fast;crypto com.acme.tests.SampleOneTest,beta,6,param @@ -46,30 +76,264 @@ com.acme.tests.SampleOneTest,gamma,4,nested1;nested2 com.acme.other.AnotherTest,delta,3, ``` -### Plain text (`-plain`) +### Plain mode -* Prints one line per detected method: - * `FQCN, method, LOC=, TAGS=` -* If a method has **no tags**, it prints `TAGS=-` +Enable plain mode with `-plain`: -Example: +```bash +./methodatlas -plain /path/to/project +``` + +Plain mode renders one line per method: ```text -Feb 11, 2026 1:33:35 AM org.egothor.methodatlas.MethodAtlasApp scanRoot -INFO: Scanning /tmp/junit-12139245189413750595 for JUnit files com.acme.tests.SampleOneTest, alpha, LOC=8, TAGS=fast;crypto com.acme.tests.SampleOneTest, beta, LOC=6, TAGS=param com.acme.tests.SampleOneTest, gamma, LOC=4, TAGS=nested1;nested2 com.acme.other.AnotherTest, delta, LOC=3, TAGS=- ``` +If a method has no source-level JUnit tags, plain mode prints `TAGS=-`. + +## AI enrichment + +When AI support is enabled, MethodAtlas submits each parsed test class to a provider-agnostic suggestion engine and merges returned method-level suggestions into the emitted output. + +The AI subsystem can: + +- classify whether a test is security-relevant +- propose a `SECURITY: ...` display name +- assign controlled taxonomy tags +- provide a short rationale + +Supported providers: + +- `auto` +- `ollama` +- `openai` +- `openrouter` +- `anthropic` + +In `auto` mode, MethodAtlas prefers a reachable local Ollama instance and otherwise falls back to an OpenAI-compatible provider when an API key is configured. + +## Complete command-line arguments + +### General options + +| Argument | Meaning | Default | +| --- | --- | --- | +| `-plain` | Emit plain text instead of CSV | CSV mode | +| `[path ...]` | One or more root paths to scan | Current directory | + +### AI options + +| Argument | Meaning | Notes / default | +| --- | --- | --- | +| `-ai` | Enable AI enrichment | Disabled by default | +| `-ai-provider ` | Select provider | `auto`, `ollama`, `openai`, `openrouter`, `anthropic` | +| `-ai-model ` | Provider-specific model identifier | Default is `qwen2.5-coder:7b` | +| `-ai-base-url ` | Override provider base URL | Provider-specific default URL is used otherwise | +| `-ai-api-key ` | Supply API key directly on the command line | Useful for quick experiments; env vars are often preferable | +| `-ai-api-key-env ` | Read API key from an environment variable | Used if `-ai-api-key` is not supplied | +| `-ai-taxonomy ` | Load taxonomy text from an external file | Overrides built-in taxonomy text | +| `-ai-taxonomy-mode ` | Select built-in taxonomy mode | `default` or `optimized`; default is `default` | +| `-ai-max-class-chars ` | Skip AI analysis for larger classes | Default is `40000` | +| `-ai-timeout-sec ` | Set request timeout for provider calls | Default is `90` seconds | +| `-ai-max-retries ` | Set retry limit for AI operations | Default is `1` | + +Unknown options cause an error. Missing option values also fail fast. + +### Argument details + +#### `-plain` + +Switches output rendering from CSV to a human-readable line-oriented format. This affects rendering only; method discovery and AI classification behavior remain the same. + +#### `-ai` + +Turns on AI enrichment. Without this flag, MethodAtlas behaves as a pure static scanner and emits only source-derived metadata. When this flag is present, the application initializes an AI suggestion engine before scanning. + +#### `-ai-provider ` + +Selects the provider implementation. + +Accepted values are case-insensitive because the CLI normalizes them internally before mapping them to the provider enum. Available providers are: + +- `auto` +- `ollama` +- `openai` +- `openrouter` +- `anthropic` + +`auto` is the default. + +#### `-ai-model ` + +Specifies the provider-specific model name. Examples include local Ollama model names or hosted model identifiers accepted by OpenAI-compatible providers. The default is `qwen2.5-coder:7b`. + +#### `-ai-base-url ` + +Overrides the provider base URL. + +If omitted, MethodAtlas uses these defaults: + +| Provider | Default base URL | +| --- | --- | +| `auto` | `http://localhost:11434` | +| `ollama` | `http://localhost:11434` | +| `openai` | `https://api.openai.com` | +| `openrouter` | `https://openrouter.ai/api` | +| `anthropic` | `https://api.anthropic.com` | + +This is useful for self-hosted gateways, proxies, compatible endpoints, or non-default local deployments. + +#### `-ai-api-key ` + +Provides the API key directly. This takes precedence over `-ai-api-key-env` because the resolved API key logic first checks the explicit key and only then consults the environment variable. + +#### `-ai-api-key-env ` + +Reads the API key from an environment variable such as: + +```bash +export OPENROUTER_API_KEY=... +./methodatlas -ai -ai-provider openrouter -ai-api-key-env OPENROUTER_API_KEY /path/to/tests +``` + +If both `-ai-api-key` and `-ai-api-key-env` are omitted, providers that require hosted authentication will be unavailable. + +#### `-ai-taxonomy ` + +Loads taxonomy text from an external file instead of using the built-in taxonomy. This lets you tailor classification categories or rules to your own security testing conventions. + +#### `-ai-taxonomy-mode ` + +Selects one of the built-in taxonomy variants: + +- `default` — more descriptive, human-readable taxonomy +- `optimized` — more compact taxonomy intended to improve model reliability and reduce prompt size + +When `-ai-taxonomy` is also supplied, the external taxonomy file takes precedence. + +#### `-ai-max-class-chars ` + +Sets the maximum serialized class size eligible for AI analysis. If a class source exceeds this number of characters, MethodAtlas skips AI classification for that class and continues scanning normally. + +#### `-ai-timeout-sec ` + +Configures the timeout applied to AI provider requests. The default is 90 seconds. + +#### `-ai-max-retries ` + +Configures the retry count retained in AI runtime options. The current default is `1`. + +## Example commands + +Basic scan: + +```bash +./methodatlas /path/to/project +``` + +Plain output: + +```bash +./methodatlas -plain /path/to/project +``` + +AI with OpenRouter and direct API key: + +```bash +./methodatlas -ai -ai-provider openrouter -ai-api-key YOUR_API_KEY -ai-model stepfun/step-3.5-flash:free /path/to/junit/tests +``` + +AI with OpenRouter and environment variable: + +```bash +export OPENROUTER_API_KEY=YOUR_API_KEY +./methodatlas -ai -ai-provider openrouter -ai-api-key-env OPENROUTER_API_KEY -ai-model stepfun/step-3.5-flash:free /path/to/junit/tests +``` + +AI with local Ollama: + +```bash +./methodatlas -ai -ai-provider ollama -ai-model qwen2.5-coder:7b /path/to/junit/tests +``` + +Automatic provider selection: + +```bash +./methodatlas -ai /path/to/junit/tests +``` + +## Highlighted example: AI extension in action + +In a real packaged setup, running MethodAtlas from the unzipped distribution against a subset of MethodAtlas and ZeroEcho test sources with: + +```bash +./methodatlas -ai -ai-provider openrouter -ai-api-key OBTAIN_YOUR_API_KEY -ai-model stepfun/step-3.5-flash:free some/dir/with/junit/tests/ +``` + +produced output such as: + +```csv +fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason +org.egothor.methodatlas.MethodAtlasAppTest,csvMode_detectsMethodsLocAndTags,22,,false,,,"Test verifies functional output format and data extraction of MethodAtlasApp, not security properties." +org.egothor.methodatlas.MethodAtlasAppTest,plainMode_detectsMethodsLocAndTags,20,,false,,,"Test verifies functional output format and data extraction of MethodAtlasApp, not security properties." +zeroecho.core.alg.aes.AesGcmCrossCheckTest,aesGcm_stream_vs_jca_ctxOnly_crosscheck,52,,true,SECURITY: crypto - cross-check AES-GCM stream encryption with JCA reference,security;crypto,"The test verifies that the custom AES-GCM stream implementation produces identical ciphertexts and plaintexts as the JCA reference, ensuring cryptographic correctness and preventing failures that could lead to loss of confidentiality or integrity." +zeroecho.core.alg.aes.AesLargeDataTest,aesGcmLargeData_ctxOnly,27,,true,SECURITY: crypto - AES-GCM round-trip with context-only parameters,security;crypto,"Tests encryption and decryption correctness for large data using AES-GCM, ensuring the authenticated encryption mechanism functions properly for confidentiality and integrity." +zeroecho.core.alg.aes.AesLargeDataTest,aesGcmLargeData_headerCodec,29,,true,SECURITY: crypto - AES-GCM round-trip with header codec,security;crypto,"Validates AES-GCM with an in-band header codec, confirming correct handling of additional authenticated data in the encryption process." +zeroecho.core.alg.aes.AesLargeDataTest,aesCbcPkcs5LargeData_ctxOnly,27,,true,SECURITY: crypto - AES-CBC/PKCS7Padding round-trip with context-only IV,security;crypto,"Ensures AES-CBC encryption and decryption with PKCS7 padding works correctly for large data, testing confidentiality without integrity protection." +zeroecho.core.alg.mldsa.MldsaLargeDataTest,mldsa_complete_suite_streaming_sign_verify_large_data,24,,true,SECURITY: crypto - ML-DSA streaming signature and verification for large data with integrity check,security;crypto;owasp,"Validates cryptographic correctness of ML-DSA signature creation and verification, including handling large data streams, signature length checks, and rejection of tampered signatures via bit-flip, ensuring data integrity and resistance to forgery." +``` + +What this shows in practice: + +- Functional tests remain untouched. +- Security-relevant cryptographic tests are detected correctly. +- The tool suggests consistent taxonomy tags such as `security`, `crypto`, and, where appropriate, `owasp`. +- The generated display names are already suitable as candidate `@DisplayName` values. +- The rationale column explains why a method was classified as security-relevant. + +For a programmer, this turns a raw test tree into a searchable, structured inventory of security tests without requiring manual tagging of every method. + +## Built-in security taxonomy + +The prompt builder enforces a closed tag set so that providers do not invent categories. The built-in taxonomy covers these security areas: + +- `auth` +- `access-control` +- `crypto` +- `input-validation` +- `injection` +- `data-protection` +- `logging` +- `error-handling` +- `owasp` + +Every security-relevant method must include the umbrella tag `security`, and suggested display names should follow: + +```text +SECURITY: - +``` + +MethodAtlas ships both a default taxonomy and a more compact optimized taxonomy. + +## Why this is useful + +MethodAtlas is useful when you need to: + +- inventory a large JUnit suite quickly +- find tests that already validate security properties +- identify where security tagging is inconsistent or missing +- export structured metadata for reporting, dashboards, or CI jobs +- review security test coverage before an audit or release + +Because the application emits one row per test method, the output is easy to pipe into shell scripts, spreadsheets, data pipelines, or further static analysis. + ## Notes -* The scanner looks for files ending with `*Test.java`. -* JUnit methods are detected by annotations such as: - * `@Test` - * `@ParameterizedTest` - * `@RepeatedTest` -* Tag extraction supports: - * `@Tag("x")` (including repeated `@Tag`) - * `@Tags({ @Tag("x"), @Tag("y") })` +- The scanner currently considers files ending with `*Test.java`. +- AI classification is class-contextual: the full class source is submitted so the model can classify methods with more context. +- If AI support is enabled but engine initialization fails, the application aborts. +- If AI classification of a particular class fails, the scan continues and MethodAtlas emits base metadata without AI suggestions for that class.