# MethodAtlasApp

<img src="MethodAtlas.png" width="20%" align="right" alt="MethodAtlas logo" />

MethodAtlas is a small standalone CLI that scans Java source trees for JUnit 5 test methods and emits one record per discovered method.

The tool combines **deterministic source analysis** with optional **AI-assisted classification** so that developers can quickly understand what a test suite contains and which tests appear security-relevant.

Unlike tools that rely entirely on large language models or agent pipelines, MethodAtlas separates the problem into two parts:

- **Deterministic discovery** – a Java AST parser determines exactly which test methods exist
- **AI interpretation** – an optional model classifies those methods and suggests security-related annotations

This approach keeps the analysis **predictable, reproducible, and reviewable**, while still benefiting from AI where it adds value.

The parser determines *what exists* in the code.
The AI suggests *what it means*.

## What MethodAtlas reports

For each discovered JUnit test method, MethodAtlas emits a single record containing:

- `fqcn` – fully qualified class name
- `method` – test method name
- `loc` – inclusive lines of code for the method declaration
- `tags` – existing JUnit `@Tag` values declared on the method

When AI enrichment is enabled, additional fields are included:

- `ai_security_relevant` – whether the model classified the test as security-relevant
- `ai_display_name` – suggested security-oriented `@DisplayName`
- `ai_tags` – suggested security taxonomy tags
- `ai_reason` – short rationale for the classification

These suggestions help identify tests that verify authentication, access control, cryptography, input validation, or other security-relevant behavior.

## Deterministic method discovery

Test discovery is performed using **JavaParser** and the Java AST rather than regex scanning or LLM inference.

The CLI:

- scans files matching `*Test.java`
- detects JUnit Jupiter methods annotated with `@Test`, `@ParameterizedTest`, or `@RepeatedTest`
- extracts existing tags from both repeated `@Tag` usage and `@Tags({...})`

Because the list of test methods is obtained from the AST, the analysis is **deterministic and reproducible** regardless of the AI provider used for classification.

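The file-selection step can be sketched with the standard library alone. The class and method names below (`TestFileFinder`, `isTestFile`, `find`) are hypothetical and are not taken from the MethodAtlas sources; the sketch only illustrates the `*Test.java` rule, and the results are sorted so that repeated runs list files in the same order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper; the real MethodAtlas file walker may differ. */
final class TestFileFinder {

    /** True when the file name ends with "Test.java", i.e. matches *Test.java. */
    static boolean isTestFile(Path path) {
        Path name = path.getFileName();
        return name != null && name.toString().endsWith("Test.java");
    }

    /** Returns all matching regular files under root, sorted for stable output. */
    static List<Path> find(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(TestFileFinder::isTestFile)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each file found this way would then be handed to the AST parser, which decides which methods count as tests.
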
## AI-assisted security classification

If AI mode is enabled, MethodAtlas sends the **full class source for context** together with the **exact list of parser-discovered test methods**.

The model is asked to classify only those methods and suggest:

- whether the test appears security-relevant
- consistent security taxonomy tags
- a meaningful security-oriented display name

This design avoids relying on AI to infer program structure and instead uses it only for semantic interpretation.

MethodAtlas supports multiple providers and can also run against **locally hosted models via Ollama**, allowing teams to use AI without exposing proprietary source code.

MethodAtlas is designed to be lightweight, deterministic, and easy to integrate into developer workflows or CI pipelines.

## Distribution layout

After building and packaging, the distribution archive has this structure:

```text
methodatlas-<version>/
├── bin/
│   ├── methodatlas
│   └── methodatlas.bat
└── lib/
    └── methodatlas-<version>.jar
```

Run the CLI from the `bin` directory, for example:

```bash
cd methodatlas-<version>/bin
./methodatlas /path/to/project
```

## Usage

```bash
./methodatlas [options] [path1] [path2] ...
```

If no scan path is provided, the current directory is scanned. Multiple root paths are supported.

## Output modes

### CSV mode (default)

CSV mode prints a header followed by one record per discovered test method.

Without AI:

```text
fqcn,method,loc,tags
```

With AI:

```text
fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason
```

Example:

```text
fqcn,method,loc,tags
com.acme.tests.SampleOneTest,alpha,8,fast;crypto
com.acme.tests.SampleOneTest,beta,6,param
com.acme.tests.SampleOneTest,gamma,4,nested1;nested2
com.acme.other.AnotherTest,delta,3,
```

### Plain mode

Enable plain mode with `-plain`:

```bash
./methodatlas -plain /path/to/project
```

Plain mode renders one line per method:

```text
com.acme.tests.SampleOneTest, alpha, LOC=8, TAGS=fast;crypto
com.acme.tests.SampleOneTest, beta, LOC=6, TAGS=param
com.acme.tests.SampleOneTest, gamma, LOC=4, TAGS=nested1;nested2
com.acme.other.AnotherTest, delta, LOC=3, TAGS=-
```

If a method has no source-level JUnit tags, plain mode prints `TAGS=-`.

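The difference between the two renderings can be sketched as follows. `MethodRow` is a hypothetical class whose fields mirror the documented columns; it is not the MethodAtlas implementation, and it omits CSV quoting because the base fields in the examples above never contain commas.

```java
import java.util.List;

/** Hypothetical row model; field names mirror the documented columns. */
final class MethodRow {
    final String fqcn;
    final String method;
    final int loc;
    final List<String> tags;

    MethodRow(String fqcn, String method, int loc, List<String> tags) {
        this.fqcn = fqcn;
        this.method = method;
        this.loc = loc;
        this.tags = tags;
    }

    /** CSV rendering: tags joined with ';', empty field when there are none. */
    String toCsv() {
        return fqcn + "," + method + "," + loc + "," + String.join(";", tags);
    }

    /** Plain rendering: labelled fields, '-' when there are no tags. */
    String toPlain() {
        String renderedTags = tags.isEmpty() ? "-" : String.join(";", tags);
        return fqcn + ", " + method + ", LOC=" + loc + ", TAGS=" + renderedTags;
    }
}
```

With the `delta` row from the examples, `toCsv()` ends in a trailing comma while `toPlain()` ends in `TAGS=-`, matching the sample output.
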
## AI enrichment

When AI support is enabled, MethodAtlas submits each parsed test class to a provider-agnostic suggestion engine and merges returned method-level suggestions into the emitted output.

The AI subsystem can:

- classify whether a test is security-relevant
- propose a `SECURITY: ...` display name
- assign controlled taxonomy tags
- provide a short rationale

Supported providers:

- `auto`
- `ollama`
- `openai`
- `openrouter`
- `anthropic`

In `auto` mode, MethodAtlas prefers a reachable local Ollama instance and otherwise falls back to an OpenAI-compatible provider when an API key is configured.

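The `auto` preference order can be written down as a small decision function. This is a sketch under assumptions: `AutoProviderSelection` and its `Choice` enum are hypothetical names, the reachability check is passed in as a boolean rather than performed, and returning an empty result when neither option is available is an assumption, since the behavior in that case is not documented here.

```java
import java.util.Optional;

/** Hypothetical sketch of the documented "auto" preference order. */
final class AutoProviderSelection {

    enum Choice { OLLAMA, OPENAI_COMPATIBLE }

    /**
     * Prefers a reachable local Ollama instance; otherwise falls back to an
     * OpenAI-compatible provider, but only when an API key is configured.
     */
    static Optional<Choice> select(boolean ollamaReachable, String apiKey) {
        if (ollamaReachable) {
            return Optional.of(Choice.OLLAMA);
        }
        if (apiKey != null && !apiKey.isBlank()) {
            return Optional.of(Choice.OPENAI_COMPATIBLE);
        }
        return Optional.empty(); // assumption: no usable provider
    }
}
```
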
## Complete command-line arguments

### General options

| Argument | Meaning | Default |
| --- | --- | --- |
| `-plain` | Emit plain text instead of CSV | CSV mode |
| `[path ...]` | One or more root paths to scan | Current directory |

### AI options

| Argument | Meaning | Notes / default |
| --- | --- | --- |
| `-ai` | Enable AI enrichment | Disabled by default |
| `-ai-provider <provider>` | Select provider | `auto`, `ollama`, `openai`, `openrouter`, `anthropic` |
| `-ai-model <model>` | Provider-specific model identifier | Default is `qwen2.5-coder:7b` |
| `-ai-base-url <url>` | Override provider base URL | Provider-specific default URL is used otherwise |
| `-ai-api-key <key>` | Supply API key directly on the command line | Useful for quick experiments; env vars are often preferable |
| `-ai-api-key-env <name>` | Read API key from an environment variable | Used if `-ai-api-key` is not supplied |
| `-ai-taxonomy <path>` | Load taxonomy text from an external file | Overrides built-in taxonomy text |
| `-ai-taxonomy-mode <mode>` | Select built-in taxonomy mode | `default` or `optimized`; default is `default` |
| `-ai-max-class-chars <count>` | Skip AI analysis for larger classes | Default is `40000` |
| `-ai-timeout-sec <seconds>` | Set request timeout for provider calls | Default is `90` seconds |
| `-ai-max-retries <count>` | Set retry limit for AI operations | Default is `1` |

Unknown options cause an error. Missing option values also fail fast.

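The fail-fast rules can be illustrated with a reduced parser. The sketch below handles only `-plain`, `-ai`, and `-ai-model`; the class name `CliArgsSketch`, the exception type, and the error messages are assumptions for illustration and do not describe how MethodAtlas itself parses arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, reduced parser covering three of the documented options. */
final class CliArgsSketch {
    boolean plain;
    boolean ai;
    String aiModel = "qwen2.5-coder:7b"; // documented default
    final List<String> paths = new ArrayList<>();

    static CliArgsSketch parse(String... args) {
        CliArgsSketch parsed = new CliArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            switch (arg) {
                case "-plain":
                    parsed.plain = true;
                    break;
                case "-ai":
                    parsed.ai = true;
                    break;
                case "-ai-model":
                    if (i + 1 >= args.length) {
                        // Missing option value: fail fast.
                        throw new IllegalArgumentException("Missing value for " + arg);
                    }
                    parsed.aiModel = args[++i];
                    break;
                default:
                    if (arg.startsWith("-")) {
                        // Unknown option: report an error instead of guessing.
                        throw new IllegalArgumentException("Unknown option: " + arg);
                    }
                    parsed.paths.add(arg); // anything else is a scan root
            }
        }
        if (parsed.paths.isEmpty()) {
            parsed.paths.add("."); // no scan path given: scan the current directory
        }
        return parsed;
    }
}
```
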
### Argument details

#### `-plain`

Switches output rendering from CSV to a human-readable line-oriented format. This affects rendering only; method discovery and AI classification behavior remain the same.

#### `-ai`

Turns on AI enrichment. Without this flag, MethodAtlas behaves as a pure static scanner and emits only source-derived metadata. When this flag is present, the application initializes an AI suggestion engine before scanning.

#### `-ai-provider <provider>`

Selects the provider implementation.

Accepted values are case-insensitive because the CLI normalizes them internally before mapping them to the provider enum. Available providers are:

- `auto`
- `ollama`
- `openai`
- `openrouter`
- `anthropic`

`auto` is the default.

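One common way to get case-insensitive enum mapping is to upper-case the value with a fixed locale before calling `valueOf`. The sketch below assumes that approach; the names `ProviderParsing` and `Provider` are hypothetical, and the actual normalization inside MethodAtlas may differ.

```java
import java.util.Locale;

/** Hypothetical sketch of case-insensitive provider parsing. */
final class ProviderParsing {

    enum Provider { AUTO, OLLAMA, OPENAI, OPENROUTER, ANTHROPIC }

    /** Maps a CLI value to the enum; a missing value yields the documented default. */
    static Provider parse(String value) {
        if (value == null) {
            return Provider.AUTO;
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return Provider.valueOf(value.toUpperCase(Locale.ROOT));
    }
}
```

`Enum.valueOf` throws `IllegalArgumentException` for a name that is not a constant, so an unsupported provider string is rejected rather than silently ignored.
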
#### `-ai-model <model>`

Specifies the provider-specific model name. Examples include local Ollama model names or hosted model identifiers accepted by OpenAI-compatible providers. The default is `qwen2.5-coder:7b`.

#### `-ai-base-url <url>`

Overrides the provider base URL.

If omitted, MethodAtlas uses these defaults:

| Provider | Default base URL |
| --- | --- |
| `auto` | `http://localhost:11434` |
| `ollama` | `http://localhost:11434` |
| `openai` | `https://api.openai.com` |
| `openrouter` | `https://openrouter.ai/api` |
| `anthropic` | `https://api.anthropic.com` |

This is useful for self-hosted gateways, proxies, compatible endpoints, or non-default local deployments.

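The override-or-default rule amounts to a lookup with a fallback. In the sketch below the URLs are copied from the table above, while the class name `BaseUrlDefaults`, the use of plain strings as provider keys, and the error for an unknown provider are illustrative assumptions.

```java
import java.util.Map;

/** Hypothetical lookup of the documented default base URLs. */
final class BaseUrlDefaults {

    static final Map<String, String> DEFAULTS = Map.of(
            "auto", "http://localhost:11434",
            "ollama", "http://localhost:11434",
            "openai", "https://api.openai.com",
            "openrouter", "https://openrouter.ai/api",
            "anthropic", "https://api.anthropic.com");

    /** Returns the explicit override when given, otherwise the provider default. */
    static String resolve(String provider, String override) {
        if (override != null && !override.isBlank()) {
            return override;
        }
        String url = DEFAULTS.get(provider);
        if (url == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return url;
    }
}
```
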
#### `-ai-api-key <key>`

Provides the API key directly. This takes precedence over `-ai-api-key-env`: key resolution checks the explicit key first and consults the environment variable only if no explicit key was given.

#### `-ai-api-key-env <name>`

Reads the API key from an environment variable such as:

```bash
export OPENROUTER_API_KEY=...
./methodatlas -ai -ai-provider openrouter -ai-api-key-env OPENROUTER_API_KEY /path/to/tests
```

If both `-ai-api-key` and `-ai-api-key-env` are omitted, providers that require hosted authentication will be unavailable.

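The precedence between the two key options can be sketched as a single function. `ApiKeyResolution` is a hypothetical name, and treating a blank value as absent is an assumption; the environment is passed in as a map so the logic can be exercised without touching real environment variables.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the documented key precedence. */
final class ApiKeyResolution {

    /**
     * Explicit key first, then the named environment variable.
     * Empty result: hosted providers that need authentication are unavailable.
     */
    static Optional<String> resolve(String explicitKey, String envName, Map<String, String> env) {
        if (explicitKey != null && !explicitKey.isBlank()) {
            return Optional.of(explicitKey);
        }
        if (envName != null) {
            String fromEnv = env.get(envName);
            if (fromEnv != null && !fromEnv.isBlank()) {
                return Optional.of(fromEnv);
            }
        }
        return Optional.empty();
    }
}
```

In a real program the third argument would typically be `System.getenv()`.
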
#### `-ai-taxonomy <path>`

Loads taxonomy text from an external file instead of using the built-in taxonomy. This lets you tailor classification categories or rules to your own security testing conventions.

#### `-ai-taxonomy-mode <mode>`

Selects one of the built-in taxonomy variants:

- `default` – more descriptive, human-readable taxonomy
- `optimized` – more compact taxonomy intended to improve model reliability and reduce prompt size

When `-ai-taxonomy` is also supplied, the external taxonomy file takes precedence.

#### `-ai-max-class-chars <count>`

Sets the maximum serialized class size eligible for AI analysis. If a class source exceeds this number of characters, MethodAtlas skips AI classification for that class and continues scanning normally.

#### `-ai-timeout-sec <seconds>`

Configures the timeout applied to AI provider requests. The default is 90 seconds.

#### `-ai-max-retries <count>`

Configures the retry count retained in AI runtime options. The current default is `1`.

## Example commands

Basic scan:

```bash
./methodatlas /path/to/project
```

Plain output:

```bash
./methodatlas -plain /path/to/project
```

AI with OpenRouter and direct API key:

```bash
./methodatlas -ai -ai-provider openrouter -ai-api-key YOUR_API_KEY -ai-model stepfun/step-3.5-flash:free /path/to/junit/tests
```

AI with OpenRouter and environment variable:

```bash
export OPENROUTER_API_KEY=YOUR_API_KEY
./methodatlas -ai -ai-provider openrouter -ai-api-key-env OPENROUTER_API_KEY -ai-model stepfun/step-3.5-flash:free /path/to/junit/tests
```

AI with local Ollama:

```bash
./methodatlas -ai -ai-provider ollama -ai-model qwen2.5-coder:7b /path/to/junit/tests
```

Automatic provider selection:

```bash
./methodatlas -ai /path/to/junit/tests
```

## Highlighted example: AI extension in action

In a real packaged setup, running MethodAtlas from the unzipped distribution against a subset of MethodAtlas and ZeroEcho test sources with:

```bash
./methodatlas -ai -ai-provider openrouter -ai-api-key OBTAIN_YOUR_API_KEY -ai-model stepfun/step-3.5-flash:free some/dir/with/junit/tests/
```

produced output such as:

```csv
fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason
org.egothor.methodatlas.MethodAtlasAppTest,csvMode_detectsMethodsLocAndTags,22,,false,,,"Test verifies functional output format and data extraction of MethodAtlasApp, not security properties."
org.egothor.methodatlas.MethodAtlasAppTest,plainMode_detectsMethodsLocAndTags,20,,false,,,"Test verifies functional output format and data extraction of MethodAtlasApp, not security properties."
zeroecho.core.alg.aes.AesGcmCrossCheckTest,aesGcm_stream_vs_jca_ctxOnly_crosscheck,52,,true,SECURITY: crypto - cross-check AES-GCM stream encryption with JCA reference,security;crypto,"The test verifies that the custom AES-GCM stream implementation produces identical ciphertexts and plaintexts as the JCA reference, ensuring cryptographic correctness and preventing failures that could lead to loss of confidentiality or integrity."
zeroecho.core.alg.aes.AesLargeDataTest,aesGcmLargeData_ctxOnly,27,,true,SECURITY: crypto - AES-GCM round-trip with context-only parameters,security;crypto,"Tests encryption and decryption correctness for large data using AES-GCM, ensuring the authenticated encryption mechanism functions properly for confidentiality and integrity."
zeroecho.core.alg.aes.AesLargeDataTest,aesGcmLargeData_headerCodec,29,,true,SECURITY: crypto - AES-GCM round-trip with header codec,security;crypto,"Validates AES-GCM with an in-band header codec, confirming correct handling of additional authenticated data in the encryption process."
zeroecho.core.alg.aes.AesLargeDataTest,aesCbcPkcs5LargeData_ctxOnly,27,,true,SECURITY: crypto - AES-CBC/PKCS7Padding round-trip with context-only IV,security;crypto,"Ensures AES-CBC encryption and decryption with PKCS7 padding works correctly for large data, testing confidentiality without integrity protection."
zeroecho.core.alg.mldsa.MldsaLargeDataTest,mldsa_complete_suite_streaming_sign_verify_large_data,24,,true,SECURITY: crypto - ML-DSA streaming signature and verification for large data with integrity check,security;crypto;owasp,"Validates cryptographic correctness of ML-DSA signature creation and verification, including handling large data streams, signature length checks, and rejection of tampered signatures via bit-flip, ensuring data integrity and resistance to forgery."
```

What this shows in practice:

- Functional tests are classified as not security-relevant and receive no suggested display name or tags.
- Security-relevant cryptographic tests are detected correctly.
- The tool suggests consistent taxonomy tags such as `security`, `crypto`, and, where appropriate, `owasp`.
- The generated display names are already suitable as candidate `@DisplayName` values.
- The rationale column explains why a method was or was not classified as security-relevant.

For a programmer, this turns a raw test tree into a searchable, structured inventory of security tests without requiring manual tagging of every method.

## Built-in security taxonomy

The prompt builder enforces a closed tag set so that providers do not invent categories. The built-in taxonomy covers these security areas:

- `auth`
- `access-control`
- `crypto`
- `input-validation`
- `injection`
- `data-protection`
- `logging`
- `error-handling`
- `owasp`

Every security-relevant method must include the umbrella tag `security`, and suggested display names should follow:

```text
SECURITY: <security property> - <scenario>
```

MethodAtlas ships both a default taxonomy and a more compact optimized taxonomy.

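A consumer of the output may want to check that a suggestion respects these rules before applying it. The following is a minimal sketch of such a check, not part of MethodAtlas: `SuggestionCheck` is a hypothetical name, the allowed set is the list above plus the umbrella tag, and the regular expression is only one possible reading of the display-name pattern.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical post-check for AI suggestions on security-relevant methods. */
final class SuggestionCheck {

    /** Built-in areas listed above plus the umbrella tag. */
    static final Set<String> ALLOWED = Set.of(
            "security", "auth", "access-control", "crypto", "input-validation",
            "injection", "data-protection", "logging", "error-handling", "owasp");

    /** One reading of "SECURITY: <security property> - <scenario>". */
    static final Pattern DISPLAY_NAME = Pattern.compile("SECURITY: .+ - .+");

    static boolean isValid(List<String> tags, String displayName) {
        return tags.contains("security")          // umbrella tag is mandatory
                && ALLOWED.containsAll(tags)      // closed tag set, no invented tags
                && displayName != null
                && DISPLAY_NAME.matcher(displayName).matches();
    }
}
```
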
## Why this is useful

MethodAtlas is useful when you need to:

- inventory a large JUnit suite quickly
- find tests that already validate security properties
- identify where security tagging is inconsistent or missing
- export structured metadata for reporting, dashboards, or CI jobs
- review security test coverage before an audit or release

Because the application emits one row per test method, the output is easy to pipe into shell scripts, spreadsheets, data pipelines, or further static analysis.

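When consuming the AI-enriched CSV programmatically, note that the `ai_reason` column is quoted and may contain commas, so splitting on `,` alone is not enough. The sketch below is a minimal quote-aware splitter; `CsvRows` is a hypothetical helper, it assumes each record sits on a single line, and a full CSV library would be the more robust choice in production.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal reader for single-line CSV records. */
final class CsvRows {

    /** Splits one CSV line into fields, honouring double-quoted fields and "" escapes. */
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    quoted = false;      // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                quoted = true;           // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Index 4 is ai_security_relevant in the documented AI header. */
    static boolean isSecurityRelevant(List<String> fields) {
        return fields.size() > 4 && "true".equals(fields.get(4));
    }
}
```

Filtering the rows of a report with `isSecurityRelevant` yields the inventory of security tests; the header row is skipped naturally because its fifth field is the column name rather than `true`.
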
## Notes

- The scanner currently considers only files whose names match `*Test.java`.
- AI classification is class-contextual: the full class source is submitted so the model can classify methods with more context.
- If AI support is enabled but engine initialization fails, the application aborts.
- If AI classification of a particular class fails, the scan continues and MethodAtlas emits base metadata without AI suggestions for that class.
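
The per-class fallback described in the last note, together with the `-ai-max-class-chars` limit, can be sketched as a wrapper around the classification call. This is an illustration under assumptions: `AiFallbackSketch` and `classifyOrSkip` are hypothetical names, the classifier is modeled as a plain function, and only unchecked exceptions are treated as classification failures.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of "skip or fall back, but keep scanning". */
final class AiFallbackSketch {

    /**
     * Returns the classifier's result, or empty when the class source is too
     * large or the classifier throws; the caller then emits base metadata only.
     */
    static <T> Optional<T> classifyOrSkip(String classSource, int maxClassChars,
                                          Function<String, T> classifier) {
        if (classSource.length() > maxClassChars) {
            return Optional.empty(); // over the -ai-max-class-chars limit
        }
        try {
            return Optional.ofNullable(classifier.apply(classSource));
        } catch (RuntimeException e) {
            return Optional.empty(); // per-class failure: the scan continues
        }
    }
}
```

Engine initialization is deliberately outside this wrapper, since a failure there aborts the application instead of being absorbed.
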