scan-halucinated-tests
npx machina-cli add skill nainishshafi/developer-productivity-skills/scan-halucinated-tests --openclawScan Hallucinated Tests
Cross-validate a test file against the real source code it tests. Detects phantom functions, non-existent classes and attributes, wrong mock targets, bad imports, fabricated exceptions, incorrect argument names, and invented constants — all hallucinations common when LLMs write tests.
Supported languages: Python (.py), Java (.java), C# (.cs), JavaScript/TypeScript (.js, .jsx, .ts, .tsx)
Analysis approach: A multi-language parser (parse-test-refs.py) extracts every external reference the test file makes using AST for Python and regex for Java/C#/JS. The trace-code-context skill's script provides the authoritative list of what actually exists in the source. A haiku subagent cross-references both to produce a scored hallucination report.
Prerequisites
- Python 3.8+ available (
.venvwill be created automatically if missing) - The skill requires a
<test-file>argument — relative or absolute path to the test file (.py,.java,.cs,.js,.ts, etc.) - Optional:
<source-file>argument — if omitted, the source path is inferred from the test filename
Workflow
Skill Step 1 — Gather Inputs and Infer Source Path
Accept two arguments from the user:
<test-file>(required) — path to the test file<source-file>(optional) — path to the source file being tested
If <source-file> is not provided, infer it using language-specific conventions (see references/scan-halucinated-tests-reference.md for the full table):
| Language | Test path example | Inferred source path |
|---|---|---|
| Python | tests/test_auth.py | src/auth.py |
| Python | tests/unit/test_auth.py | src/auth.py |
| Java | src/test/java/com/example/AuthTest.java | src/main/java/com/example/Auth.java |
| Java | AuthTest.java (flat) | Auth.java |
| C# | Tests/AuthTests.cs | src/Auth.cs |
| C# | AuthService.Tests/AuthServiceTests.cs | AuthService/AuthService.cs |
| JavaScript | auth.test.ts | auth.ts |
| JavaScript | __tests__/auth.js | src/auth.js |
If the inferred path does not exist on disk, tell the user and ask for <source-file> explicitly before continuing.
Skill Step 2 — Run trace-context.py on the Source File
Set up the .venv and run trace-context.py with --force to guarantee fresh ground truth:
[ -d .venv ] || python -m venv .venv
PYTHON=$(if [ -f .venv/Scripts/python ]; then echo .venv/Scripts/python; else echo .venv/bin/python; fi)
$PYTHON .github/skills/trace-code-context/scripts/trace-context.py --force "<source-file>"
The script prints a single JSON object to stdout:
{
"stale": true,
"repo_root": "/home/user/myproject",
"output_path": ".code-context/src/auth.md",
"source_path": "src/auth.py",
"language": "python",
"symbols": ["login", "logout", "AuthError"],
"imports": ["os", "hashlib", "models.user"],
"parse_method": "ast",
"repo_source_files": ["src/auth.py", "tests/test_auth.py"]
}
Capture: repo_root, source_path, output_path, language, symbols, imports, parse_method.
Stop and report to the user if: the script exits non-zero, the output contains no valid JSON, or the symbols list is empty (cross-reference would be meaningless).
Skill Step 3 — Run parse-test-refs.py on the Test File
The script auto-detects the language from the file extension.
$PYTHON .github/skills/scan-halucinated-tests/scripts/parse-test-refs.py "<test-file>"
The script prints a single JSON object to stdout. The shape is uniform across all languages:
{
"test_file": "tests/test_auth.py",
"inferred_source": "src/auth.py",
"language": "python",
"imports": [
{"module": "src.auth", "names": ["login", "logout"], "line": 1, "import_style": "from"}
],
"symbol_calls": [
{"name": "login", "line": 15, "call_style": "direct"}
],
"mock_targets": [
{"target": "src.auth.db_connection", "line": 10, "style": "decorator"}
],
"attribute_accesses": [
{"object": "result", "attribute": "token", "line": 18}
],
"exception_refs": [
{"name": "AuthError", "line": 22}
],
"kwarg_calls": [
{"function": "login", "kwargs": ["username", "password"], "line": 15}
],
"constant_refs": [
{"name": "MAX_RETRY_COUNT", "line": 30}
],
"parse_error": null
}
Language-specific mock_targets styles:
- Python:
"style": "decorator"/"inline"/"patch.object"—targetis a dotted string path likesrc.auth.db - Java:
"style": "field-annotation"/"mock-call"/"when-stub"/"verify"—targetis a class name or method name - C#:
"style": "mock-type"/"setup"/"verify"—targetis a type name or method name - JavaScript:
"style": "module-mock"/"spy-on"—targetis a module path orobject.method
Stop and report to the user if: parse_error is non-null, or the script exits with code 1 (display stderr).
Skill Step 4 — Launch Haiku Subagent for Cross-Reference Analysis
Use the Agent tool with:
- subagent_type:
"general-purpose" - model:
"haiku" - description:
"Scan hallucinated tests for <test-file>"
Compute the output file timestamp (current date-time in YYYYMMDD-HHMMSS format). The output path is:
.scan-test-results/scan-<timestamp>.md
Construct the following prompt using the JSON values from Steps 2 and 3, replacing all {...} placeholders with real values before sending:
You are a test hallucination auditor. Cross-reference what a test file *claims* exists against what *actually* exists in the source code, then write a scored hallucination report.
## Inputs
### Ground Truth — Source File (from trace-context.py)
- Repo root: {repo_root}
- Source path: {source_path}
- Context doc path: {output_path}
- Language: {language}
- Parse method: {parse_method}
- Defined symbols:
{symbols as bulleted list}
- Source imports:
{imports as bulleted list}
### Test File References (from parse-test-refs.py)
- Test file: {test_file}
- Test language: {test_language from parse-test-refs output}
- Imports by test:
{imports list formatted as: " - `{module}` imports {names} (line {line}, {import_style})"}
- Symbol calls:
{symbol_calls list formatted as: " - `{name}` (line {line}, {call_style})"}
- Mock targets:
{mock_targets list formatted as: " - `{target}` (line {line}, style={style})"}
- Attribute accesses:
{attribute_accesses list formatted as: " - `{object}.{attribute}` (line {line})"}
- Exception refs:
{exception_refs list formatted as: " - `{name}` (line {line})"}
- Keyword arg / named param calls:
{kwarg_calls list formatted as: " - `{function}({kwargs joined with ', '})` (line {line})"}
- Constant refs:
{constant_refs list formatted as: " - `{name}` (line {line})"}
## Output File
{repo_root}/.scan-test-results/scan-{timestamp}.md
---
## Your Instructions
1. **Read the source file** at `{repo_root}/{source_path}` using the Read tool.
This is authoritative ground truth of what exists. Read the full file carefully.
2. **Read the context doc** at `{repo_root}/{output_path}` using the Read tool.
Use the Defined Symbols table and Dependencies section as a quick-reference index.
3. **Read the test file** at `{repo_root}/{test_file}` using the Read tool.
Understand the full structure: test class/functions, mocks, stubs, assertions.
4. **Cross-reference each hallucination category.** The validation logic differs by language:
---
### Python
**Category 1 — Phantom Functions / Classes** (`symbol_calls`)
Each `name` must appear in the source's defined symbols. Skip pytest builtins
(`fixture`, `mark`, `raises`, `approx`), Python builtins (`len`, `str`, `open`, etc.),
`unittest.mock` names (`MagicMock`, `patch`, etc.), and names defined in the test file itself.
**Category 2 — Wrong Imports** (`imports`)
`imports[].module` must map to the source file's dotted module path
(`source_path` with `/` → `.` and `.py` stripped). `imports[].names` must be in `symbols[]`.
Skip stdlib and third-party imports (`os`, `sys`, `pytest`, `unittest`, `requests`, etc.).
**Category 3 — Wrong Mock Targets** (`mock_targets`, style=decorator/inline)
Split the dotted `target` string. The prefix must match the source module's dotted path.
The final segment must exist as an imported name, defined symbol, or attribute in the source.
`patch.object` style: first part is an imported name; second part must be in source symbols.
**Category 4 — Phantom Attributes** (`attribute_accesses`)
Only flag when the object is clearly a module or class from the source, and the attribute
demonstrably does not exist. Do NOT flag generic return-value attributes or Mock attrs.
**Category 5 — Phantom Exceptions** (`exception_refs`)
The name must be in source `symbols[]` or be a known Python stdlib exception.
Standard exceptions are NOT hallucinations: `ValueError`, `TypeError`, `KeyError`,
`RuntimeError`, `OSError`, `AttributeError`, `IndexError`, `NotImplementedError`, etc.
**Category 6 — Wrong Argument Names** (`kwarg_calls`)
Find the function definition in the source. Compare kwarg names to real parameter names.
Skip if the function accepts `**kwargs`.
**Category 7 — Fabricated Constants** (`constant_refs`)
ALL_CAPS name must appear in source text as a defined constant. Check raw source, not just symbols.
---
### Java
**Category 1 — Phantom Methods / Classes** (`symbol_calls`)
For `call_style=new`: the class name must be in source `symbols[]` or be an imported class.
For `call_style=static-or-instance` with an `object`: the `object` class must be imported
and the method `name` must be in source `symbols[]`. Skip JUnit/Mockito classes
(`Assert`, `Assertions`, `Mockito`, `ArgumentMatchers`, etc.) and Java stdlib (`String`, `List`, etc.).
**Category 2 — Wrong Imports** (`imports`)
The fully-qualified `module` (package) + `name` (class) must correspond to a class
actually used from the source. If the test imports `com.example.fakepackage.FakeClass`
that doesn't match any symbol in the source → hallucination.
**Category 3 — Wrong Mock Targets** (`mock_targets`)
- `style=field-annotation` or `style=mock-call`: the class being mocked must be imported
and should correspond to an interface/class the source depends on.
- `style=when-stub` or `style=verify`: the method name `target` must exist in source `symbols[]`
(i.e., it's a method the source exposes). A stubbed method that doesn't exist → hallucination.
**Category 5 — Phantom Exceptions** (`exception_refs`)
The exception class must be imported in the test or be a standard Java exception:
`IllegalArgumentException`, `IllegalStateException`, `NullPointerException`,
`RuntimeException`, `Exception`, `IOException`, `UnsupportedOperationException`, etc.
**Category 7 — Fabricated Constants** (`constant_refs`)
ALL_CAPS names must appear in the source as `public static final` fields or enum values.
---
### C#
**Category 1 — Phantom Types / Methods** (`symbol_calls`)
For `call_style=new`: the type name must be in source `symbols[]` or be an imported type.
For `call_style=type-call` with an `object`: the `object` type must be imported and `name`
must be in source `symbols[]`. Skip test framework types (`Assert`, `Mock`, `It`, `Times`, etc.)
and .NET stdlib types (`String`, `List`, `Task`, `Guid`, `DateTime`, etc.).
**Category 2 — Wrong Imports** (`imports`, style=using)
The namespace + type name must correspond to a type actually used from the source.
A `using` for a non-existent namespace → HIGH severity.
**Category 3 — Wrong Mock Targets** (`mock_targets`)
- `style=mock-type`: the mocked interface/class must be imported and should be a type
the source depends on.
- `style=setup` or `style=verify`: the method name `target` must be in source `symbols[]`.
**Category 5 — Phantom Exceptions** (`exception_refs`)
The exception class must be imported or be a standard .NET exception:
`Exception`, `ArgumentException`, `ArgumentNullException`, `InvalidOperationException`,
`NotImplementedException`, `NullReferenceException`, `IOException`, `HttpRequestException`, etc.
**Category 6 — Wrong Named Arguments** (`kwarg_calls`)
Find the method definition in the source. Named param names must match real parameter names.
---
### JavaScript / TypeScript
**Category 1 — Phantom Functions / Classes** (`symbol_calls`)
For `call_style=new`: the class must be in source `symbols[]` or be a known third-party class.
For `call_style=direct`: the function name must be in source `symbols[]`.
Skip test framework functions (`describe`, `it`, `test`, `expect`, `beforeEach`, etc.)
and JS globals (`Error`, `TypeError`, `JSON`, `Math`, `Promise`, `Array`, `Object`, etc.).
**Category 2 — Wrong Imports** (`imports`)
The `module` path (e.g., `./auth`, `../services/auth`) must resolve to a real file in the repo.
For relative paths: derive the expected file path from the test file's location + the module string.
`imports[].names` must be exported from that file (check source `symbols[]`).
Skip third-party package imports (no `./` prefix or relative path) — those are npm packages, not source.
**Category 3 — Wrong Mock Targets** (`mock_targets`)
- `style=module-mock`: the module path `target` must resolve to a real file.
- `style=spy-on`: the `object.method` — `object` must be an imported name and `method`
must exist in the source as an exported function or class method.
**Category 5 — Phantom Exceptions** (`exception_refs`)
The class must be in source `symbols[]` or be a built-in JS error:
`Error`, `TypeError`, `RangeError`, `ReferenceError`, `SyntaxError`, `URIError`, `EvalError`.
---
5. **Score each finding:**
- **CRITICAL** — Symbol/function/class/constant does not exist at all → will always fail at runtime
- **HIGH** — Wrong mock path, wrong import path, wrong kwarg name on a strict function
- **MEDIUM** — May pass silently but tests wrong behavior (attribute on Mock, wrong kwarg on variadic fn)
- **LOW** — Cannot confirm from static analysis; note as potential issue
6. **Create the output directory:**
```bash
mkdir -p "{repo_root}/.scan-test-results"
```
7. **Write the report** using the Write tool to `{repo_root}/.scan-test-results/scan-{timestamp}.md`:
```markdown
# Hallucination Scan Report
**Test file:** `{test_file}`
**Source file:** `{source_path}`
**Test language:** {language}
**Scanned:** {today's date and time}
**Parse method:** {parse_method}
## Summary
| Severity | Count |
|----------|-------|
| CRITICAL | N |
| HIGH | N |
| MEDIUM | N |
| LOW | N |
| **Total**| N |
**Overall verdict:** CLEAN / MINOR ISSUES / SIGNIFICANT HALLUCINATIONS / HEAVILY HALLUCINATED
_Verdict thresholds: CLEAN = 0 | MINOR ISSUES = only LOW/MEDIUM ≤ 3 | SIGNIFICANT = any HIGH or MEDIUM > 3 | HEAVILY HALLUCINATED = any CRITICAL or total > 6_
---
## Findings
### CRITICAL — {short title}
**Category:** {hallucination category name}
**Test file line:** {line number}
**What the test claims:** `{the symbol/call/import as written in the test}`
**What actually exists:** `{what is in the source, or "nothing — not defined"}`
**Why this fails:** {plain-English explanation, 1-2 sentences}
**Fix:** `{exact change needed in the test}`
---
(Repeat for each finding, ordered: CRITICAL → HIGH → MEDIUM → LOW)
---
## Verified (No Issues Found)
- `{symbol}` — confirmed defined in source
- `{import}` — confirmed valid path
(list all checked items that passed)
---
## Cannot Verify (Static Analysis Limit)
- `{item}` — {reason why static analysis cannot confirm or deny}
---
## Remediation Summary
{2-4 sentences summarizing the hallucination pattern and root cause}
```
Write the file, then confirm the absolute path in your response.
Skill Step 5 — Read and Present Report
- Read the file at
{repo_root}/.scan-test-results/scan-<timestamp>.md - Present to the user:
- The Summary table (severity counts + overall verdict)
- All CRITICAL and HIGH findings in full
- A note on MEDIUM/LOW count with offer to show details
- The Remediation Summary paragraph
- Full report path:
"Full report written to {repo_root}/.scan-test-results/scan-<timestamp>.md"
Additional Resources
references/scan-halucinated-tests-reference.md— Hallucination pattern catalog (all languages), source file inference rules, cross-reference heuristics per language, severity scoring, false-positive categories, output path conventionscripts/parse-test-refs.py— Multi-language test reference extractor; auto-detects Python/Java/C#/JavaScript from file extension; outputs uniform JSON to stdout
Source
git clone https://github.com/nainishshafi/developer-productivity-skills/blob/master/.github/skills/scan-halucinated-tests/SKILL.mdView on GitHub Overview
The scan-halucinated-tests skill cross-validates a test file against the real source it tests to detect phantom symbols, non-existent classes and attributes, wrong mock targets, and fabricated constants. It supports Python, Java, C#, and JavaScript/TypeScript, helping ensure tests stay aligned with source code and mitigates LLMS-induced errors.
How This Skill Works
The workflow uses parse-test-refs.py to extract every external reference from the test file (via AST for Python and regex for Java/C#/JS). It then leverages the trace-code-context output from the trace-context tool as the authoritative source map, and a haiku subagent cross-references both sources to produce a scored hallucination report.
When to Use It
- When you want to verify a test file actually matches the source it tests
- When you suspect phantom functions, non-existent classes, wrong mocks, or fabricated constants in tests
- When validating tests across multiple languages (Python, Java, C#, JavaScript/TypeScript)
- When preparing PRs or releases and you need an audit of test-source alignment
- When you want a reproducible report detailing test hallucinations and suggested fixes
Quick Start
- Step 1: Run with a <test-file> and optional <source-file>; if <source-file> is omitted, rely on language-specific inference rules
- Step 2: Execute the trace-context workflow on the provided or inferred <source-file> to gather ground-truth symbols and imports
- Step 3: Run parse-test-refs.py on the test file and review the generated hallucination report for mismatches and fixes
Best Practices
- Always provide the test file and, if available, the source file; if the source isn’t provided, rely on language-specific inference rules
- If inference fails, explicitly supply <source-file> to proceed
- Review the hallucination report and verify flagged items against the actual source to avoid false positives
- Integrate the scan into CI to catch misalignments before merges
- Keep mocks, imports, and constants in tests tightly aligned with their source definitions
Example Use Cases
- Python test references a function that does not exist in the source; the scan flags the phantom symbol and pinpoints the mismatch
- Java tests reference a wrong mock target; the report highlights the incorrect import or class reference
- JavaScript/TypeScript tests import a non-existent module or symbol; the tool flags the invalid reference
- Test asserts on a constant introduced by the model rather than present in the source; flagged as hallucination
- Audit reveals a renamed symbol not updated in tests; the report helps you track and fix the drift