
ln-630-test-auditor

npx machina-cli add skill levnikolaevich/claude-code-skills/ln-630-test-auditor --openclaw

Paths: File paths (shared/, references/, ../ln-*) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.

Test Suite Auditor (L2 Coordinator)

Coordinates comprehensive test suite audit across 6 quality categories using 5 specialized workers.

Purpose & Scope

  • L2 Coordinator that delegates to L3 specialized audit workers
  • Audits all tests against 6 quality categories (via 5 workers)
  • Calculates Usefulness Score for each test (Keep/Remove/Refactor)
  • Identifies missing tests for critical business logic
  • Detects anti-patterns and isolation issues
  • Aggregates results into unified report
  • Writes report to docs/project/test_audit.md (file-based, no task creation)
  • Manual invocation by user; not part of Story pipeline

Core Philosophy

"Write tests. Not too many. Mostly integration." — Kent Beck "Test based on risk, not coverage." — ISO 29119

Key Principles:

  1. Test business logic, not frameworks — bcrypt/Prisma/Express already tested
  2. No performance/load/stress tests — these test infrastructure, not code correctness (use k6/JMeter separately)
  3. Risk-based prioritization — Priority ≥15 or remove
  4. E2E for critical paths only — Money/Security/Data (Priority ≥20)
  5. Usefulness over quantity — One useful test > 10 useless tests
  6. Every test must justify existence — Impact × Probability ≥15
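
As an illustration of principle 6, a hedged sketch of how a worker might score a test. The 1-5 scales for Impact and Probability are assumptions for the example; the thresholds (≥15 keep, 10-14 review, <10 remove) come from this skill.

```typescript
// Assumed 1-5 scales for Impact and Probability (max score 25); thresholds from this skill.
type TestCandidate = { name: string; impact: number; probability: number };

const usefulness = (t: TestCandidate) => t.impact * t.probability;

function decision(score: number): "KEEP" | "REVIEW" | "REMOVE" {
  if (score >= 15) return "KEEP";
  if (score >= 10) return "REVIEW";
  return "REMOVE";
}

// "processRefund rejects negative amounts" (money path): 5 × 4 = 20 → KEEP
// "validateEmail returns true for a valid email" (covered by E2E registration): 2 × 2 = 4 → REMOVE
```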

Workflow

Phase 1: Discovery (Automated)

Inputs: Codebase root directory

Actions:

  1. Find all test files using Glob:
    • **/*.test.* (Jest, Vitest)
    • **/*.spec.* (Mocha, Jasmine)
    • **/__tests__/**/* (Jest convention)
  2. Parse test file structure (test names, assertions count)
  3. Auto-discover Team ID from docs/tasks/kanban_board.md

Output: testFilesMetadata — list of test files with basic stats
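
For illustration, Phase 1 could be scripted roughly as below. The glob patterns mirror the list above; fast-glob and the crude it()/expect() counting are assumptions, not part of the skill.

```typescript
import fg from "fast-glob";
import { readFileSync } from "node:fs";

interface TestFileMetadata { path: string; testCount: number; assertionCount: number }

async function discoverTests(root: string): Promise<TestFileMetadata[]> {
  const files = await fg(["**/*.test.*", "**/*.spec.*", "**/__tests__/**/*"], {
    cwd: root,
    ignore: ["**/node_modules/**"],
  });
  return files.map((path) => {
    const src = readFileSync(`${root}/${path}`, "utf8");
    return {
      path,
      // Rough counts only; workers load full file contents later for real analysis.
      testCount: (src.match(/\b(it|test)\s*\(/g) ?? []).length,
      assertionCount: (src.match(/\bexpect\s*\(/g) ?? []).length,
    };
  });
}
```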

Phase 2: Research Best Practices (ONCE)

Goal: Gather testing best practices context ONCE, share with all workers

Actions:

  1. Use MCP Ref/Context7 to research testing best practices for detected tech stack
  2. Load ../shared/references/risk_based_testing_guide.md
  3. Build contextStore with:
    • Testing philosophy (E2E primary, Unit supplementary)
    • Usefulness Score formulas (Impact × Probability)
    • Anti-patterns catalog
    • Framework detection patterns

Add output_dir to contextStore:

{
  "output_dir": "docs/project/.audit/ln-630/{YYYY-MM-DD}"
}

Output: contextStore — shared context for all workers

Key Benefit: Context gathered ONCE → passed to all workers → token-efficient
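
For illustration only, a contextStore might look roughly like this; apart from output_dir, the field names below are assumptions, since the actual contents are assembled at runtime from MCP research and risk_based_testing_guide.md.

```typescript
// Illustrative shape only; only output_dir is specified explicitly by this skill.
const contextStore = {
  output_dir: "docs/project/.audit/ln-630/2025-12-23",
  philosophy: "E2E primary, Unit supplementary",
  usefulness_formula: "Impact x Probability; >=15 keep, 10-14 review, <10 remove",
  anti_patterns: ["The Liar", "The Giant", "Happy Path Only"],
  framework_detection: { test: ["jest", "vitest"], stack: ["express", "prisma"] },
};
```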

Phase 2.5: Prepare Output Directory

mkdir -p {output_dir}   # No deletion — date folders preserve history

Phase 3: Domain Discovery (NEW)

Purpose: Detect project domains from production code folder structure for domain-aware coverage analysis.

Algorithm: (same as ln-360-codebase-auditor)

  1. Priority 1: Explicit domain folders

    • Check for: src/domains/*/, src/features/*/, src/modules/*/
    • Monorepo patterns: packages/*/, libs/*/, apps/*/
    • If found (>1 match) → use as domains
  2. Priority 2: Top-level src/ folders

    • List folders: src/users/, src/orders/, src/payments/
    • Exclude infrastructure: utils, shared, common, lib, helpers, config, types, interfaces, constants, middleware, infrastructure, core
    • If remaining >1 → use as domains
  3. Priority 3: Fallback to global mode

    • If <2 domains detected → domain_mode = "global"
    • All workers scan entire codebase (backward-compatible behavior)

Heuristics for domain detection:

| Heuristic | Indicator | Example |
|-----------|-----------|---------|
| File count | >5 files in folder | src/users/ with 12 files |
| Structure | controllers/, services/, models/ present | MVC/Clean Architecture |
| Barrel export | index.ts/index.js exists | Module pattern |
| README | README.md describes domain | Domain documentation |

Output:

{
  "domain_mode": "domain-aware",
  "all_domains": [
    {"name": "users", "path": "src/users", "file_count": 45},
    {"name": "orders", "path": "src/orders", "file_count": 32},
    {"name": "shared", "path": "src/shared", "file_count": 15, "is_shared": true}
  ]
}

Shared folder handling:

  • Folders named shared, common, utils, lib, core → mark is_shared: true
  • Shared code audited but grouped separately in report
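
A simplified sketch of the priority cascade above. The folder lists come from this section; the fs-based traversal is an assumption, and dropping infrastructure folders (instead of tagging shared ones is_shared) is a simplification.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { basename, join } from "node:path";

// Infrastructure folders from the exclusion list above; in the real flow shared/common/core
// would be kept and marked is_shared: true rather than dropped.
const INFRA = new Set([
  "utils", "shared", "common", "lib", "helpers", "config", "types",
  "interfaces", "constants", "middleware", "infrastructure", "core",
]);

function listDirs(path: string): string[] {
  if (!existsSync(path)) return [];
  return readdirSync(path, { withFileTypes: true })
    .filter((e) => e.isDirectory())
    .map((e) => join(path, e.name));
}

function discoverDomains(root: string) {
  // Priority 1: explicit domain / monorepo folders
  for (const base of ["src/domains", "src/features", "src/modules", "packages", "libs", "apps"]) {
    const dirs = listDirs(join(root, base));
    if (dirs.length > 1) return { domain_mode: "domain-aware", all_domains: dirs };
  }
  // Priority 2: top-level src/ folders minus infrastructure
  const candidates = listDirs(join(root, "src")).filter((d) => !INFRA.has(basename(d)));
  if (candidates.length > 1) return { domain_mode: "domain-aware", all_domains: candidates };
  // Priority 3: fewer than 2 domains → fall back to global mode
  return { domain_mode: "global", all_domains: [] };
}
```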

Phase 4: Delegate to Workers

CRITICAL: All delegations use Task tool with subagent_type: "general-purpose" for context isolation.

Prompt template:

Task(description: "Test audit via ln-63X",
     prompt: "Execute ln-63X-{worker}. Read skill from ln-63X-{worker}/SKILL.md. Context: {contextStore}",
     subagent_type: "general-purpose")

Anti-Patterns:

  • ❌ Direct Skill tool invocation without Task wrapper
  • ❌ Any execution bypassing subagent context isolation

Phase 4a: Global Workers (PARALLEL)

Global workers scan entire test suite (not domain-aware):

| # | Worker | Category | What It Audits |
|---|--------|----------|----------------|
| 1 | ln-631-test-business-logic-auditor | Business Logic Focus | Framework/Library tests (Prisma, Express, bcrypt, JWT, axios, React hooks) → REMOVE |
| 2 | ln-632-test-e2e-priority-auditor | E2E Priority | E2E baseline (2/endpoint), Pyramid validation, Missing E2E tests |
| 3 | ln-633-test-value-auditor | Risk-Based Value | Usefulness Score = Impact × Probability<br>Decisions: ≥15 KEEP, 10-14 REVIEW, <10 REMOVE |
| 5 | ln-635-test-isolation-auditor | Isolation + Anti-Patterns | Isolation (6 categories), Determinism, Anti-Patterns (7 types) |

Invocation (4 workers in PARALLEL):

FOR EACH worker IN [ln-631, ln-632, ln-633, ln-635]:
  Task(description: "Test audit via " + worker,
       prompt: "Execute " + worker + ". Read skill. Context: " + JSON.stringify(contextStore),
       subagent_type: "general-purpose")

Phase 4b: Domain-Aware Worker (PARALLEL per domain)

Domain-aware worker runs once per domain:

| # | Worker | Category | What It Audits |
|---|--------|----------|----------------|
| 4 | ln-634-test-coverage-auditor | Coverage Gaps | Missing tests for critical paths per domain (Money 20+, Security 20+, Data 15+, Core Flows 15+) |

Invocation:

IF domain_mode == "domain-aware":
  FOR EACH domain IN all_domains:
    domain_context = {
      ...contextStore,
      domain_mode: "domain-aware",
      current_domain: { name: domain.name, path: domain.path }
    }
    Task(description: "Test audit coverage " + domain.name + " via ln-634",
         prompt: "Execute ln-634-test-coverage-auditor. Read skill. Context: " + JSON.stringify(domain_context),
         subagent_type: "general-purpose")
ELSE:
  // Fallback: invoke once for entire codebase (global mode)
  Task(description: "Test audit coverage via ln-634",
       prompt: "Execute ln-634-test-coverage-auditor. Read skill. Context: " + JSON.stringify(contextStore),
       subagent_type: "general-purpose")

Parallelism strategy:

  • Phase 4a: All 4 global workers run in PARALLEL
  • Phase 4b: All N domain-aware invocations run in PARALLEL
  • Example: 3 domains → 3 ln-634 invocations in single message

Worker Output Contract (File-Based):

Workers write full report to {output_dir}/{worker_id}.md per shared/templates/audit_worker_report_template.md.

Workers return minimal summary in-context (~50 tokens):

Report written: docs/project/.audit/ln-630/{YYYY-MM-DD}/631-business-logic.md
Score: 7.5/10 | Issues: 5 (C:0 H:2 M:2 L:1)

Coordinator extracts score/counts from return values. Full findings stay in files.
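
One way the coordinator could extract scores and severity counts from the ~50-token return string above; the regex is an assumption tied to that exact format.

```typescript
// Parses a return value like: "Score: 7.5/10 | Issues: 5 (C:0 H:2 M:2 L:1)"
type WorkerSummary = { score: number; critical: number; high: number; medium: number; low: number };

function parseSummary(returnValue: string): WorkerSummary | null {
  const m = returnValue.match(/Score:\s*([\d.]+)\/10.*\(C:(\d+)\s+H:(\d+)\s+M:(\d+)\s+L:(\d+)\)/s);
  if (!m) return null; // treat as a failed worker and continue with the others
  const [, score, c, h, med, l] = m;
  return { score: Number(score), critical: Number(c), high: Number(h), medium: Number(med), low: Number(l) };
}
```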

Unified Scoring Formula (all workers):

penalty = (critical × 2.0) + (high × 1.0) + (medium × 0.5) + (low × 0.2)
score = max(0, 10 - penalty)

Domain-aware workers add optional fields: domain, scan_path
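
A worked instance of the formula, with hypothetical counts:

```typescript
// 1 critical, 2 high, 3 medium, 4 low (hypothetical)
// penalty = 1*2.0 + 2*1.0 + 3*0.5 + 4*0.2 = 6.3  →  score = max(0, 10 - 6.3) = 3.7
function workerScore(critical: number, high: number, medium: number, low: number): number {
  const penalty = critical * 2.0 + high * 1.0 + medium * 0.5 + low * 0.2;
  return Math.max(0, 10 - penalty);
}
```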

Phase 5: Aggregate Results (File-Based)

Goal: Merge all worker results into unified Test Suite Audit Report

Workers wrote reports to {output_dir}/ and returned minimal summaries. Aggregation uses return values for numbers and file reads for findings tables.

Aggregation Algorithm:

1. Parse scores/counts from worker return strings (already in context)
2. Read worker report files from {output_dir}/ for findings tables
3. Sum severity counts across all workers
4. Calculate Overall Score: average of 5 worker scores
5. Sort findings by severity: CRITICAL → HIGH → MEDIUM → LOW
6. Group findings by category for report sections

Actions:

  1. Parse return values from all workers for scores/counts
  2. Read worker files from {output_dir}/ for detailed findings
  3. Sum severity counts across all workers
  4. Calculate overall score = average of 5 worker scores
  5. Domain-aware worker (ln-634) → group by domain.name if domain_mode="domain-aware"
  6. Generate Executive Summary (2-3 sentences)
  7. Write report to docs/project/test_audit.md
  8. Return summary to user

Findings grouping:

  • Categories 1-3, 5-6 (Business Logic, E2E, Value, Isolation, Anti-Patterns) → single tables (global)
  • Category 4 (Coverage Gaps) → subtables per domain (if domain_mode="domain-aware")
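
A hedged sketch of the aggregation step: severity totals and the overall score come from the parsed return values, while findings tables are read from the report files. File layout and names below are assumptions.

```typescript
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

type WorkerSummary = { score: number; critical: number; high: number; medium: number; low: number };

function aggregate(outputDir: string, summaries: WorkerSummary[]) {
  // Severity totals and overall score come from the in-context return values
  const totals = summaries.reduce(
    (acc, s) => ({
      critical: acc.critical + s.critical,
      high: acc.high + s.high,
      medium: acc.medium + s.medium,
      low: acc.low + s.low,
    }),
    { critical: 0, high: 0, medium: 0, low: 0 },
  );
  const overall = summaries.reduce((sum, s) => sum + s.score, 0) / summaries.length;

  // Findings tables are read from the worker report files, never kept in context
  const reports = readdirSync(outputDir)
    .filter((f) => f.endsWith(".md"))
    .map((f) => ({ file: f, markdown: readFileSync(join(outputDir, f), "utf8") }));

  return { totals, overall, reports };
}
```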

Context Validation (Post-Filter):

MANDATORY READ: Load shared/references/context_validation.md

Apply Rules 1, 5 + test-specific filters to merged findings:

FOR EACH finding WHERE severity IN (HIGH, MEDIUM):
  # Rule 1: ADR/Planned Override
  IF finding matches ADR → advisory "[Planned: ADR-XXX]"

  # Rule 5: Locality/Single-Consumer
  IF "extract shared helper" suggestion AND consumer_count == 1 → advisory

  # Test-specific: Custom wrapper detection
  IF "framework test" finding (ln-631) AND test imports custom wrapper class:
    → advisory (tests custom logic, not framework)

  # Test-specific: Setup/fixture code
  IF "The Liar" finding (ln-635) AND file is conftest/fixture/setup:
    → advisory (setup code, no assertions expected)

  # Test-specific: Parameterized test
  IF "The Giant" finding (ln-635) AND test is parameterized/data-driven:
    → severity -= 1 (size from data, not complexity)

Downgraded findings → "Advisory Findings" section in report.
Recalculate scores excluding advisory findings from penalty.

Exempt: Coverage gap CRITICAL findings (ln-634), risk-value scores (ln-633).
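
A simplified TypeScript rendering of the post-filter; the predicates are stubs, and the authoritative rules live in shared/references/context_validation.md.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type Finding = { severity: Severity; worker: string; issue: string; advisory?: string };

// Stub predicates; real checks come from context_validation.md and the worker reports.
const matchesAdr = (_f: Finding): string | null => null;          // e.g. return "ADR-012" when planned
const isSingleConsumerHelper = (_f: Finding): boolean => false;   // Rule 5: only one consumer exists

function postFilter(findings: Finding[]): { active: Finding[]; advisory: Finding[] } {
  const active: Finding[] = [];
  const advisory: Finding[] = [];
  for (const f of findings) {
    if (f.severity !== "HIGH" && f.severity !== "MEDIUM") { active.push(f); continue; }
    const adr = matchesAdr(f);
    if (adr) { advisory.push({ ...f, advisory: `[Planned: ${adr}]` }); continue; }
    if (/extract shared helper/i.test(f.issue) && isSingleConsumerHelper(f)) { advisory.push(f); continue; }
    active.push(f); // test-specific downgrades (custom wrapper, fixtures, parameterized) follow the same shape
  }
  return { active, advisory }; // scores are then recalculated from `active` only
}
```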

Output Format

## Test Suite Audit Report - [DATE]

### Executive Summary
[2-3 sentences: test suite health, major issues, key recommendations]

### Severity Summary

| Severity | Count |
|----------|-------|
| Critical | X |
| High | X |
| Medium | X |
| Low | X |
| **Total** | **X** |

### Compliance Score

| Category | Score | Notes |
|----------|-------|-------|
| Business Logic Focus | X/10 | X framework tests found |
| E2E Critical Coverage | X/10 | X critical paths missing E2E |
| Risk-Based Value | X/10 | X low-value tests |
| Coverage Gaps | X/10 | X critical paths untested |
| Isolation & Anti-Patterns | X/10 | X isolation + anti-pattern issues |
| **Overall** | **X/10** | Average of 5 categories |

### Domain Coverage Summary (NEW - if domain_mode="domain-aware")

| Domain | Critical Paths | Tested | Coverage % | Gaps |
|--------|---------------|--------|------------|------|
| users | 8 | 6 | 75% | 2 |
| orders | 12 | 8 | 67% | 4 |
| payments | 6 | 5 | 83% | 1 |
| **Total** | **26** | **19** | **73%** | **7** |

### Audit Findings

| Severity | Location | Issue | Principle | Recommendation | Effort |
|----------|----------|-------|-----------|----------------|--------|
| **CRITICAL** | routes/payment.ts:45 | Missing E2E for payment processing (Priority 25) | E2E Critical Coverage / Money Flow | Add E2E: successful payment + discount edge cases | M |
| **HIGH** | auth.test.ts:45-52 | Test 'bcrypt hashes password' validates library behavior | Business Logic Focus / Crypto Testing | Delete — bcrypt already tested by maintainers | S |
| **HIGH** | db.test.ts:78-85 | Test 'Prisma findMany returns array' validates ORM | Business Logic Focus / ORM Testing | Delete — Prisma already tested | S |
| **HIGH** | user.test.ts:45 | Anti-pattern 'The Liar' — no assertions | Anti-Patterns / The Liar | Add specific assertions or delete test | S |
| **MEDIUM** | utils.test.ts:23-27 | Test 'validateEmail' has Usefulness Score 4 | Risk-Based Value / Low Priority | Delete — likely covered by E2E registration | S |
| **MEDIUM** | order.test.ts:200-350 | Anti-pattern 'The Giant' — 150 lines | Anti-Patterns / The Giant | Split into focused tests | M |
| **LOW** | payment.test.ts | Anti-pattern 'Happy Path Only' — no error tests | Anti-Patterns / Happy Path | Add negative tests | M |

### Coverage Gaps by Domain (if domain_mode="domain-aware")

#### Domain: users (src/users/)

| Severity | Category | Missing Test | Location | Priority | Effort |
|----------|----------|--------------|----------|----------|--------|
| CRITICAL | Money | E2E: processRefund() | services/user.ts:120 | 20 | M |
| HIGH | Security | Unit: validatePermissions() | middleware/auth.ts:45 | 18 | S |

#### Domain: orders (src/orders/)

| Severity | Category | Missing Test | Location | Priority | Effort |
|----------|----------|--------------|----------|----------|--------|
| CRITICAL | Money | E2E: applyDiscount() | services/order.ts:45 | 25 | M |
| HIGH | Data | Integration: orderTransaction() | repositories/order.ts:78 | 16 | M |

Worker Architecture

Each worker:

  • Receives contextStore with testing best practices
  • Receives testFilesMetadata with test file list
  • Loads full test file contents when analyzing
  • Returns structured JSON with category findings
  • Operates independently (failure in one doesn't block others)

Token Efficiency:

  • Coordinator: metadata only (~1000 tokens)
  • Workers: full test file contents when needed (~5000-10000 tokens each)
  • Context gathered ONCE, shared with all workers

Critical Rules

  • Two-stage delegation: Global workers (4) + Domain-aware worker (ln-634 × N domains)
  • Domain discovery: Auto-detect domains from folder structure; fallback to global mode if <2 domains
  • Parallel execution: All workers (global + domain-aware) run in PARALLEL
  • Domain-grouped output: Coverage Gaps findings grouped by domain (if domain_mode="domain-aware")
  • Delete > Archive: Remove useless tests, don't comment out
  • E2E baseline: Every endpoint needs 2 E2E tests (positive + negative); see the sketch after this list
  • Justify each test: If can't explain Priority ≥15, remove it
  • Trust frameworks: Don't test Express/Prisma/bcrypt behavior
  • No performance/load tests: Flag and REMOVE tests measuring throughput/latency/memory (DevOps Epic territory)
  • Code is truth: If test contradicts code behavior, update test
  • Language preservation: Report in project's language (EN/RU)
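
To make the E2E baseline rule concrete, a minimal positive + negative pair, assuming an Express app tested with Jest and supertest; app and the /payments route are illustrative only.

```typescript
import request from "supertest";
import { app } from "../src/app"; // hypothetical application entry point

describe("POST /payments, E2E baseline (positive + negative)", () => {
  it("processes a valid payment", async () => {
    const res = await request(app).post("/payments").send({ orderId: "o-1", amount: 49.99 });
    expect(res.status).toBe(201);
    expect(res.body.status).toBe("completed");
  });

  it("rejects a negative amount", async () => {
    const res = await request(app).post("/payments").send({ orderId: "o-1", amount: -5 });
    expect(res.status).toBe(400);
  });
});
```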

Definition of Done

  • All test files discovered via Glob
  • Context gathered from testing best practices (MCP Ref/Context7)
  • Domain discovery completed (domain_mode determined)
  • contextStore built with test metadata + domain info
  • Global workers (4) invoked in PARALLEL
  • Domain-aware worker (ln-634) invoked per domain in PARALLEL
  • All workers completed successfully (or reported errors)
  • Results aggregated with domain grouping (if domain_mode="domain-aware")
  • Domain Coverage Summary built (if domain_mode="domain-aware")
  • Compliance scores calculated (6 categories)
  • Keep/Remove/Refactor decisions for each test
  • Missing tests identified with Priority (grouped by domain if applicable)
  • Anti-patterns catalogued
  • Report written to docs/project/test_audit.md
  • Summary returned to user

Reference Files

  • Orchestrator lifecycle: shared/references/orchestrator_pattern.md
  • Risk-based testing methodology: shared/references/risk_based_testing_guide.md
  • Task delegation pattern: shared/references/task_delegation_pattern.md
  • Audit scoring formula: shared/references/audit_scoring.md
  • Audit output schema: shared/references/audit_output_schema.md

Related Skills


Version: 4.0.0 | Last Updated: 2025-12-23

Source

git clone https://github.com/levnikolaevich/claude-code-skills.git
# Skill file: ln-630-test-auditor/SKILL.md

Overview

ln-630-test-auditor coordinates comprehensive test suite audits across six quality categories by delegating to five specialized workers: Business Logic, E2E, Value, Coverage, and Isolation. It aggregates results into a single report and writes it to docs/project/test_audit.md, enabling risk-based prioritization and targeted improvements.

How This Skill Works

The coordinator begins with Phase 1 Discovery to enumerate test files using Glob patterns and parse test structures, and auto-discovers the Team ID. It then delegates work to five workers (Business Logic, E2E, Value, Coverage, Isolation), collects their results, computes a Usefulness Score using Impact × Probability, identifies missing tests and anti-patterns, and writes a unified report to docs/project/test_audit.md.

When to Use It

  • Audit a large test suite across six quality categories to identify gaps and risks.
  • Apply risk-based prioritization to decide which tests to keep, remove, or refactor.
  • Generate a consolidated audit report stored at docs/project/test_audit.md for stakeholders.
  • Identify missing tests for critical business logic and domains.
  • Detect anti-patterns and isolation issues during a coordinated audit.

Quick Start

  1. Step 1: Run ln-630-test-auditor on the project root to start the coordinated audit.
  2. Step 2: The tool performs Phase 1 Discovery, then delegates to five workers and computes Usefulness Scores.
  3. Step 3: Open docs/project/test_audit.md to review the consolidated audit results and follow-up actions.

Best Practices

  • Test business logic, not frameworks; avoid re-testing bcrypt, Prisma, or Express if already covered elsewhere.
  • Do not include performance, load, or stress tests in this audit; separate those tests to dedicated tooling.
  • Prioritize by risk using the Usefulness Score; keep tests with Priority ≥ 15 and review or remove those below that threshold.
  • Reserve E2E testing for critical paths (Priority ≥ 20) to maximize impact per test.
  • Ensure usefulness over quantity; every test should have Impact × Probability ≥ 15 and a clear justification.

Example Use Cases

  • A fintech project reveals only a handful of high-impact tests for money transfer flows; the audit adds targeted tests to reach broader coverage of critical paths.
  • Discovery phase surfaces Jest and Mocha patterns; audit results map test files to a metadata set for downstream scoring.
  • The contextStore is prepared via MCP Ref/Context7 to share testing philosophy and usefulness formulas across workers.
  • The final report is written to docs/project/test_audit.md and reviewed by QA leads for priority-based refactors.
  • Several low-usefulness tests are pruned or refactored, reducing maintenance costs while preserving coverage of critical paths.
