Plugin Testing Standards
Install: `npx machina-cli add skill LiorCohen/sdd/plugin-testing-standards --openclaw`
Testing methodology for Claude Code plugins ensuring deterministic verification of LLM-driven workflows.
Core Principles
1. Separation of Concerns
tests/
├── lib/ # All helper/utility code (wraps Node.js)
│ ├── index.ts # Re-exports everything
│ ├── paths.ts # Directory constants
│ ├── fs.ts # File system operations
│ ├── process.ts # Command execution
│ ├── claude.ts # Claude CLI helpers
│ └── http.ts # HTTP utilities
└── tests/ # Test files (NO direct node:* imports)
├── unit/ # No LLM required
├── workflows/ # LLM with deterministic verification
└── integration/ # Full functional verification
2. No Direct Node.js Imports in Tests
Test files must NOT import from node:* directly. All Node.js functionality is accessed through lib/ helpers.
// BAD - direct node import
import * as fs from 'node:fs';
import * as path from 'node:path';
// GOOD - use lib helpers
import { readFile, joinPath, fileExists } from '../lib/index.js';
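A `tests/lib/fs.ts` helper layer satisfying this rule might look like the following sketch. The helper names match the GOOD import above, but the exact signatures are assumptions for illustration:

```typescript
// tests/lib/fs.ts (sketch): thin wrappers over node:fs and node:path so
// that test files never import node:* directly.
import { promises as fsp } from 'node:fs';
import * as path from 'node:path';

// Join path segments using the platform separator.
export const joinPath = (...parts: string[]): string => path.join(...parts);

// Read a file as UTF-8 text.
export const readFile = (filePath: string): Promise<string> =>
  fsp.readFile(filePath, 'utf8');

// Check for existence without throwing.
export const fileExists = async (filePath: string): Promise<boolean> => {
  try {
    await fsp.access(filePath);
    return true;
  } catch {
    return false;
  }
};
```

Centralizing these wrappers in `lib/` means a change to the underlying Node.js API touches one file, not every test.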
3. File Size Limit: 300 Lines
If a test file exceeds 300 lines, split it into a directory with multiple smaller files.
tests/unit/large-feature.test.ts (350 lines - TOO BIG)
# Split into:
tests/unit/large-feature/
├── core.test.ts (~100 lines)
├── validation.test.ts (~120 lines)
└── integration.test.ts (~130 lines)
4. WHY Comments on Every Test
Every describe and it block must have a WHY comment explaining the business/technical value, not the mechanics.
/**
* WHY: Ensures scaffolding substitutes project name variables.
* Without this, generated projects have {{PROJECT_NAME}} literals
* in package.json, breaking npm install.
*/
it('substitutes {{PROJECT_NAME}} in templates', async () => { ... });
Test Tiers
| Tier | Name | LLM | Purpose | Duration |
|---|---|---|---|---|
| 1 | Unit | No | Test pure functions, templates, structure | < 10s |
| 2 | Workflow | Yes | Verify correct agent/skill invocations | < 15min |
| 3 | Integration | Yes | Verify generated output actually works | < 20min |
Tier 1: Unit Tests
Pure TypeScript tests with no Claude involved.
- Test scaffolding scripts directly
- Validate template files exist and have correct structure
- Validate plugin structure (commands, agents, skills)
- No network calls, no LLM
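A Tier 1 check can often be expressed as a pure function over a parsed manifest. The manifest shape below (name/commands/agents/skills) is a hypothetical example, not the plugin's actual schema:

```typescript
// Hypothetical Tier 1 validation: pure, synchronous, no LLM, no network.
interface PluginManifest {
  readonly name: string;
  readonly commands: readonly string[];
  readonly agents: readonly string[];
  readonly skills: readonly string[];
}

// Return a list of structural problems; an empty list means valid.
const validatePluginStructure = (m: PluginManifest): string[] => {
  const errors: string[] = [];
  if (!m.name) errors.push('missing plugin name');
  if (m.commands.length === 0) errors.push('no commands defined');
  for (const s of m.skills) {
    if (!/^[a-z0-9-]+$/.test(s)) errors.push(`invalid skill name: ${s}`);
  }
  return errors;
};
```

Because the function is pure, the unit test is a direct input/output assertion and easily stays under the 10-second budget.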
Tier 2: Workflow Tests
Run Claude with predefined inputs, parse output, verify invocations.
- Capture stream-json output
- Parse tool/skill/agent invocations
- Compare to expected behavior
- Deterministic pass/fail
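A `tests/lib/claude.ts` helper for running Claude non-interactively could be sketched as follows. The exact CLI flags (`-p`, `--output-format stream-json`) are assumptions based on the approach described in this document; verify them against the installed CLI:

```typescript
// Sketch of a helper that runs the Claude CLI with a predefined prompt
// and captures its stream-json output for later parsing.
import { spawn } from 'node:child_process';

// Build the argument list separately so it can be unit-tested without
// actually spawning the CLI.
export const buildClaudeArgs = (prompt: string): string[] => [
  '-p', prompt,                     // non-interactive: print result and exit
  '--output-format', 'stream-json', // structured, line-delimited JSON events
];

export interface ClaudeResult {
  readonly exitCode: number;
  readonly stdout: string;
}

export const runClaude = (prompt: string, cwd: string, timeoutSec: number): Promise<ClaudeResult> =>
  new Promise((resolve, reject) => {
    const child = spawn('claude', buildClaudeArgs(prompt), { cwd });
    let stdout = '';
    const timer = setTimeout(() => {
      child.kill();
      reject(new Error(`claude timed out after ${timeoutSec}s`));
    }, timeoutSec * 1000);
    child.stdout?.on('data', (d) => { stdout += d; });
    child.on('close', (code) => {
      clearTimeout(timer);
      resolve({ exitCode: code ?? 1, stdout });
    });
  });
```

Keeping argument construction in a pure function means Tier 1 tests can cover it while only Tier 2/3 tests pay the cost of spawning the CLI.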
Tier 3: Integration Tests
Verify generated output actually works.
- Run `npm install` on generated projects
- Run `npm run build` to verify TypeScript compiles
- Start servers and verify they respond
- Tests expose real issues in scaffolding/templates
Deterministic LLM Testing
Approach
- Run Claude in non-interactive mode with predefined inputs
- Capture structured output via `--output-format stream-json`
- Parse tool/skill/agent invocations from JSON
- Compare to expected behavior defined in test specs
Stream-JSON Output Structure
{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Skill","input":{"skill":"init"}}]}}
{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Task","input":{"subagent_type":"spec-writer"}}]}}
Parsing Helper
interface ToolUse {
  readonly name: string;
  readonly input: Record<string, any>;
  readonly id: string;
}

interface ParsedOutput {
  readonly toolUses: readonly ToolUse[];
  readonly skillInvocations: readonly string[];
  readonly agentInvocations: readonly string[];
}
const parseClaudeOutput = (output: string): ParsedOutput => {
const toolUses: ToolUse[] = [];
const skillInvocations: string[] = [];
const agentInvocations: string[] = [];
for (const line of output.split('\n')) {
try {
const event = JSON.parse(line);
if (event.type === 'assistant' && event.message?.content) {
for (const content of event.message.content) {
if (content.type === 'tool_use') {
toolUses.push({ name: content.name, input: content.input, id: content.id });
if (content.name === 'Skill') skillInvocations.push(content.input.skill);
if (content.name === 'Task') agentInvocations.push(content.input.subagent_type);
}
}
}
} catch { /* skip non-JSON */ }
}
return { toolUses, skillInvocations, agentInvocations };
};
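The extraction logic can be demonstrated end to end on sample stream-json lines. This is a compact inline version mirroring the parsing helper above, with sample events shaped like the ones shown earlier:

```typescript
// Sample stream-json lines, including a non-JSON line that must be skipped.
const sample = [
  '{"type":"assistant","message":{"content":[{"type":"tool_use","id":"t1","name":"Skill","input":{"skill":"init"}}]}}',
  'plain text progress line',
  '{"type":"assistant","message":{"content":[{"type":"tool_use","id":"t2","name":"Task","input":{"subagent_type":"spec-writer"}}]}}',
].join('\n');

const skills: string[] = [];
const agents: string[] = [];
for (const line of sample.split('\n')) {
  try {
    const event = JSON.parse(line);
    for (const c of event.message?.content ?? []) {
      if (c.type === 'tool_use' && c.name === 'Skill') skills.push(c.input.skill);
      if (c.type === 'tool_use' && c.name === 'Task') agents.push(c.input.subagent_type);
    }
  } catch { /* skip non-JSON */ }
}
// skills → ['init'], agents → ['spec-writer']
```

A workflow test then reduces to comparing these arrays against the expected invocation list, giving a deterministic pass/fail.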
Prompt Engineering for Determinism
Include these instructions in all automated test prompts:
THIS IS AN AUTOMATED TEST. You MUST:
1. Skip ALL discovery questions and use the values above
2. Skip approval steps - consider it PRE-APPROVED
3. Execute ALL steps through completion
4. Do NOT stop for user input at any point
5. Create ALL files in the CURRENT WORKING DIRECTORY (.) - do NOT use absolute paths
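One way to guarantee every automated test carries this preamble is to build prompts through a small helper (the helper name is illustrative; the preamble text is the one above):

```typescript
// Prepend the test-specific task, then the determinism instructions,
// so "use the values above" refers back to the task description.
const DETERMINISM_PREAMBLE = `THIS IS AN AUTOMATED TEST. You MUST:
1. Skip ALL discovery questions and use the values above
2. Skip approval steps - consider it PRE-APPROVED
3. Execute ALL steps through completion
4. Do NOT stop for user input at any point
5. Create ALL files in the CURRENT WORKING DIRECTORY (.) - do NOT use absolute paths`;

const buildTestPrompt = (task: string): string =>
  `${task}\n\n${DETERMINISM_PREAMBLE}`;
```

Centralizing the preamble means a wording tweak propagates to every workflow and integration test at once.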
Integration Test Pattern
/**
* WHY: Verifies that init generates projects that actually compile.
* Catches issues like invalid TypeScript, missing dependencies, or
* broken import paths that would break users immediately.
*/
describe('init functional verification', () => {
/**
* WHY: npm install must succeed for users to run the project.
* Catches invalid package.json, missing dependencies, or
* dependency version conflicts.
*/
it('generated project installs dependencies', async () => {
const result = await runClaude(PROMPT, testDir, 300);
expect(result.exitCode).toBe(0);
const installResult = await runCommand('npm', ['install'], { cwd: projectDir });
expect(installResult.exitCode).toBe(0);
});
/**
* WHY: TypeScript must compile for the project to be usable.
* Catches type errors, missing type definitions, or invalid
* tsconfig settings in templates.
*/
it('generated project builds successfully', async () => {
const buildResult = await runCommand('npm', ['run', 'build'], { cwd: serverDir });
expect(buildResult.exitCode).toBe(0);
});
});
Critical Principle: Fix Code, Not Tests
If an integration test fails:
- The fix belongs in scaffolding.ts or templates
- NOT in test assertions or expectations
- Tests exist to catch real issues in generated output
# BAD: Weakening test to pass
- expect(buildResult.exitCode).toBe(0);
+ expect(buildResult.exitCode).toBeLessThan(2); // "allow warnings"
# GOOD: Fix the actual issue
# Edit scaffolding.ts or template files to fix the build error
Directory Structure Template
tests/{plugin-name}/
├── src/
│ ├── lib/
│ │ ├── index.ts
│ │ ├── paths.ts
│ │ ├── fs.ts
│ │ ├── process.ts
│ │ ├── claude.ts
│ │ └── http.ts
│ └── tests/
│ ├── unit/
│ │ └── {feature}/
│ │ └── {concern}.test.ts
│ ├── workflows/
│ │ ├── {command}.test.ts
│ │ └── {skill}/
│ │ └── {scenario}.test.ts
│ └── integration/
│ └── {command}-functional.test.ts
├── package.json
├── tsconfig.json
└── vitest.config.ts
NPM Scripts
{
"test": "vitest run",
"test:unit": "vitest run src/tests/unit/",
"test:workflows": "vitest run src/tests/workflows/",
"test:integration": "vitest run src/tests/integration/",
"test:ci": "vitest run src/tests/unit/",
"test:all": "vitest run"
}
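The tier durations in the table above imply test timeouts well beyond vitest's default. A `vitest.config.ts` along these lines is one way to accommodate them (a sketch; check the option names against the vitest version in use):

```typescript
// vitest.config.ts (sketch): raise timeouts so LLM-backed tiers can
// finish. 20 * 60_000 matches the Tier 3 budget above.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    testTimeout: 20 * 60_000, // ceiling for integration tests
    hookTimeout: 60_000,      // setup/teardown (temp dirs, servers)
  },
});
```

An alternative is per-test timeouts passed to `it(...)`, keeping the global default strict for unit tests.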
Success Criteria
- Unit tests complete in < 10 seconds (no LLM)
- Workflow tests are deterministic - same input produces same pass/fail
- Integration tests verify generated projects actually build and run
- Failures clearly identify what invocation was missing or wrong
- Test failures indicate issues in plugin code, NOT in tests
- All test files are < 300 lines
- All test blocks have WHY comments
Source
Repository: `git clone https://github.com/LiorCohen/sdd` — skill file at `.claude/skills/plugin-testing-standards/SKILL.md` (View on GitHub)
Overview
Defines a deterministic testing workflow for Claude Code plugins, codifying project structure, test boundaries, and proven methods to verify LLM-driven workflows. It enforces separation of concerns, avoids direct Node.js imports in tests, and uses stream-json output to compare actual invocations against expectations.
How This Skill Works
Tests live under tests/ with a lib/ helper layer and tiered test files (unit, workflow, integration). Claude runs in non-interactive mode with predefined inputs and --output-format stream-json; the emitted JSON is parsed to verify tool/skill/agent invocations against the spec.
When to Use It
- Validate scaffolding and plugin structure
- Verify deterministic LLM invocations in workflows
- Enforce test size limit and split large files
- Document WHY for every test to capture business value
- Run end-to-end integration checks of generated projects
Quick Start
- Step 1: Create tests/ structure with lib/ and unit/workflow/integration tests
- Step 2: Run Claude in non-interactive mode with predefined inputs and --output-format stream-json
- Step 3: Parse JSON output and compare against expected invocations/specs
Best Practices
- Use lib/ helpers for all Node.js functionality
- Avoid direct node:* imports in tests
- Keep test files under 300 lines; split large files
- Add WHY comments to every describe/it describing business value
- Validate plugin structure (commands, agents, skills) and templates in unit tests
Example Use Cases
- Unit tests validating plugin scaffolding and template structure
- Workflow tests capturing and validating stream-json tool invocations
- Integration tests building and running a generated project
- GOOD vs BAD test patterns: using lib/ vs direct node imports
- Example of splitting a large test into multiple files when >300 lines