test-driven-development
npx machina-cli add skill CodingCossack/agent-skills-library/test-driven-development --openclaw
Test-Driven Development
Write test first. Watch it fail. Write minimal code to pass. Refactor.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
The Iron Law
NO BEHAVIOR-CHANGING PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Wrote code before test? Delete it completely. Implement fresh from tests.
Refactoring is exempt: The refactor step changes structure, not behavior. Tests stay green throughout. No new failing test required.
Red-Green-Refactor Cycle
RED ──► Verify Fail ──► GREEN ──► Verify Pass ──► REFACTOR ──► Verify Pass ──► Next RED
- Verify Fail: wrong failure? Fix the test, retry.
- Verify Pass (after GREEN): still failing? Fix the code, retry.
- Verify Pass (after REFACTOR): broke tests? Fix, retry.
RED - Write Failing Test
Write one minimal test for one behavior.
Good example:
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = async () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };
  const result = await retryOperation(operation);
  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
Clear name, tests real behavior, asserts observable outcome
Bad example:
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(3);
});
Vague name, asserts only call count without verifying outcome, tests mock mechanics not behavior
Requirements: One behavior. Clear name. Real code (mocks only if unavoidable).
Verify RED - Watch It Fail
MANDATORY. Never skip.
npm test path/to/test.test.ts
Test must go red for the right reason. Acceptable RED states:
- Assertion failure (expected behavior missing)
- Compile/type error (function doesn't exist yet)
Not acceptable: Runtime setup errors, import failures, environment issues.
Test passes immediately? You're testing existing behavior—fix test. Test errors for wrong reason? Fix error, re-run until it fails correctly.
GREEN - Minimal Code
Write simplest code to pass the test.
Good example:
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error('unreachable');
}
Just enough to pass
Bad example:
async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: { maxRetries?: number; backoff?: 'linear' | 'exponential'; }
): Promise<T> { /* YAGNI */ }
Over-engineered beyond test requirements
Write only what the test demands. No extra features, no "improvements."
Verify GREEN - Watch It Pass
MANDATORY.
npm test path/to/test.test.ts
Confirm: Test passes. All other tests still pass. Output pristine (no errors, warnings).
Test fails? Fix code, not test. Other tests fail? Fix now before continuing.
REFACTOR - Clean Up
After green only: Remove duplication. Improve names. Extract helpers.
Keep tests green throughout. Add no new behavior.
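A refactor of the GREEN retryOperation might look like the sketch below. It is one possible cleanup, not the skill's prescribed code: the constant name MAX_ATTEMPTS is hypothetical. Observable behavior is unchanged (three attempts, the last error is rethrown), so the existing test should stay green without edits.

```typescript
// Named constant replaces the magic number 3 (hypothetical name).
const MAX_ATTEMPTS = 3;

async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e; // remember the failure and try again while attempts remain
    }
  }
  // All attempts failed: surface the last error, exactly as the GREEN version did.
  throw lastError;
}
```

The "unreachable" throw disappears because the loop now falls through to a meaningful final statement. No new parameter appears, because no test has asked for one yet.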
Repeat
Next failing test for next behavior.
Good Tests
Minimal: One thing per test. "and" in name? Split it. ❌ test('validates email and domain and whitespace')
Clear: Name describes behavior. ❌ test('test1')
Shows intent: Demonstrates desired API usage, not implementation details.
Example: Bug Fix
Bug: Empty email accepted
RED:
test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});
Verify RED:
$ npm test
FAIL: expected 'Email required', got undefined
GREEN:
function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}
Verify GREEN:
$ npm test
PASS
REFACTOR: Extract validation helper if pattern repeats.
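If the email check starts repeating, the extracted helper could look like this sketch. The names SignupData, SubmitResult and validateEmail are hypothetical. A local type name also avoids the built-in DOM FormData type, which has no email property. The rest of submitForm stays elided, as in the example above.

```typescript
// Hypothetical form shape; the real one is not shown in the example.
interface SignupData {
  email?: string;
}

interface SubmitResult {
  error?: string;
}

// Extracted helper: an error message, or null when the email is acceptable.
function validateEmail(email: string | undefined): string | null {
  return email?.trim() ? null : 'Email required';
}

function submitForm(data: SignupData): SubmitResult {
  const emailError = validateEmail(data.email);
  if (emailError) return { error: emailError };
  // ... remaining submission logic elided
  return {};
}
```

The 'rejects empty email' test does not change. It keeps passing because the refactor moves the check without altering what submitForm returns.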
Red Flags - STOP and Start Over
Any of these means delete code and restart with TDD:
- Code written before test
- Test passes immediately (testing existing behavior)
- Can't explain why test failed
- Rationalizing "just this once" or "this is different"
- Keeping code "as reference" while writing tests
- Claiming "tests after achieve the same purpose"
When Stuck
| Problem | Solution |
|---|---|
| Don't know how to test | Write the API you wish existed. Write assertion first. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Introduce dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
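For the "Must mock everything" row, dependency injection can be as small as passing the collaborator in as a parameter. The sketch below is hypothetical: UserStore, greetingFor and fakeStore are illustrative names. The test substitutes a plain in-memory object, so no mocking library is involved.

```typescript
// The dependency is an interface passed in, not a module imported inside the function.
interface UserStore {
  findEmail(userId: string): Promise<string | undefined>;
}

async function greetingFor(userId: string, store: UserStore): Promise<string> {
  const email = await store.findEmail(userId);
  return email ? `Hello, ${email}` : 'Hello, guest';
}

// In a test, a tiny hand-written fake stands in for the real store.
const fakeStore: UserStore = {
  async findEmail(id) {
    return id === 'u1' ? 'ada@example.com' : undefined;
  },
};
```

Production code passes the real store. Tests pass the fake and assert only on the returned greeting, which is observable behavior.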
Legacy Code (No Existing Tests)
The Iron Law ("delete and restart") applies to new code you wrote without tests. For inherited code with no tests, use characterization tests:
- Write tests that capture current behavior (even if "wrong")
- Run tests, observe actual outputs
- Update assertions to match reality (these are "golden masters")
- Now you have a safety net for refactoring
- Apply TDD for new behavior changes
Characterization tests lock down existing behavior so you can refactor safely. They're the on-ramp, not a permanent state.
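A minimal characterization sketch, assuming a hypothetical inherited function legacyFormatPrice. The golden-master table pairs inputs with the outputs observed by actually running the code, including a surprising result for negative input. In Jest or Vitest the same table would typically feed test.each. A plain loop keeps this sketch self-contained.

```typescript
// Hypothetical legacy function inherited without tests.
function legacyFormatPrice(cents: number): string {
  const dollars = Math.floor(cents / 100);
  const rest = cents % 100;
  return '$' + dollars + '.' + (rest < 10 ? '0' + rest : String(rest));
}

// Golden master: observed outputs, recorded as-is, even where they look wrong.
const goldenMaster: Array<[number, string]> = [
  [0, '$0.00'],
  [5, '$0.05'],
  [1999, '$19.99'],
  [-150, '$-2.0-50'], // odd, but it is current behavior; lock it down before refactoring
];

// Returns a description of every row whose output drifted from the recorded value.
function characterize(): string[] {
  const mismatches: string[] = [];
  for (const [input, expected] of goldenMaster) {
    const actual = legacyFormatPrice(input);
    if (actual !== expected) mismatches.push(`${input}: expected ${expected}, got ${actual}`);
  }
  return mismatches;
}
```

Once the table passes, refactoring can proceed safely. Changing the negative-price behavior is then a deliberate new behavior, driven by a new failing test.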
Flakiness Rules
Tests must be deterministic. Ban these in unit tests:
- Real sleeps / delays → Use fake timers (vi.useFakeTimers(), jest.useFakeTimers())
- Wall clock time → Inject clock, assert against injected time
- Math.random() → Seed or inject RNG
- Network calls → Mock at boundary or use MSW
- Filesystem race conditions → Use temp dirs with unique names
Flaky test? Fix or delete. Flaky tests erode trust in the entire suite.
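The "inject clock" and "seed or inject RNG" rules can be sketched as default parameters that tests override. The function names are hypothetical. mulberry32 is a small, well-known non-cryptographic PRNG, used here only so that a fixed seed reproduces the same sequence.

```typescript
type Clock = () => number; // milliseconds since epoch

// Production callers use Date.now; tests pass a fixed clock.
function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() >= expiresAtMs;
}

// Seeded PRNG (mulberry32): the same seed always yields the same values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical jitter helper: randomness is injected rather than read from Math.random directly.
function pickJitterMs(maxMs: number, rng: () => number = Math.random): number {
  return Math.floor(rng() * maxMs);
}
```

In a unit test, pass a constant clock such as () => 1000 and a seeded generator such as mulberry32(42). Then every run sees identical inputs.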
Debugging Integration
Bug found? Write failing test reproducing it first. Then follow TDD cycle. Test proves fix and prevents regression.
Planning: Test List
Before diving into the cycle, spend 2 minutes listing the next 3-10 tests you expect to write. This prevents local-optimum design where early tests paint you into a corner.
Example test list for a retry function:
- retries N times on failure
- returns result on success
- throws after max retries exhausted
- calls onRetry callback between attempts
- respects backoff delay
Work through the list in order. Add/remove tests as you learn.
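For the "calls onRetry callback between attempts" item, a RED test expecting the callback after attempts 1 and 2 could be met by a GREEN step like this sketch. The onRetry parameter and its signature are hypothetical. Unlike the over-engineered options bag, it exists here only because a failing test demanded it. Backoff remains unimplemented until its own test arrives.

```typescript
async function retryOperation<T>(
  fn: () => Promise<T>,
  onRetry?: (attempt: number, error: unknown) => void,
): Promise<T> {
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt === 3) throw e; // final failure: no callback, just rethrow
      onRetry?.(attempt, e); // called only between attempts
    }
  }
  throw new Error('unreachable');
}
```

The earlier retry test still passes unchanged because onRetry is optional.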
Testing Anti-Patterns
When writing tests involving mocks, dependencies, or test utilities: See references/testing-anti-patterns.md for common pitfalls including testing mock behavior and adding test-only methods to production classes.
Philosophy and Rationalizations
For detailed rebuttals to common objections ("I'll test after", "deleting work is wasteful", "TDD is dogmatic"): See references/tdd-philosophy.md
Final Rule
Production code exists → test existed first and failed first
Otherwise → not TDD
Source
https://github.com/CodingCossack/agent-skills-library/blob/main/skills/test-driven-development/SKILL.md
Overview
Test-Driven Development (TDD) is a disciplined development methodology that starts with writing a failing test, then implementing the minimal production code to pass it, and finally refactoring. The core idea is to prove behavior with tests before any production change, giving evidence of correctness and guarding against regressions.
How This Skill Works
Start with RED: write a minimal test for a single behavior and run the test to confirm it fails. Then GREEN: implement the smallest amount of production code needed to pass the test, and re-run tests to confirm. Finally REFACTOR: clean up code and tests, improve readability, and remove duplication while keeping all tests green.
When to Use It
- Implementing a new feature where behavior must be proven by tests
- Fixing a bug where the bug must be reproduced and prevented by tests
- Refactoring a module or function without changing observable behavior
- Adding or changing behavior in a way that requires verified correctness
- Guarding against regressions in critical logic (e.g., retry or error handling)
Quick Start
- Step 1: Write a failing test for a single behavior before touching production code
- Step 2: Run the test, watch it fail, and confirm it fails for the right reason
- Step 3: Implement the minimal code to pass the test, then run tests and refactor if needed
Best Practices
- Write one behavior per test with clear, descriptive names
- Always run tests to observe the red state before coding
- Keep production code minimal—do only what the test demands (YAGNI)
- Refactor only after all tests pass and ensure tests stay green
- Use mocks or stubs only where unavoidable (network, clock, randomness) to keep tests deterministic without testing mock mechanics
Example Use Cases
- Add a retry utility: write a failing test asserting three attempts and eventual success, then implement minimal retry logic to pass, and refactor if needed
- Bug fix: add a test that captures the failing scenario, verify red; implement fix with minimal changes to satisfy the test; run full suite
- Feature implementation: write a failing test for the new behavior, implement the smallest code to satisfy it, and refactor
- Refactor a module (e.g., rename functions, restructure tests) after ensuring tests remain green
- Guard against regressions across modules by iteratively adding focused tests before coding