test-driven-development

npx machina-cli add skill addyosmani/agent-skills/test-driven-development --openclaw

Test-Driven Development

Overview

Write a failing test before writing the code that makes it pass. For bug fixes, reproduce the bug with a test before attempting a fix. Tests are proof — "seems right" is not done. A codebase with good tests is an AI agent's superpower; a codebase without tests is a liability.

When to Use

  • Implementing any new logic or behavior
  • Fixing any bug (the Prove-It Pattern)
  • Modifying existing functionality
  • Adding edge case handling
  • Any change that could break existing behavior

When NOT to use: Pure configuration changes, documentation updates, or static content changes that have no behavioral impact.

The TDD Cycle

    RED                GREEN              REFACTOR
 Write a test    Write minimal code    Clean up the
 that fails  ──→  to make it pass  ──→  implementation  ──→  (repeat)
      │                  │                    │
      ▼                  ▼                    ▼
   Test FAILS        Test PASSES         Tests still PASS

Step 1: RED — Write a Failing Test

Write the test first. It must fail. A test that passes immediately proves nothing.

// RED: This test fails because createTask doesn't exist yet
describe('TaskService', () => {
  it('creates a task with title and default status', async () => {
    const task = await taskService.createTask({ title: 'Buy groceries' });

    expect(task.id).toBeDefined();
    expect(task.title).toBe('Buy groceries');
    expect(task.status).toBe('pending');
    expect(task.createdAt).toBeInstanceOf(Date);
  });
});

Step 2: GREEN — Make It Pass

Write the minimum code to make the test pass. Don't over-engineer:

// GREEN: Minimal implementation
export async function createTask(input: { title: string }): Promise<Task> {
  const task = {
    id: generateId(),
    title: input.title,
    status: 'pending' as const,
    createdAt: new Date(),
  };
  await db.tasks.insert(task);
  return task;
}

Step 3: REFACTOR — Clean Up

With tests green, improve the code without changing behavior:

  • Extract shared logic
  • Improve naming
  • Remove duplication
  • Optimize if necessary

Run tests after every refactor step to confirm nothing broke.

The Prove-It Pattern (Bug Fixes)

When a bug is reported, do not start by trying to fix it. Start by writing a test that reproduces it.

Bug report arrives
       │
       ▼
  Write a test that demonstrates the bug
       │
       ▼
  Test FAILS (confirming the bug exists)
       │
       ▼
  Implement the fix
       │
       ▼
  Test PASSES (proving the fix works)
       │
       ▼
  Run full test suite (no regressions)

Example:

// Bug: "Completing a task doesn't update the completedAt timestamp"

// Step 1: Write the reproduction test (it should FAIL)
it('sets completedAt when task is completed', async () => {
  const task = await taskService.createTask({ title: 'Test' });
  const completed = await taskService.completeTask(task.id);

  expect(completed.status).toBe('completed');
  expect(completed.completedAt).toBeInstanceOf(Date);  // This fails → bug confirmed
});

// Step 2: Fix the bug
export async function completeTask(id: string): Promise<Task> {
  return db.tasks.update(id, {
    status: 'completed',
    completedAt: new Date(),  // This was missing
  });
}

// Step 3: Test passes → bug fixed, regression guarded

Test Hierarchy

Write tests at the lowest level that captures the behavior:

Level 1: Unit Tests
├── Pure logic, isolated functions
├── No I/O, no network, no database
├── Fast: milliseconds per test
├── Live next to source code: Button.test.tsx
└── Example: "formatDate returns ISO string for valid dates"

Level 2: Integration Tests
├── Component interactions, API boundaries
├── May use test database, mock services
├── Medium speed: seconds per test
├── Live next to source code or in tests/
└── Example: "POST /api/tasks creates a task in the database"

Level 3: End-to-End Tests
├── Full user flows through the real application
├── Uses browser automation (Playwright, Cypress)
├── Slow: 10+ seconds per test
├── Live in e2e/ or specs/ directory
└── Example: "User can register, create a task, and mark it complete"

Decision guide:

Is it pure logic with no side effects?
  → Unit test

Does it cross a boundary (API, database, file system)?
  → Integration test

Is it a critical user flow that must work end-to-end?
  → E2E test (limit these to critical paths)

Writing Good Tests

Test Behavior, Not Implementation

// Good: Tests what the function does
it('returns tasks sorted by creation date, newest first', async () => {
  const tasks = await listTasks({ sortBy: 'createdAt', sortOrder: 'desc' });
  expect(tasks[0].createdAt.getTime())
    .toBeGreaterThan(tasks[1].createdAt.getTime());
});

// Bad: Tests how the function works internally
it('calls db.query with ORDER BY created_at DESC', async () => {
  await listTasks({ sortBy: 'createdAt', sortOrder: 'desc' });
  expect(db.query).toHaveBeenCalledWith(
    expect.stringContaining('ORDER BY created_at DESC')
  );
});

Use the Arrange-Act-Assert Pattern

it('marks overdue tasks when deadline has passed', () => {
  // Arrange: Set up the test scenario
  const task = createTask({
    title: 'Test',
    deadline: new Date('2025-01-01'),
  });

  // Act: Perform the action being tested
  const result = checkOverdue(task, new Date('2025-01-02'));

  // Assert: Verify the outcome
  expect(result.isOverdue).toBe(true);
});

One Assertion Per Concept

// Good: Each test verifies one behavior
it('rejects empty titles', () => { ... });
it('trims whitespace from titles', () => { ... });
it('enforces maximum title length', () => { ... });

// Bad: Everything in one test
it('validates titles correctly', () => {
  expect(() => createTask({ title: '' })).toThrow();
  expect(createTask({ title: '  hello  ' }).title).toBe('hello');
  expect(() => createTask({ title: 'a'.repeat(256) })).toThrow();
});

Name Tests Descriptively

// Good: Reads like a specification
describe('TaskService.completeTask', () => {
  it('sets status to completed and records timestamp', ...);
  it('throws NotFoundError for non-existent task', ...);
  it('is idempotent — completing an already-completed task is a no-op', ...);
  it('sends notification to task assignee', ...);
});

// Bad: Vague names
describe('TaskService', () => {
  it('works', ...);
  it('handles errors', ...);
  it('test 3', ...);
});

Test Anti-Patterns to Avoid

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| Testing implementation details | Tests break when refactoring even if behavior is unchanged | Test inputs and outputs, not internal structure |
| Flaky tests (timing, order-dependent) | Erode trust in the test suite | Use deterministic assertions, isolate test state |
| Testing framework code | Wastes time testing third-party behavior | Only test YOUR code |
| Snapshot abuse | Large snapshots nobody reviews, break on any change | Use snapshots sparingly and review every change |
| No test isolation | Tests pass individually but fail together | Each test sets up and tears down its own state |
| Mocking everything | Tests pass but production breaks | Mock at boundaries, not everywhere |

When to Use Subagents for Testing

For complex bug fixes, spawn a subagent to write the reproduction test:

Main agent: "Spawn a subagent to write a test that reproduces this bug:
[bug description]. The test should fail with the current code."

Subagent: Writes the reproduction test

Main agent: Verifies the test fails, then implements the fix,
then verifies the test passes.

This separation ensures the test is written without knowledge of the fix, making it more robust.

Common Rationalizations

| Rationalization | Reality |
| --- | --- |
| "I'll write tests after the code works" | You won't. And tests written after the fact test implementation, not behavior. |
| "This is too simple to test" | Simple code gets complicated. The test documents the expected behavior. |
| "Tests slow me down" | Tests slow you down now. They speed you up every time you change the code later. |
| "I tested it manually" | Manual testing doesn't persist. Tomorrow's change might break it with no way to know. |
| "The code is self-explanatory" | Tests ARE the specification. They document what the code should do, not what it does. |
| "It's just a prototype" | Prototypes become production code. Tests from day one prevent the "test debt" crisis. |

Red Flags

  • Writing code without any corresponding tests
  • Tests that pass on the first run (they may not be testing what you think)
  • "All tests pass" but no tests were actually run
  • Bug fixes without reproduction tests
  • Tests that test framework behavior instead of application behavior
  • Test names that don't describe the expected behavior
  • Skipping tests to make the suite pass

Verification

After completing any implementation:

  • Every new behavior has a corresponding test
  • All tests pass: npm test
  • Bug fixes include a reproduction test that failed before the fix
  • Test names describe the behavior being verified
  • No tests were skipped or disabled
  • Coverage hasn't decreased (if tracked)

Source

View on GitHub: https://github.com/addyosmani/agent-skills/blob/main/skills/test-driven-development/SKILL.md

