testing
`npx machina-cli add skill tslateman/duet/testing --openclaw`

# Test Design as Thinking
## Overview
Test strategy, not test generation. Treat test design as an act of specification — articulate the contract, find the boundaries, surface hidden assumptions. Use Beck's Test Desiderata to make testing tradeoffs deliberate instead of accidental.
## Beck's 12 Test Desiderata
Every test balances these properties. No test maximizes all twelve. The skill is knowing which to prioritize.
| Property | Definition | Tension |
|---|---|---|
| Isolated | Same results regardless of run order | vs. speed (shared setup) |
| Composable | Test dimensions of variability separately | vs. writability (more tests) |
| Deterministic | Same results if nothing changes | vs. realism (real services) |
| Fast | Run quickly | vs. predictiveness (integration) |
| Writable | Cheap to write relative to code cost | vs. thoroughness |
| Readable | Comprehensible, invokes motivation | vs. conciseness |
| Behavioral | Sensitive to behavior changes | vs. structure-insensitivity |
| Structure-insensitive | Unaffected by refactoring | vs. behavioral sensitivity |
| Automated | No human intervention needed | vs. exploratory testing |
| Specific | Failure cause is obvious | vs. breadth of coverage |
| Predictive | Passing means production-ready | vs. speed and isolation |
| Inspiring | Passing builds confidence to deploy | vs. all other properties |
See `references/desiderata.md` for application guidance.
## Test Design Workflow

### 1. Articulate the Contract
Before writing any test, answer:
- What does this code promise to callers?
- What does it require from its inputs?
- What side effects does it produce?
- What invariants must always hold?
If you can't answer these, the code's contract is unclear. Fix that first.
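The contract can be written down before any test exists. A minimal sketch, assuming a hypothetical `transfer` function (the name and rules are illustrative, not from this skill):

```python
def transfer(balance: int, amount: int) -> int:
    """Deduct `amount` from `balance`.

    Promises: returns the new balance; never returns a negative value.
    Requires: amount > 0 and amount <= balance.
    Side effects: none (pure function).
    Invariant: result == balance - amount.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient balance")
    return balance - amount
```

With the promises and requirements explicit, each clause of the docstring maps directly to a test case.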
### 2. Identify Boundaries
Every contract has edges. Test them:
- Empty/zero/null — the degenerate case
- One — the simplest non-empty case
- Many — the normal case
- Boundary — max values, off-by-one, type limits
- Error — invalid input, unavailable dependencies
- Concurrent — multiple callers, race conditions
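Applied to a hypothetical `chunk(items, size)` helper (illustrative, not part of this skill), the boundary checklist turns into example-based tests:

```python
def chunk(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]

# Empty: the degenerate case
assert chunk([], 3) == []
# One: the simplest non-empty case
assert chunk([1], 3) == [[1]]
# Many: the normal case
assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
# Boundary: length exactly divisible by size
assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]
# Error: invalid input
try:
    chunk([1], 0)
    assert False, "expected ValueError"
except ValueError:
    pass
```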
### 3. Choose the Testing Approach
Match the approach to what you're testing:
**Example-based tests** — specific inputs and expected outputs. Best for known contracts with clear boundaries.

**Property-based tests** — invariants that hold for all inputs. Best for algorithms, parsers, serialization (encode/decode roundtrip), and sorting.

**Integration tests** — multiple components together. Best for verifying wiring, data flow, and contracts between modules.

**Snapshot tests** — output matches recorded baseline. Best for rendering, serialization, and configuration.
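A minimal hand-rolled sketch of a property-based roundtrip check, using only the standard library (a real suite would likely use a framework such as Hypothesis; the generator below is an assumption, not a prescribed API):

```python
import json
import random
import string

def random_value(rng, depth=0):
    """Generate a random JSON-compatible value, bounded in depth."""
    kinds = ["int", "str", "bool"] if depth >= 2 else ["int", "str", "bool", "list", "dict"]
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randint(-1000, 1000)
    if kind == "str":
        return "".join(rng.choices(string.ascii_letters, k=rng.randint(0, 8)))
    if kind == "bool":
        return rng.choice([True, False])
    if kind == "list":
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {f"k{i}": random_value(rng, depth + 1) for i in range(rng.randint(0, 4))}

rng = random.Random(42)  # seeded so the check stays deterministic
for _ in range(100):
    value = random_value(rng)
    # Property: decode(encode(x)) == x for every generated input
    assert json.loads(json.dumps(value)) == value
```

Seeding the generator keeps the test deterministic while still exploring inputs no one would bother to write by hand.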
### 4. Apply the Testing Trophy
Kent C. Dodds' priority order:
```
      ┌──────┐
      │ E2E  │        Few, slow, high confidence
     ┌┴──────┴┐
     │Integra-│       Most tests here
     │  tion  │
    ┌┴────────┴┐
    │   Unit   │      Many, fast, specific
   ┌┴──────────┴┐
   │   Static   │     Types, linters, formatters
   └────────────┘
```
> "The more your tests resemble the way your software is used, the more confidence they can give you."
### 5. Evaluate Existing Tests
Ask of each test:
- Which Desiderata properties does it maximize?
- Which did it sacrifice? Was that deliberate?
- Does it test behavior or implementation detail?
- If this test fails, will the message tell you why?
- If the implementation changes but behavior doesn't, does this test break? (It shouldn't.)
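The behavior-vs-implementation question can be illustrated with a toy cache (a hypothetical example, not from this skill):

```python
class Cache:
    """Toy cache; the internal storage is an implementation detail."""
    def __init__(self):
        self._store = {}  # could become an LRU structure without changing behavior

    def put(self, key, value):
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)

# Behavioral tests: survive refactoring of the internals
assert Cache().get("missing") is None
c = Cache()
c.put("a", 1)
assert c.get("a") == 1

# Implementation-detail test (avoid): breaks if _store is renamed or restructured
# assert c._store == {"a": 1}
```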
## Test Smells
| Smell | Symptom | Fix |
|---|---|---|
| Testing implementation | Breaks on refactor, behavior unchanged | Test outputs, not internals |
| Tautological test | Repeats production logic in assertions | Test observable behavior |
| Happy path only | No error/boundary cases | Add boundary analysis |
| Flaky | Passes sometimes, fails sometimes | Fix nondeterminism or mark explicitly |
| Giant arrange | 30 lines of setup for 1 assertion | Simplify the interface or use builders |
| Invisible assertion | `expect(result).toBeTruthy()` | Assert specific values |
| Test per method | One test per function, misses integration | Test use cases, not methods |
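A sketch of the invisible-assertion fix, using a hypothetical `parse_version` helper (the name and format are illustrative):

```python
def parse_version(text):
    """Parse 'major.minor.patch' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

result = parse_version("2.10.3")

# Invisible assertion: passes for any non-empty tuple, hides regressions
assert result  # smell: the Python equivalent of expect(result).toBeTruthy()

# Specific assertion: a failure points at the exact wrong value
assert result == (2, 10, 3)
```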
## Strategy Templates
**For a pure function:**

```
## Contract
[function name]: [input types] → [output type]
- Promises: [what it guarantees]
- Requires: [what inputs must satisfy]

## Test Cases
- [ ] Empty/zero input
- [ ] Single valid input
- [ ] Multiple valid inputs
- [ ] Boundary values
- [ ] Invalid inputs (error cases)
- [ ] Properties that hold for all inputs
```
**For an API endpoint:**

```
## Contract
[METHOD /path]: [request] → [response]
- Auth: [required/optional/none]
- Idempotent: [yes/no]

## Test Cases
- [ ] Happy path (valid request → expected response)
- [ ] Validation failures (400)
- [ ] Auth failures (401/403)
- [ ] Not found (404)
- [ ] Concurrent requests
- [ ] Rate limiting
```
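The endpoint checklist can start life as fast in-process tests. A sketch using a toy `handle` function as a stand-in for a real HTTP stack (every name here is illustrative, not a prescribed API):

```python
USERS = {"u1": {"name": "Ada"}}
VALID_TOKEN = "secret"  # illustrative; a real service would verify signed tokens

def handle(method, path, token=None):
    """Tiny in-process stand-in for GET /users/<id>; returns (status, payload)."""
    if token != VALID_TOKEN:
        return 401, {"error": "unauthorized"}
    if method != "GET":
        return 405, {"error": "method not allowed"}
    if not path.startswith("/users/"):
        return 400, {"error": "bad path"}
    user_id = path[len("/users/"):]
    if user_id not in USERS:
        return 404, {"error": "not found"}
    return 200, USERS[user_id]

# Happy path
assert handle("GET", "/users/u1", token="secret") == (200, {"name": "Ada"})
# Auth failure
assert handle("GET", "/users/u1")[0] == 401
# Not found
assert handle("GET", "/users/u9", token="secret")[0] == 404
# Validation failure
assert handle("GET", "/orders/u1", token="secret")[0] == 400
```

Concurrency and rate limiting need the real stack; everything above it can be checked in-process first.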
**For a UI component:**

```
## Contract
[Component]: [props] → [rendered output + interactions]

## Test Cases
- [ ] Renders with required props
- [ ] Renders with all optional props
- [ ] User interactions trigger callbacks
- [ ] Loading/error/empty states
- [ ] Accessibility (keyboard nav, screen reader)
```
## Output Format
When designing test strategy:

```
## Test Strategy for [feature/module]

### Contract
[What this code promises and requires]

### Priority Properties
[Which Desiderata properties matter most and why]

### Test Plan
1. [Test case] — [what it verifies] — [approach]
2. [Test case] — [what it verifies] — [approach]

### Tradeoffs Accepted
- [Property sacrificed] because [reason]

### Not Testing
- [What's deliberately excluded and why]
```
## The Confidence Question
After designing the test suite, ask: "If all these tests pass, would you deploy with confidence?" If no, identify what's missing. If yes, stop — more tests beyond confidence are waste.
## See Also
- `/debugging` — Test failures trigger debugging; debugging reveals missing tests
- `/review` — Reviews assess test coverage alongside code quality
- `skills/FRAMEWORKS.md` — Full framework index
- `RECIPE.md` — Agent recipe for parallel decomposition (2 workers)
## Overview

Design test strategy as an act of specification: articulate the contract, surface hidden assumptions, and make tradeoffs explicit using Beck's Test Desiderata. Use this skill to plan tests for a new feature or refactor, or to review whether a codebase is well tested.
## How This Skill Works
Start by articulating the contract: what the code promises, required inputs, side effects, and invariants. Then identify boundaries (empty, one, many, boundary, error, concurrent). Next, choose a testing approach (example-based, property-based, integration, snapshot) and apply the Testing Trophy to prioritize test types (E2E, integration, unit, static).
## When to Use It
- Designing tests for a new feature or refactor and need a strategy.
- Asked questions like "How should I test this?" or "What tests do I need?"
- Reviewing or auditing an existing test strategy for coverage, reliability, and maintainability.
- Evaluating whether current tests are fast, deterministic, and well-isolated.
- Planning test work that aligns with how users actually use the software (balancing tests across layers).
## Quick Start
- Step 1: Articulate the contract and invariants.
- Step 2: Identify boundaries and edge cases (empty, one, many, boundary, error, concurrent).
- Step 3: Choose testing approach and apply the Testing Trophy (Unit, Integration, End-to-End; add property-based and snapshot tests as needed).
## Best Practices
- Articulate the contract: what callers can rely on, input requirements, side effects, and invariants.
- Identify boundaries: test Empty/zero/null, One, Many, Boundary, Error, and Concurrent scenarios.
- Match testing approaches: example-based, property-based, integration, and snapshot tests.
- Apply the Testing Trophy: lean on integration tests, supported by fast unit tests and a few end-to-end tests, with static checks (types, linters) as the foundation.
- Continuously evaluate and improve tests against Beck's desiderata to avoid over/under-testing.
## Example Use Cases
- Design test strategy for a new API endpoint to validate contract, inputs, and side effects across unit and integration tests.
- Refactor a module and reassess boundary tests, invariants, and test writability to maintain coverage.
- Add property-based tests for a sorting algorithm to ensure invariants hold for arbitrary inputs.
- Create snapshot tests to lock UI rendering output and detect regressions early.
- Audit tests for a feature flag system to uncover race conditions and ensure correct behavior under concurrency.
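The sorting use case above can be sketched as two invariants checked over generated inputs (a hand-rolled sketch; a real suite might use a framework such as Hypothesis):

```python
import random
from collections import Counter

def check_sort_properties(sort_fn, trials=100, seed=7):
    """Check two invariants of any correct sort: ordered output, same multiset."""
    rng = random.Random(seed)  # seeded for determinism
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = sort_fn(data)
        # Invariant 1: each element is <= its successor
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Invariant 2: output is a permutation of the input
        assert Counter(out) == Counter(data)

check_sort_properties(sorted)
```

The same two invariants hold for any sort implementation, so the checker can be reused unchanged when the algorithm is swapped.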