testing
`npx machina-cli add skill tslateman/duet/testing --openclaw`

# Test Design as Thinking
## Overview
Test strategy, not test generation. Treat test design as an act of specification — articulate the contract, find the boundaries, surface hidden assumptions. Use Beck's Test Desiderata to make testing tradeoffs deliberate instead of accidental.
## Beck's 12 Test Desiderata
Every test balances these properties. No test maximizes all twelve. The skill is knowing which to prioritize.
| Property | Definition | Tension |
|---|---|---|
| Isolated | Same results regardless of run order | vs. speed (shared setup) |
| Composable | Test dimensions of variability separately | vs. writability (more tests) |
| Deterministic | Same results if nothing changes | vs. realism (real services) |
| Fast | Run quickly | vs. predictiveness (integration) |
| Writable | Cheap to write relative to code cost | vs. thoroughness |
| Readable | Comprehensible, invokes motivation | vs. conciseness |
| Behavioral | Sensitive to behavior changes | vs. structure-insensitivity |
| Structure-insensitive | Unaffected by refactoring | vs. behavioral sensitivity |
| Automated | No human intervention needed | vs. exploratory testing |
| Specific | Failure cause is obvious | vs. breadth of coverage |
| Predictive | Passing means production-ready | vs. speed and isolation |
| Inspiring | Passing builds confidence to deploy | vs. all other properties |
See `references/desiderata.md` for application guidance.
## Test Design Workflow

### 1. Articulate the Contract
Before writing any test, answer:
- What does this code promise to callers?
- What does it require from its inputs?
- What side effects does it produce?
- What invariants must always hold?
If you can't answer these, the code's contract is unclear. Fix that first.
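The contract can be written down before any test exists. A minimal sketch, assuming a hypothetical `transfer` function (the name and rules are illustrative, not from this skill):

```python
def transfer(balance: int, amount: int) -> int:
    """Deduct `amount` from `balance`.

    Promises: returns the new balance; never returns a negative value.
    Requires: amount > 0 and amount <= balance.
    Side effects: none (pure function).
    Invariant: result == balance - amount.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient balance")
    return balance - amount
```

With the promises and requirements explicit, each clause of the docstring maps directly to a test case.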
### 2. Identify Boundaries
Every contract has edges. Test them:
- Empty/zero/null — the degenerate case
- One — the simplest non-empty case
- Many — the normal case
- Boundary — max values, off-by-one, type limits
- Error — invalid input, unavailable dependencies
- Concurrent — multiple callers, race conditions
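Applied to a hypothetical `chunk(items, size)` helper (illustrative, not part of this skill), the boundary checklist turns into example-based tests:

```python
def chunk(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]

# Empty: the degenerate case
assert chunk([], 3) == []
# One: the simplest non-empty case
assert chunk([1], 3) == [[1]]
# Many: the normal case
assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
# Boundary: length exactly divisible by size
assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]
# Error: invalid input
try:
    chunk([1], 0)
    assert False, "expected ValueError"
except ValueError:
    pass
```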
### 3. Choose the Testing Approach
Match the approach to what you're testing:
**Example-based tests** — specific inputs and expected outputs. Best for known contracts with clear boundaries.

**Property-based tests** — invariants that hold for all inputs. Best for algorithms, parsers, serialization (encode/decode roundtrip), and sorting.

**Integration tests** — multiple components together. Best for verifying wiring, data flow, and contracts between modules.

**Snapshot tests** — output matches recorded baseline. Best for rendering, serialization, and configuration.
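A minimal hand-rolled sketch of a property-based roundtrip check, using only the standard library (a real suite would likely use a framework such as Hypothesis; the generator below is an assumption, not a prescribed API):

```python
import json
import random
import string

def random_value(rng, depth=0):
    """Generate a random JSON-compatible value, bounded in depth."""
    kinds = ["int", "str", "bool"] if depth >= 2 else ["int", "str", "bool", "list", "dict"]
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randint(-1000, 1000)
    if kind == "str":
        return "".join(rng.choices(string.ascii_letters, k=rng.randint(0, 8)))
    if kind == "bool":
        return rng.choice([True, False])
    if kind == "list":
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {f"k{i}": random_value(rng, depth + 1) for i in range(rng.randint(0, 4))}

rng = random.Random(42)  # seeded so the check stays deterministic
for _ in range(100):
    value = random_value(rng)
    # Property: decode(encode(x)) == x for every generated input
    assert json.loads(json.dumps(value)) == value
```

Seeding the generator keeps the test deterministic while still exploring inputs no one would bother to write by hand.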
### 4. Apply the Testing Trophy
Kent C. Dodds' priority order:
```
      ┌──────┐
      │ E2E  │        Few, slow, high confidence
     ┌┴──────┴┐
     │Integra-│       Most tests here
     │  tion  │
    ┌┴────────┴┐
    │   Unit   │      Many, fast, specific
   ┌┴──────────┴┐
   │   Static   │     Types, linters, formatters
   └────────────┘
```
> "The more your tests resemble the way your software is used, the more confidence they can give you."
### 5. Evaluate Existing Tests
Ask of each test:
- Which Desiderata properties does it maximize?
- Which did it sacrifice? Was that deliberate?
- Does it test behavior or implementation detail?
- If this test fails, will the message tell you why?
- If the implementation changes but behavior doesn't, does this test break? (It shouldn't.)
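The behavior-vs-implementation question can be illustrated with a toy cache (a hypothetical example, not from this skill):

```python
class Cache:
    """Toy cache; the internal storage is an implementation detail."""
    def __init__(self):
        self._store = {}  # could become an LRU structure without changing behavior

    def put(self, key, value):
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)

# Behavioral tests: survive refactoring of the internals
assert Cache().get("missing") is None
c = Cache()
c.put("a", 1)
assert c.get("a") == 1

# Implementation-detail test (avoid): breaks if _store is renamed or restructured
# assert c._store == {"a": 1}
```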
## Test Smells
| Smell | Symptom | Fix |
|---|---|---|
| Testing implementation | Breaks on refactor, behavior unchanged | Test outputs, not internals |
| Tautological test | Repeats production logic in assertions | Test observable behavior |
| Happy path only | No error/boundary cases | Add boundary analysis |
| Flaky | Passes sometimes, fails sometimes | Fix nondeterminism or mark explicitly |
| Giant arrange | 30 lines of setup for 1 assertion | Simplify the interface or use builders |
| Invisible assertion | `expect(result).toBeTruthy()` | Assert specific values |
| Test per method | One test per function, misses integration | Test use cases, not methods |
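A sketch of the invisible-assertion fix, using a hypothetical `parse_version` helper (the name and format are illustrative):

```python
def parse_version(text):
    """Parse 'major.minor.patch' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

result = parse_version("2.10.3")

# Invisible assertion: passes for any non-empty tuple, hides regressions
assert result  # smell: the Python equivalent of expect(result).toBeTruthy()

# Specific assertion: a failure points at the exact wrong value
assert result == (2, 10, 3)
```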
## Strategy Templates
**For a pure function:**

```
## Contract
[function name]: [input types] → [output type]
- Promises: [what it guarantees]
- Requires: [what inputs must satisfy]

## Test Cases
- [ ] Empty/zero input
- [ ] Single valid input
- [ ] Multiple valid inputs
- [ ] Boundary values
- [ ] Invalid inputs (error cases)
- [ ] Properties that hold for all inputs
```
**For an API endpoint:**

```
## Contract
[METHOD /path]: [request] → [response]
- Auth: [required/optional/none]
- Idempotent: [yes/no]

## Test Cases
- [ ] Happy path (valid request → expected response)
- [ ] Validation failures (400)
- [ ] Auth failures (401/403)
- [ ] Not found (404)
- [ ] Concurrent requests
- [ ] Rate limiting
```
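The endpoint checklist can start life as fast in-process tests. A sketch using a toy `handle` function as a stand-in for a real HTTP stack (every name here is illustrative, not a prescribed API):

```python
USERS = {"u1": {"name": "Ada"}}
VALID_TOKEN = "secret"  # illustrative; a real service would verify signed tokens

def handle(method, path, token=None):
    """Tiny in-process stand-in for GET /users/<id>; returns (status, payload)."""
    if token != VALID_TOKEN:
        return 401, {"error": "unauthorized"}
    if method != "GET":
        return 405, {"error": "method not allowed"}
    if not path.startswith("/users/"):
        return 400, {"error": "bad path"}
    user_id = path[len("/users/"):]
    if user_id not in USERS:
        return 404, {"error": "not found"}
    return 200, USERS[user_id]

# Happy path
assert handle("GET", "/users/u1", token="secret") == (200, {"name": "Ada"})
# Auth failure
assert handle("GET", "/users/u1")[0] == 401
# Not found
assert handle("GET", "/users/u9", token="secret")[0] == 404
# Validation failure
assert handle("GET", "/orders/u1", token="secret")[0] == 400
```

Concurrency and rate limiting need the real stack; everything above it can be checked in-process first.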
**For a UI component:**

```
## Contract
[Component]: [props] → [rendered output + interactions]

## Test Cases
- [ ] Renders with required props
- [ ] Renders with all optional props
- [ ] User interactions trigger callbacks
- [ ] Loading/error/empty states
- [ ] Accessibility (keyboard nav, screen reader)
```
## Output Format
When designing test strategy:

```
## Test Strategy for [feature/module]

### Contract
[What this code promises and requires]

### Priority Properties
[Which Desiderata properties matter most and why]

### Test Plan
1. [Test case] — [what it verifies] — [approach]
2. [Test case] — [what it verifies] — [approach]

### Tradeoffs Accepted
- [Property sacrificed] because [reason]

### Not Testing
- [What's deliberately excluded and why]
```
## The Confidence Question
After designing the test suite, ask: "If all these tests pass, would you deploy with confidence?" If no, identify what's missing. If yes, stop — more tests beyond confidence are waste.
## See Also
- `/debugging` — Test failures trigger debugging; debugging reveals missing tests
- `/review` — Reviews assess test coverage alongside code quality
- `skills/FRAMEWORKS.md` — Full framework index
- `RECIPE.md` — Agent recipe for parallel decomposition (2 workers)
## Overview

Design test strategy as an act of specification: articulate the contract, surface hidden assumptions, and make tradeoffs explicit using Beck's Test Desiderata. Use this skill to plan tests for a new feature or refactor, or to review whether a codebase is well tested.
## How This Skill Works
Start by articulating the contract: what the code promises, required inputs, side effects, and invariants. Then identify boundaries (empty, one, many, boundary, error, concurrent). Next, choose a testing approach (example-based, property-based, integration, snapshot) and apply the Testing Trophy to prioritize test types (E2E, integration, unit, static).
## When to Use It
- Designing tests for a new feature or refactor and need a strategy.
- Asked questions like "How should I test this?" or "What tests do I need?"
- Reviewing or auditing an existing test strategy for coverage, reliability, and maintainability.
- Evaluating whether current tests are fast, deterministic, and well-isolated.
- Planning test work that aligns with how users actually use the software (balancing tests across layers).
## Quick Start
- Step 1: Articulate the contract and invariants.
- Step 2: Identify boundaries and edge cases (empty, one, many, boundary, error, concurrent).
- Step 3: Choose testing approach and apply the Testing Trophy (Unit, Integration, End-to-End; add property-based and snapshot tests as needed).
## Best Practices
- Articulate the contract: what callers can rely on, input requirements, side effects, and invariants.
- Identify boundaries: test Empty/zero/null, One, Many, Boundary, Error, and Concurrent scenarios.
- Match testing approaches: example-based, property-based, integration, and snapshot tests.
- Apply the Testing Trophy: lean on integration tests, supported by fast unit tests and a few end-to-end tests, with static checks (types, linters) as the foundation.
- Continuously evaluate and improve tests against Beck's desiderata to avoid over/under-testing.
## Example Use Cases
- Design test strategy for a new API endpoint to validate contract, inputs, and side effects across unit and integration tests.
- Refactor a module and reassess boundary tests, invariants, and test writability to maintain coverage.
- Add property-based tests for a sorting algorithm to ensure invariants hold for arbitrary inputs.
- Create snapshot tests to lock UI rendering output and detect regressions early.
- Audit tests for a feature flag system to uncover race conditions and ensure correct behavior under concurrency.
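The sorting use case above can be sketched as two invariants checked over generated inputs (a hand-rolled sketch; a real suite might use a framework such as Hypothesis):

```python
import random
from collections import Counter

def check_sort_properties(sort_fn, trials=100, seed=7):
    """Check two invariants of any correct sort: ordered output, same multiset."""
    rng = random.Random(seed)  # seeded for determinism
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = sort_fn(data)
        # Invariant 1: each element is <= its successor
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Invariant 2: output is a permutation of the input
        assert Counter(out) == Counter(data)

check_sort_properties(sorted)
```

The same two invariants hold for any sort implementation, so the checker can be reused unchanged when the algorithm is swapped.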