
testing

npx machina-cli add skill tslateman/duet/testing --openclaw
Files (1): SKILL.md (7.4 KB)

Test Design as Thinking

Overview

Test strategy, not test generation. Treat test design as an act of specification — articulate the contract, find the boundaries, surface hidden assumptions. Use Beck's Test Desiderata to make testing tradeoffs deliberate instead of accidental.

Beck's 12 Test Desiderata

Every test balances these properties. No test maximizes all twelve. The skill is knowing which to prioritize.

| Property | Definition | Tension |
|---|---|---|
| Isolated | Same results regardless of run order | vs. speed (shared setup) |
| Composable | Test dimensions of variability separately | vs. writability (more tests) |
| Deterministic | Same results if nothing changes | vs. realism (real services) |
| Fast | Run quickly | vs. predictiveness (integration) |
| Writable | Cheap to write relative to code cost | vs. thoroughness |
| Readable | Comprehensible, invokes motivation | vs. conciseness |
| Behavioral | Sensitive to behavior changes | vs. structure-insensitivity |
| Structure-insensitive | Unaffected by refactoring | vs. behavioral sensitivity |
| Automated | No human intervention needed | vs. exploratory testing |
| Specific | Failure cause is obvious | vs. breadth of coverage |
| Predictive | Passing means production-ready | vs. speed and isolation |
| Inspiring | Passing builds confidence to deploy | vs. all other properties |

See references/desiderata.md for application guidance.
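These tradeoffs become concrete in code. A minimal Python sketch (the `Greeter` class and `fixed_clock` are hypothetical): injecting a clock trades realism for determinism, so the test never depends on the real system time.

```python
from datetime import datetime

class Greeter:
    """Greets differently depending on the hour; the clock is injectable."""
    def __init__(self, clock=datetime.now):
        self.clock = clock  # dependency injection keeps tests deterministic

    def greet(self, name: str) -> str:
        hour = self.clock().hour
        period = "morning" if hour < 12 else "afternoon"
        return f"Good {period}, {name}"

# Deterministic test: a fixed clock gives the same result on every run.
fixed_clock = lambda: datetime(2024, 1, 1, 9, 0)
assert Greeter(clock=fixed_clock).greet("Ada") == "Good morning, Ada"
```

The sacrifice is deliberate: the test no longer proves the code works against the real clock, only against the contract "given an hour, produce the right greeting."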

Test Design Workflow

1. Articulate the Contract

Before writing any test, answer:

  • What does this code promise to callers?
  • What does it require from its inputs?
  • What side effects does it produce?
  • What invariants must always hold?

If you can't answer these, the code's contract is unclear. Fix that first.
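A contract can be written down directly as a docstring plus guard clauses. A hedged sketch, assuming a hypothetical `apply_discount` function:

```python
def apply_discount(price: float, percent: float) -> float:
    """Contract:
    Promises: returns price reduced by percent; never negative.
    Requires: price >= 0 and 0 <= percent <= 100.
    Side effects: none (pure function).
    Invariant: 0 <= result <= price.
    """
    if price < 0 or not (0 <= percent <= 100):
        raise ValueError("contract violated: bad inputs")
    return price * (1 - percent / 100)

assert apply_discount(100, 25) == 75.0
```

Once the contract is written this explicitly, the test cases almost enumerate themselves.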

2. Identify Boundaries

Every contract has edges. Test them:

  • Empty/zero/null — the degenerate case
  • One — the simplest non-empty case
  • Many — the normal case
  • Boundary — max values, off-by-one, type limits
  • Error — invalid input, unavailable dependencies
  • Concurrent — multiple callers, race conditions
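These boundaries translate directly into test cases. A sketch in Python, using a hypothetical `average` function:

```python
def average(values):
    """Hypothetical function used to demonstrate boundary cases."""
    if not values:
        raise ValueError("empty input")  # the degenerate case is part of the contract
    return sum(values) / len(values)

# Empty — the degenerate case
try:
    average([])
    assert False, "expected ValueError"
except ValueError:
    pass

# One — the simplest non-empty case
assert average([5]) == 5

# Many — the normal case
assert average([1, 2, 3]) == 2

# Boundary — large values (Python ints are unbounded, so no overflow)
assert average([10**18, 10**18]) == 10**18
```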

3. Choose the Testing Approach

Match the approach to what you're testing:

Example-based tests — specific inputs and expected outputs. Best for known contracts with clear boundaries.

Property-based tests — invariants that hold for all inputs. Best for algorithms, parsers, serialization (encode/decode roundtrip), and sorting.

Integration tests — multiple components together. Best for verifying wiring, data flow, and contracts between modules.

Snapshot tests — output matches recorded baseline. Best for rendering, serialization, and configuration.
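Property-based testing doesn't require a framework (libraries such as Hypothesis automate generation and shrinking); a minimal hand-rolled sketch of an encode/decode roundtrip property over randomly generated inputs:

```python
import json
import random
import string

def roundtrip(value):
    """Property: decoding an encoded value yields the original value."""
    return json.loads(json.dumps(value)) == value

random.seed(0)  # seeded generation keeps the test deterministic and repeatable
for _ in range(100):
    sample = {
        "".join(random.choices(string.ascii_letters, k=5)): random.randint(-1000, 1000)
        for _ in range(random.randint(0, 5))
    }
    assert roundtrip(sample)
```

Note the tradeoff from the Desiderata table: seeding sacrifices some input diversity for determinism.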

4. Apply the Testing Trophy

Kent C. Dodds' priority order:

         ┌──────┐
         │  E2E │  Few, slow, high confidence
        ┌┴──────┴┐
        │Integra-│  Most tests here
        │  tion  │
       ┌┴────────┴┐
       │   Unit   │  Many, fast, specific
      ┌┴──────────┴┐
      │   Static   │  Types, linters, formatters
      └────────────┘

"The more your tests resemble the way your software is used, the more confidence they can give you."

5. Evaluate Existing Tests

Ask of each test:

  • Which Desiderata properties does it maximize?
  • Which did it sacrifice? Was that deliberate?
  • Does it test behavior or implementation detail?
  • If this test fails, will the message tell you why?
  • If the implementation changes but behavior doesn't, does this test break? (It shouldn't)
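The behavior-vs-implementation question is easiest to see side by side. A sketch with a hypothetical `unique_sorted` function:

```python
def unique_sorted(items):
    # Implementation detail: uses a set; could be swapped for sort-then-dedupe.
    return sorted(set(items))

# Behavioral test: asserts only on observable output, so it survives
# any refactor that preserves the contract.
assert unique_sorted([3, 1, 2, 1]) == [1, 2, 3]

# Implementation-coupled test (anti-pattern): asserting that a set is
# constructed internally (e.g. via mocks) would break on refactor even
# though behavior is unchanged.
```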

Test Smells

| Smell | Symptom | Fix |
|---|---|---|
| Testing implementation | Breaks on refactor, behavior unchanged | Test outputs, not internals |
| Tautological test | Repeats production logic in assertions | Test observable behavior |
| Happy path only | No error/boundary cases | Add boundary analysis |
| Flaky | Passes sometimes, fails sometimes | Fix nondeterminism or mark explicitly |
| Giant arrange | 30 lines of setup for 1 assertion | Simplify the interface or use builders |
| Invisible assertion | `expect(result).toBeTruthy()` | Assert specific values |
| Test per method | One test per function, misses integration | Test use cases, not methods |
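The invisible-assertion smell and its fix, sketched with a hypothetical `parse_port` function:

```python
def parse_port(value: str) -> int:
    """Hypothetical parser used to contrast weak and specific assertions."""
    port = int(value)
    if not (0 < port < 65536):
        raise ValueError(f"port out of range: {port}")
    return port

result = parse_port("8080")

# Invisible assertion (smell): passes for any truthy value, hides regressions.
assert result  # would also pass if result were accidentally "8080" or True

# Specific assertions (fix): a failure points at the exact deviation.
assert result == 8080
assert isinstance(result, int)
```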

Strategy Templates

For a pure function:

## Contract

[function name]: [input types] → [output type]

- Promises: [what it guarantees]
- Requires: [what inputs must satisfy]

## Test Cases

- [ ] Empty/zero input
- [ ] Single valid input
- [ ] Multiple valid inputs
- [ ] Boundary values
- [ ] Invalid inputs (error cases)
- [ ] Properties that hold for all inputs
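The template filled in for a concrete (hypothetical) pure function, `clamp`:

```python
def clamp(x, lo, hi):
    """Contract — clamp: (x, lo, hi) → value in [lo, hi].
    Promises: returns x if x is within [lo, hi], else the nearer bound.
    Requires: lo <= hi.
    """
    if lo > hi:
        raise ValueError("requires lo <= hi")
    return max(lo, min(x, hi))

assert clamp(0, 0, 0) == 0      # zero-width range, degenerate case
assert clamp(5, 0, 10) == 5     # single valid input inside the range
assert clamp(-3, 0, 10) == 0    # boundary: below the range
assert clamp(99, 0, 10) == 10   # boundary: above the range
try:
    clamp(1, 10, 0)             # invalid input: violates the requires clause
    assert False
except ValueError:
    pass
# Property that holds for all inputs: lo <= clamp(x, lo, hi) <= hi
for x in range(-20, 20):
    assert 0 <= clamp(x, 0, 10) <= 10
```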

For an API endpoint:

## Contract

[METHOD /path]: [request] → [response]

- Auth: [required/optional/none]
- Idempotent: [yes/no]

## Test Cases

- [ ] Happy path (valid request → expected response)
- [ ] Validation failures (400)
- [ ] Auth failures (401/403)
- [ ] Not found (404)
- [ ] Concurrent requests
- [ ] Rate limiting
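The same checklist can be exercised against a minimal stand-in handler, with no web framework assumed; `put_item`, `DB`, and `TOKENS` are hypothetical:

```python
# Hypothetical in-memory handler standing in for PUT /items/{id}.
DB = {}
TOKENS = {"secret"}

def put_item(item_id, body, token):
    """Returns (status, response). Idempotent: repeating the call
    with the same arguments leaves the state unchanged."""
    if token not in TOKENS:
        return 401, {"error": "unauthorized"}
    if not isinstance(body.get("name"), str) or not body["name"]:
        return 400, {"error": "name required"}
    DB[item_id] = body["name"]
    return 200, {"id": item_id, "name": body["name"]}

# Happy path: valid request → expected response
assert put_item("1", {"name": "widget"}, "secret") == (200, {"id": "1", "name": "widget"})
# Validation failure (400)
assert put_item("1", {}, "secret")[0] == 400
# Auth failure (401)
assert put_item("1", {"name": "widget"}, None)[0] == 401
# Idempotency: repeating the request leaves state unchanged
put_item("1", {"name": "widget"}, "secret")
assert DB == {"1": "widget"}
```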

For a UI component:

## Contract

[Component]: [props] → [rendered output + interactions]

## Test Cases

- [ ] Renders with required props
- [ ] Renders with all optional props
- [ ] User interactions trigger callbacks
- [ ] Loading/error/empty states
- [ ] Accessibility (keyboard nav, screen reader)

Output Format

When designing test strategy:

## Test Strategy for [feature/module]

### Contract

[What this code promises and requires]

### Priority Properties

[Which Desiderata properties matter most and why]

### Test Plan

1. [Test case] — [what it verifies] — [approach]
2. [Test case] — [what it verifies] — [approach]

### Tradeoffs Accepted

- [Property sacrificed] because [reason]

### Not Testing

- [What's deliberately excluded and why]

The Confidence Question

After designing the test suite, ask: "If all these tests pass, would you deploy with confidence?" If no, identify what's missing. If yes, stop — more tests beyond confidence are waste.

See Also

  • /debugging — Test failures trigger debugging; debugging reveals missing tests
  • /review — Reviews assess test coverage alongside code quality
  • skills/FRAMEWORKS.md — Full framework index
  • RECIPE.md — Agent recipe for parallel decomposition (2 workers)

Source

git clone https://github.com/tslateman/duet
(skill file: skills/testing/SKILL.md)

View on GitHub: https://github.com/tslateman/duet/blob/main/skills/testing/SKILL.md

Overview

Design test strategy as an act of specification: articulate the contract, surface hidden assumptions, and make tradeoffs explicit using Beck's Test Desiderata. Use this skill to plan tests for a new feature or refactor, or to review whether a codebase is well tested.

How This Skill Works

Start by articulating the contract: what the code promises, required inputs, side effects, and invariants. Then identify boundaries (empty, one, many, boundary, error, concurrent). Next, choose a testing approach (example-based, property-based, integration, snapshot) and apply the Testing Trophy to prioritize test types (E2E, integration, unit, static).

When to Use It

  • Designing tests for a new feature or refactor and need a strategy.
  • Asked questions like 'how should I test this' or 'what tests do I need'.
  • Reviewing or auditing an existing test strategy for coverage, reliability, and maintainability.
  • Evaluating whether current tests are fast, deterministic, and well-isolated.
  • Planning test work that aligns with how users actually use the software (balancing tests across layers).

Quick Start

  1. Step 1: Articulate the contract and invariants.
  2. Step 2: Identify boundaries and edge cases (empty, one, many, boundary, error, concurrent).
  3. Step 3: Choose testing approach and apply the Testing Trophy (Unit, Integration, End-to-End; add property-based and snapshot tests as needed).

Best Practices

  • Articulate the contract: what callers can rely on, input requirements, side effects, and invariants.
  • Identify boundaries: test Empty/zero/null, One, Many, Boundary, Error, and Concurrent scenarios.
  • Match testing approaches: example-based, property-based, integration, and snapshot tests.
  • Apply the Testing Trophy: concentrate tests at the integration layer, keep end-to-end tests few, unit tests many and fast, and static checks (types, linters) as the base.
  • Continuously evaluate and improve tests against Beck's desiderata to avoid over/under-testing.

Example Use Cases

  • Design test strategy for a new API endpoint to validate contract, inputs, and side effects across unit and integration tests.
  • Refactor a module and reassess boundary tests, invariants, and test writability to maintain coverage.
  • Add property-based tests for a sorting algorithm to ensure invariants hold for arbitrary inputs.
  • Create snapshot tests to lock UI rendering output and detect regressions early.
  • Audit tests for a feature flag system to uncover race conditions and ensure correct behavior under concurrency.
