perf-theory-tester
Install: npx machina-cli add skill ComposioHQ/awesome-claude-plugins/theory-tester --openclaw
Test hypotheses using controlled experiments.
Follow docs/perf-requirements.md as the canonical contract.
Required Steps
- Confirm baseline is clean.
- Apply a single change tied to the hypothesis.
- Run 2+ validation passes.
- Revert to baseline before the next experiment.
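The required steps above can be sketched as a small driver. This is a minimal sketch, not part of the skill itself: the `run_experiment` and `sh` helper names are hypothetical, and the git-based clean-baseline check and revert are illustrative assumptions about how a repo-backed workflow might enforce the contract.

```python
import subprocess

def sh(cmd):
    """Run a command and return its stdout (raises on nonzero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout

def run_experiment(apply_change, benchmark_cmd, passes=2, run=sh):
    """One controlled experiment: verify a clean baseline, apply a single
    change, run 2+ validation passes, then revert to baseline."""
    # Step 1: a clean baseline means no uncommitted changes.
    if run(["git", "status", "--porcelain"]).strip():
        raise RuntimeError("baseline is not clean; commit or stash first")
    # Step 2: exactly one change tied to the hypothesis.
    apply_change()
    results = []
    try:
        # Step 3: never fewer than two validation passes, run sequentially.
        for _ in range(max(passes, 2)):
            results.append(run(benchmark_cmd))
    finally:
        # Step 4: always restore the baseline before the next experiment.
        run(["git", "checkout", "--", "."])
    return results
```

The `run` parameter is injectable so the control flow can be exercised without a real repo or benchmark.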
Output Format
hypothesis: <id>
change: <summary>
delta: <metrics>
verdict: accept|reject|inconclusive
evidence:
- command: <benchmark command>
- files: <changed files>
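A minimal sketch of rendering a report in this format. The field names come from the spec above; the `format_report` helper and the sample values are hypothetical.

```python
def format_report(hypothesis, change, delta, verdict, command, files):
    """Render the standardized experiment report as plain text."""
    assert verdict in ("accept", "reject", "inconclusive")
    lines = [
        f"hypothesis: {hypothesis}",
        f"change: {change}",
        f"delta: {delta}",
        f"verdict: {verdict}",
        "evidence:",
        f"- command: {command}",
        f"- files: {', '.join(files)}",
    ]
    return "\n".join(lines)

# Illustrative values only.
print(format_report("H1", "memoize hot-path lookup", "p50 -12ms",
                    "accept", "make bench", ["src/cache.c"]))
```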
Constraints
- One change per experiment.
- No parallel benchmarks.
- Record evidence for each run.
Source
git clone https://github.com/ComposioHQ/awesome-claude-plugins
Skill file: perf/skills/theory-tester/SKILL.md
Overview
perf-theory-tester helps you test performance hypotheses with controlled experiments. It follows the canonical contract in docs/perf-requirements.md and emphasizes a clean baseline, a single change, 2+ validation passes, and reverting to baseline before the next test. Output is a structured report with hypothesis, change, delta, verdict, and evidence.
How This Skill Works
Prepare a clean baseline, apply a single change tied to your hypothesis, run at least two validation passes, then revert to baseline before the next experiment. Each run records the command and changed files; the tool emits a standardized output including hypothesis id, delta metrics, and a verdict (accept, reject, or inconclusive).
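One way to turn per-pass measurements into the accept/reject/inconclusive verdict the tool emits. The 2% noise threshold and the "all passes must agree" rule are illustrative assumptions; the skill does not prescribe a specific decision rule.

```python
def verdict(baseline_ms, pass_ms, threshold=0.02):
    """Classify 2+ validation passes against a baseline latency.

    accept:       every pass improves by more than `threshold` (relative)
    reject:       every pass regresses by more than `threshold`
    inconclusive: passes disagree or stay inside the noise band
    """
    deltas = [(p - baseline_ms) / baseline_ms for p in pass_ms]
    if all(d < -threshold for d in deltas):
        return "accept"
    if all(d > threshold for d in deltas):
        return "reject"
    return "inconclusive"
```

Requiring every pass to agree is what makes the extra validation passes meaningful: one lucky run cannot flip the verdict.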
When to Use It
- Validating a performance hypothesis with a controlled change
- Measuring impact of a single code or config alteration
- Ensuring reproducibility with 2+ validation passes
- Documenting evidence for performance decisions
- Isolating experiments by reverting to baseline before each test
Quick Start
- Step 1: Confirm baseline is clean.
- Step 2: Apply a single change tied to the hypothesis and run 2+ validation passes.
- Step 3: Revert to baseline before the next experiment.
Best Practices
- Start with a clean baseline before each experiment
- Apply only one change per experiment to avoid confounding effects
- Run 2+ validation passes to reduce flakiness
- Do not run benchmarks in parallel
- Record evidence for each run using the standardized output format
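The flakiness-related practices above can be made concrete by aggregating passes instead of trusting any single run. A minimal sketch: the `summarize_passes` helper and the 5% stability band are assumptions for illustration.

```python
import statistics

def summarize_passes(samples_ms):
    """Aggregate 2+ validation passes: report the median and the relative
    spread so a single noisy run cannot decide the experiment."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two validation passes")
    med = statistics.median(samples_ms)
    spread = (max(samples_ms) - min(samples_ms)) / med
    # A spread above 5% suggests rerunning before trusting the delta
    # (the 5% cutoff is an assumed, tunable value).
    return {"median_ms": med, "spread": spread, "stable": spread <= 0.05}
```

A wide spread is itself evidence worth recording: it distinguishes "inconclusive because noisy" from "inconclusive because the effect is small".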
Example Use Cases
- Test latency impact of a cache optimization in a hot path
- Evaluate performance effect of a new indexing strategy
- Measure throughput change after a concurrency tweak
- Assess memory footprint variation after a refactor
- Validate I/O throughput improvements from an async path