perf-theory-tester
Install: npx machina-cli add skill ComposioHQ/awesome-claude-plugins/theory-tester --openclaw
Test hypotheses using controlled experiments.
Follow docs/perf-requirements.md as the canonical contract.
Required Steps
- Confirm baseline is clean.
- Apply a single change tied to the hypothesis.
- Run 2+ validation passes.
- Revert to baseline before the next experiment.
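The required steps above can be sketched as a small driver. This is a minimal sketch, not part of the skill itself: the `run_experiment` and `sh` helper names are hypothetical, and the git-based clean-baseline check and revert are illustrative assumptions about how a repo-backed workflow might enforce the contract.

```python
import subprocess

def sh(cmd):
    """Run a command and return its stdout (raises on nonzero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout

def run_experiment(apply_change, benchmark_cmd, passes=2, run=sh):
    """One controlled experiment: verify a clean baseline, apply a single
    change, run 2+ validation passes, then revert to baseline."""
    # Step 1: a clean baseline means no uncommitted changes.
    if run(["git", "status", "--porcelain"]).strip():
        raise RuntimeError("baseline is not clean; commit or stash first")
    # Step 2: exactly one change tied to the hypothesis.
    apply_change()
    results = []
    try:
        # Step 3: never fewer than two validation passes, run sequentially.
        for _ in range(max(passes, 2)):
            results.append(run(benchmark_cmd))
    finally:
        # Step 4: always restore the baseline before the next experiment.
        run(["git", "checkout", "--", "."])
    return results
```

The `run` parameter is injectable so the control flow can be exercised without a real repo or benchmark.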
Output Format
hypothesis: <id>
change: <summary>
delta: <metrics>
verdict: accept|reject|inconclusive
evidence:
- command: <benchmark command>
- files: <changed files>
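A minimal sketch of rendering a report in this format. The field names come from the spec above; the `format_report` helper and the sample values are hypothetical.

```python
def format_report(hypothesis, change, delta, verdict, command, files):
    """Render the standardized experiment report as plain text."""
    assert verdict in ("accept", "reject", "inconclusive")
    lines = [
        f"hypothesis: {hypothesis}",
        f"change: {change}",
        f"delta: {delta}",
        f"verdict: {verdict}",
        "evidence:",
        f"- command: {command}",
        f"- files: {', '.join(files)}",
    ]
    return "\n".join(lines)

# Illustrative values only.
print(format_report("H1", "memoize hot-path lookup", "p50 -12ms",
                    "accept", "make bench", ["src/cache.c"]))
```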
Constraints
- One change per experiment.
- No parallel benchmarks.
- Record evidence for each run.
Source
git clone https://github.com/ComposioHQ/awesome-claude-plugins
Skill file: perf/skills/theory-tester/SKILL.md
Overview
perf-theory-tester helps you test performance hypotheses with controlled experiments. It follows the canonical contract in docs/perf-requirements.md and emphasizes a clean baseline, a single change, 2+ validation passes, and reverting to baseline before the next test. Output is a structured report with hypothesis, change, delta, verdict, and evidence.
How This Skill Works
Prepare a clean baseline, apply a single change tied to your hypothesis, run at least two validation passes, then revert to baseline before the next experiment. Each run records the command and changed files; the tool emits a standardized output including hypothesis id, delta metrics, and a verdict (accept, reject, or inconclusive).
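One way to turn per-pass measurements into the accept/reject/inconclusive verdict the tool emits. The 2% noise threshold and the "all passes must agree" rule are illustrative assumptions; the skill does not prescribe a specific decision rule.

```python
def verdict(baseline_ms, pass_ms, threshold=0.02):
    """Classify 2+ validation passes against a baseline latency.

    accept:       every pass improves by more than `threshold` (relative)
    reject:       every pass regresses by more than `threshold`
    inconclusive: passes disagree or stay inside the noise band
    """
    deltas = [(p - baseline_ms) / baseline_ms for p in pass_ms]
    if all(d < -threshold for d in deltas):
        return "accept"
    if all(d > threshold for d in deltas):
        return "reject"
    return "inconclusive"
```

Requiring every pass to agree is what makes the extra validation passes meaningful: one lucky run cannot flip the verdict.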
When to Use It
- Validating a performance hypothesis with a controlled change
- Measuring impact of a single code or config alteration
- Ensuring reproducibility with 2+ validation passes
- Documenting evidence for performance decisions
- Isolating experiments by reverting to baseline before each test
Quick Start
- Step 1: Confirm baseline is clean.
- Step 2: Apply a single change tied to the hypothesis and run 2+ validation passes.
- Step 3: Revert to baseline before the next experiment.
Best Practices
- Start with a clean baseline before each experiment
- Apply only one change per experiment to avoid confounding effects
- Run 2+ validation passes to reduce flakiness
- Do not run benchmarks in parallel
- Record evidence for each run using the standardized output format
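The flakiness-related practices above can be made concrete by aggregating passes instead of trusting any single run. A minimal sketch: the `summarize_passes` helper and the 5% stability band are assumptions for illustration.

```python
import statistics

def summarize_passes(samples_ms):
    """Aggregate 2+ validation passes: report the median and the relative
    spread so a single noisy run cannot decide the experiment."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two validation passes")
    med = statistics.median(samples_ms)
    spread = (max(samples_ms) - min(samples_ms)) / med
    # A spread above 5% suggests rerunning before trusting the delta
    # (the 5% cutoff is an assumed, tunable value).
    return {"median_ms": med, "spread": spread, "stable": spread <= 0.05}
```

A wide spread is itself evidence worth recording: it distinguishes "inconclusive because noisy" from "inconclusive because the effect is small".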
Example Use Cases
- Test latency impact of a cache optimization in a hot path
- Evaluate performance effect of a new indexing strategy
- Measure throughput change after a concurrency tweak
- Assess memory footprint variation after a refactor
- Validate I/O throughput improvements from an async path