perf-benchmarker
npx machina-cli add skill ComposioHQ/awesome-claude-plugins/benchmark --openclaw
Run sequential benchmarks with strict duration rules.
Follow docs/perf-requirements.md as the canonical contract.
Required Rules
- Benchmarks MUST run sequentially (never parallel).
- Minimum duration: 60s per run (30s only for binary search).
- Warmup: 10s minimum before measurement.
- Re-run anomalies.
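The duration rules above can be expressed as a pre-flight check before any run starts. `check_contract` below is a hypothetical helper sketched for illustration, not part of the skill itself:

```shell
# Hypothetical pre-flight check for the duration contract:
# 60s minimum per run (30s during binary search), 10s minimum warmup.
check_contract() {
  local duration="$1" warmup="$2" phase="${3:-normal}"
  local min=60
  [ "$phase" = "binary-search" ] && min=30
  if [ "$duration" -lt "$min" ]; then
    echo "duration ${duration}s below ${min}s minimum for ${phase} runs" >&2
    return 1
  fi
  if [ "$warmup" -lt 10 ]; then
    echo "warmup ${warmup}s below 10s minimum" >&2
    return 1
  fi
  echo "ok: ${duration}s run, ${warmup}s warmup (${phase})"
}
```

A runner would call this once per benchmark, refusing to start the measured window until the check passes.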
Output Format
command: <benchmark command>
duration: <seconds>
warmup: <seconds>
results: <metrics summary>
notes: <anomalies or reruns>
Output Contract
Benchmarks MUST emit a JSON metrics block between markers:
PERF_METRICS_START
{"scenarios":{"low":{"latency_ms":120},"high":{"latency_ms":450}}}
PERF_METRICS_END
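The JSON between the markers can be pulled out of captured output with a small sed pipeline. This is a sketch; the log filename and helper name are arbitrary:

```shell
# Print only the JSON between the PERF_METRICS markers, dropping the
# marker lines themselves. $1 is a file containing benchmark output.
extract_metrics() {
  sed -n '/^PERF_METRICS_START$/,/^PERF_METRICS_END$/p' "$1" | sed '1d;$d'
}
```

Typical usage would be `extract_metrics bench.log > metrics.json` after capturing a run with `tee bench.log`.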
Constraints
- No short runs except during the binary-search phase.
- Do not change code while benchmarking.
Source
https://github.com/ComposioHQ/awesome-claude-plugins/blob/master/perf/skills/benchmark/SKILL.md
Overview
perf-benchmarker runs benchmarks sequentially to establish baselines and detect regressions under a canonical duration contract. It enforces minimum run durations and a warmup window, and disallows parallel benchmarks so that each run's performance characteristics are measured in isolation.
How This Skill Works
It executes a single benchmark command at a time, enforces sequential runs, and applies duration rules: minimum 60 seconds per run (30 seconds for binary search) with at least 10 seconds of warmup. Benchmarks must emit a PERF_METRICS block between PERF_METRICS_START and PERF_METRICS_END, which the tool captures and surfaces in the results output.
When to Use It
- Establishing a baseline for a new release by running sequential benchmarks over a fixed duration.
- Validating regressions after code changes by re-running benchmarks and comparing metrics.
- Measuring isolated latency or throughput with a strict warmup and minimum duration.
- Conducting binary-search style tuning with a 30-second run, when applicable.
- Following the canonical perf contract described in docs/perf-requirements.md during benchmarking.
Quick Start
- Step 1: Run the benchmark command using perf-benchmarker <command> [duration].
- Step 2: Ensure a warmup of at least 10 seconds precedes measurement and that runs are not parallel.
- Step 3: Read the PERF_METRICS_START/END block and the nested JSON metrics for results.
Best Practices
- Ensure benchmarks run strictly sequentially, never in parallel.
- Maintain the minimum duration: 60s per run, or 30s for binary search phases.
- Include a 10-second warmup before measurement.
- Re-run anomalies to confirm outlier behavior and stabilize results.
- Do not modify code while benchmarking to preserve measurements.
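"Re-run anomalies" can be made mechanical with a simple deviation test between samples. The 20% threshold below is an illustrative choice, not part of the contract:

```shell
# Return 0 (anomaly: re-run) when two latency samples, in ms, differ by
# more than 20% of the smaller sample; uses integer arithmetic only.
needs_rerun() {
  local a="$1" b="$2"
  local diff=$(( a > b ? a - b : b - a ))
  local base=$(( a < b ? a : b ))
  [ $(( diff * 5 )) -gt "$base" ]
}
```

A run whose latency trips this check against the previous run would be repeated and only the stable result reported, with the re-run noted in the notes field.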
Example Use Cases
- Benchmark API latency on a new deployment and compare against the previous baseline using 60s+ runs.
- Evaluate a data pipeline change by running successive benchmarks to detect regressions.
- Tune a performance parameter with a binary-search style pass, keeping each run within 30s.
- Capture a metrics JSON block from PERF_METRICS_START to PERF_METRICS_END for analysis.
- Document results by reporting the command, duration, warmup, and metrics in a consistent format.
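The binary-search tuning pass above can be sketched as follows. `run_probe` is a hypothetical hook that runs one 30-second probe at a given concurrency and prints its latency in ms; it is not part of the skill:

```shell
# Find the highest concurrency whose measured latency stays at or below
# target_ms (returns lo when nothing qualifies). Each probe is assumed
# to honour the 30s binary-search minimum duration.
binary_search() {
  local lo="$1" hi="$2" target_ms="$3" best="$1"
  while [ "$lo" -le "$hi" ]; do
    local mid=$(( (lo + hi) / 2 ))
    if [ "$(run_probe "$mid")" -le "$target_ms" ]; then
      best="$mid"; lo=$(( mid + 1 ))   # under target: search higher
    else
      hi=$(( mid - 1 ))                # over target: search lower
    fi
  done
  echo "$best"
}
```

Because each probe is a full sequential run, the search over a range of size n costs roughly log2(n) probes of 30 seconds each.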