solution-comparator
npx machina-cli add skill a5c-ai/babysitter/solution-comparator --openclaw
Solution Comparator Skill
Purpose
Compare multiple algorithm solutions against the same test cases to verify correctness and benchmark performance.
Capabilities
- Run solutions against the same test cases
- Performance benchmarking and comparison
- Output diff analysis
- Find minimal failing test case
- Memory usage comparison
- Time complexity validation
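As an illustration of the output diff capability, the sketch below normalizes two outputs and diffs them with Python's standard `difflib`. The helper names `normalize` and `diff_outputs` are hypothetical and not part of the skill; the normalization rule (strip trailing whitespace, drop trailing blank lines) is an assumption about what counts as an insignificant difference.

```python
import difflib


def normalize(output: str) -> list[str]:
    """Strip trailing whitespace per line and drop trailing blank lines."""
    lines = [line.rstrip() for line in output.splitlines()]
    while lines and lines[-1] == "":
        lines.pop()
    return lines


def diff_outputs(expected: str, actual: str) -> list[str]:
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            normalize(expected),
            normalize(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


# Trailing spaces and blank lines are ignored; real differences are reported.
same = diff_outputs("1 2 3\n", "1 2 3  \n\n")
different = diff_outputs("1\n2\n", "1\n3\n")
```

Stricter or looser rules (for example, token-wise or floating-point tolerant comparison) may suit some problems better.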
Target Processes
- correctness-proof-testing
- complexity-optimization
- upsolving
- algorithm-implementation
Comparison Modes
- Correctness: Compare outputs against a known-correct solution
- Performance: Benchmark execution time across solutions
- Stress Testing: Run with random large inputs to find discrepancies
- Minimal Counter-example: Binary search to find smallest failing case
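The minimal counter-example mode can be pictured as a binary search over input size. This is a minimal sketch, not the skill's actual implementation: `smallest_failing_size` and `buggy_disagrees` are hypothetical names, and the search assumes failure is monotone (if size n fails, every larger size fails too), which does not hold for every bug.

```python
from typing import Callable


def smallest_failing_size(fails: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest n in [lo, hi] for which fails(n) is true.

    Assumes fails(hi) is true and that failure is monotone in n.
    """
    if not fails(hi):
        raise ValueError("upper bound does not fail")
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # mid fails, so the answer is mid or smaller
        else:
            lo = mid + 1  # mid passes, so the answer is larger
    return lo


# Hypothetical check: a buggy solution that disagrees with the oracle
# once the input has 37 or more elements.
def buggy_disagrees(n: int) -> bool:
    return n >= 37


smallest = smallest_failing_size(buggy_disagrees, 1, 1000)
```

When failures are not monotone, a linear scan over small sizes or a delta-debugging style reduction is the safer choice.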
Input Schema
{
  "type": "object",
  "properties": {
    "solutions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "code": { "type": "string" },
          "language": { "type": "string" }
        }
      }
    },
    "testCases": { "type": "array" },
    "mode": {
      "type": "string",
      "enum": ["correctness", "performance", "stress", "minimal"]
    },
    "oracleSolution": { "type": "string" },
    "timeout": { "type": "integer", "default": 5000 }
  },
  "required": ["solutions", "mode"]
}
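To make the schema concrete, here is a hedged sketch of a request and a hand-rolled check of the constraints the schema states (required fields, the `mode` enum, string-typed solution fields, integer `timeout`). The function `validate_request` is hypothetical. The schema does not define the shape of `testCases` items or whether `oracleSolution` holds a name or source code, so the values used for those two fields are assumptions.

```python
VALID_MODES = ("correctness", "performance", "stress", "minimal")


def validate_request(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    for field in ("solutions", "mode"):
        if field not in request:
            errors.append(f"missing required field: {field}")
    if "mode" in request and request["mode"] not in VALID_MODES:
        errors.append(f"unknown mode: {request['mode']}")
    for i, solution in enumerate(request.get("solutions", [])):
        for key in ("name", "code", "language"):
            if key in solution and not isinstance(solution[key], str):
                errors.append(f"solutions[{i}].{key} must be a string")
    timeout = request.get("timeout", 5000)  # schema default
    if not isinstance(timeout, int) or isinstance(timeout, bool):
        errors.append("timeout must be an integer")
    return errors


example_request = {
    "solutions": [
        {"name": "brute", "code": "print(sum(map(int, input().split())))", "language": "python"},
        {"name": "fast", "code": "print(sum(int(t) for t in input().split()))", "language": "python"},
    ],
    "testCases": [{"input": "1 2 3"}],  # item shape is an assumption
    "mode": "correctness",
    "oracleSolution": "brute",  # assumed to name one of the solutions
    "timeout": 5000,
}

problems = validate_request(example_request)
```

A full JSON Schema validator would cover more cases; this only mirrors the constraints visible in the schema above.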
Output Schema
{
  "type": "object",
  "properties": {
    "success": { "type": "boolean" },
    "results": { "type": "array" },
    "discrepancies": { "type": "array" },
    "performance": { "type": "object" },
    "minimalFailingCase": { "type": "object" }
  },
  "required": ["success"]
}
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/algorithms-optimization/skills/solution-comparator/SKILL.md
Overview
Solution Comparator analyzes multiple algorithm implementations by running them on identical test cases, verifying correctness against a reference solution and benchmarking performance. It surfaces output differences, memory usage, and identifies minimal failing cases to accelerate debugging.
How This Skill Works
It takes a set of solutions, each with a name, code, and language, runs them on the same testCases, and compares their outputs to an oracle. It records timing and memory metrics and reports discrepancies along with mode-specific results (correctness, performance, stress, or minimal counter-example).
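The flow described above can be sketched roughly as follows. This is a simplified illustration, not the skill's implementation: it assumes solutions are in-process Python callables rather than code strings, `compare_solutions` is a hypothetical name, and `success` is taken to mean "no discrepancies", which the output schema does not spell out.

```python
import copy
import time
from typing import Any, Callable

Solution = Callable[[Any], Any]


def compare_solutions(
    solutions: dict[str, Solution],
    oracle: Solution,
    test_cases: list[Any],
) -> dict:
    """Run every solution on every test case and compare against the oracle."""
    results = []
    discrepancies = []
    performance = {name: 0.0 for name in solutions}
    for index, case in enumerate(test_cases):
        # Each call gets its own copy so in-place solutions cannot affect others.
        expected = oracle(copy.deepcopy(case))
        for name, solve in solutions.items():
            argument = copy.deepcopy(case)
            start = time.perf_counter()
            actual = solve(argument)
            performance[name] += time.perf_counter() - start
            passed = actual == expected
            results.append({"solution": name, "case": index, "passed": passed})
            if not passed:
                discrepancies.append(
                    {"solution": name, "case": index, "expected": expected, "actual": actual}
                )
    return {
        "success": not discrepancies,
        "results": results,
        "discrepancies": discrepancies,
        "performance": performance,  # total seconds per solution
    }


report = compare_solutions(
    {"builtin": sorted, "reversed_bug": lambda xs: sorted(xs, reverse=True)},
    oracle=sorted,
    test_cases=[[3, 1, 2], [5, 5], []],
)
```

Here only `reversed_bug` on the first case disagrees with the oracle, so the report lists one discrepancy.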
When to Use It
- You have multiple implementations of the same algorithm and want to verify correctness against a trusted oracle.
- You want to benchmark and compare runtime across solutions on identical inputs.
- You need stress testing with large random inputs to reveal discrepancies.
- You want to locate the smallest failing input (minimal counter-example) via binary search.
- You are validating memory usage and time complexity across variants during optimization.
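For the stress-testing scenario, a minimal sketch looks like this. All function names are hypothetical, and the maximum-subarray problem is chosen purely for illustration. The sketch uses small random inputs so that a counter-example is easy to read; the sizes can be raised for the large-input case.

```python
import random
from typing import Callable, Optional


def stress_test(
    candidate: Callable[[list[int]], int],
    oracle: Callable[[list[int]], int],
    rounds: int = 1000,
    max_len: int = 8,
    seed: int = 0,
) -> Optional[list[int]]:
    """Return the first random input where candidate and oracle disagree, or None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(rounds):
        data = [rng.randint(-10, 10) for _ in range(rng.randint(1, max_len))]
        if candidate(list(data)) != oracle(list(data)):
            return data
    return None


def max_subarray_brute(xs: list[int]) -> int:
    """Oracle: try every non-empty contiguous subarray."""
    return max(sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1))


def max_subarray_kadane(xs: list[int]) -> int:
    """Correct linear-time solution."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def max_subarray_buggy(xs: list[int]) -> int:
    """Buggy variant: starting from 0 is wrong when every element is negative."""
    best = current = 0
    for x in xs:
        current = max(0, current + x)
        best = max(best, current)
    return best


clean = stress_test(max_subarray_kadane, max_subarray_brute)
counter_example = stress_test(max_subarray_buggy, max_subarray_brute)
```

The slow oracle is trusted because it is simple, not because it is fast, which is why the inputs stay small here.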
Quick Start
- Step 1: Gather solutions with fields name, code, and language.
- Step 2: Provide testCases and an oracleSolution; choose a mode.
- Step 3: Run the comparator and inspect the results and discrepancies.
Best Practices
- Provide a single trusted oracle solution for reference outputs.
- Use identical testCases and deterministic inputs.
- Normalize I/O formatting to ensure accurate diffs.
- Capture memory and CPU usage alongside wall-clock time.
- Run isolated tests per mode (correctness, performance, stress, minimal) and iterate.
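One possible way to capture memory and CPU usage alongside wall-clock time for an in-process Python solution is shown below, using only the standard library. The helper `measure` is hypothetical. Note two limitations: `tracemalloc` only sees allocations made through Python's allocator, and tracing slows execution, so timings taken this way are inflated and are better collected in a separate run when precision matters.

```python
import time
import tracemalloc
from typing import Any, Callable


def measure(func: Callable[..., Any], *args: Any) -> dict:
    """Run func once; report wall-clock time, CPU time, and peak traced memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        result = func(*args)
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return {
        "result": result,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        "peak_bytes": peak,
    }


# Example: summing a materialized list of 100000 integers.
metrics = measure(lambda n: sum(list(range(n))), 100000)
```

For solutions run as separate processes, operating-system level accounting would be needed instead.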
Example Use Cases
- Compare multiple sorting algorithms (quick sort, merge sort, heap sort) on the same arrays.
- Validate different shortest-path implementations against a reference on graphs.
- Benchmark different string-search algorithms on large texts.
- Stress-test dynamic programming solutions with random inputs to reveal incorrect optimizations.
- Check memory usage across recursion-heavy vs iterative variants.