Frequently Asked Questions

What is the 'minimal' mode?

Minimal mode uses a binary-search-like approach to locate the smallest input that causes a solution to fail, helping pinpoint the exact edge case.
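Here is a minimal Python sketch of that search. It assumes two things: the solutions can be called in-process as functions, and failures are monotone in input length (once a prefix of the input fails, every longer prefix fails too). The names find_minimal_failing_prefix, candidate and oracle are hypothetical and are not part of the skill's documented interface.

```python
from typing import Callable, List, Optional, Sequence


def find_minimal_failing_prefix(
    data: Sequence[int],
    candidate: Callable[[List[int]], object],
    oracle: Callable[[List[int]], object],
) -> Optional[List[int]]:
    """Binary-search the shortest prefix of `data` on which candidate != oracle.

    Assumes failure is monotone in prefix length (hypothetical helper).
    """

    def fails(n: int) -> bool:
        prefix = list(data[:n])
        return candidate(list(prefix)) != oracle(list(prefix))

    if not fails(len(data)):
        return None  # the full input passes, so there is nothing to shrink
    # Invariant: fails(hi) is True and the smallest failing length lies in [lo, hi].
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid + 1
    return list(data[:hi])


if __name__ == "__main__":
    def buggy_sort(xs):
        return sorted(set(xs))  # drops duplicates, so it fails once a repeat appears

    print(find_minimal_failing_prefix([3, 1, 4, 1, 5, 9, 2, 6], buggy_sort, sorted))  # [3, 1, 4, 1]
```

In this example the deduplicating "sort" first disagrees with sorted() at the fourth element, where the first duplicate appears. When failures are not monotone, a prefix search like this can miss smaller counter-examples. A general shrinking technique such as delta debugging is one alternative.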

What happens when there are discrepancies?

The tool surfaces discrepancies with diffs, shows which test cases failed, and indicates which solutions diverged from the oracle.
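One way to produce such diffs is Python's standard difflib module, sketched below. output_diff is a hypothetical helper, and the skill's actual diff format is not documented here.

```python
import difflib
from typing import List


def output_diff(expected: str, actual: str, solution_name: str) -> List[str]:
    """Unified-diff lines between the oracle's output and one solution's output.

    An empty list means the two outputs are identical line by line.
    """
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="oracle",
            tofile=solution_name,
            lineterm="",  # lines from splitlines() carry no newline of their own
        )
    )


if __name__ == "__main__":
    for line in output_diff("1 2 3\n4 5 6\n", "1 2 3\n4 5 7\n", "candidate-a"):
        print(line)
```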

What outputs are produced?

A success flag, a detailed results array, a list of discrepancies, performance metrics, and a minimal failing case when found.

solution-comparator

npx machina-cli add skill a5c-ai/babysitter/solution-comparator --openclaw


Solution Comparator Skill

Purpose

Compare multiple algorithm solutions against the same test cases to verify correctness and benchmark performance.

Capabilities

Run solutions against the same test cases
Performance benchmarking and comparison
Output diff analysis
Find minimal failing test case
Memory usage comparison
Time complexity validation
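The page does not explain how time complexity validation works. One common approach, sketched here as an assumption, is to time a solution at several input sizes and fit the slope of log(time) against log(n). A slope near 1 suggests linear growth, and a slope near 2 suggests quadratic growth. measure_times and log_log_slope are hypothetical names.

```python
import math
import time
from typing import Callable, List, Sequence


def measure_times(
    solve: Callable[[List[int]], object],
    sizes: Sequence[int],
    repeats: int = 3,
) -> List[float]:
    """Best-of-`repeats` wall-clock time (seconds) for each input size."""
    times = []
    for n in sizes:
        data = list(range(n, 0, -1))  # deterministic input: n, n-1, ..., 1
        best = math.inf
        for _ in range(repeats):
            arg = list(data)  # fresh copy, made outside the timed region
            start = time.perf_counter()
            solve(arg)
            best = min(best, time.perf_counter() - start)
        times.append(best)
    return times


def log_log_slope(sizes: Sequence[int], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(n)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(max(t, 1e-12)) for t in times]  # guard against log(0)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("need at least two distinct input sizes")
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return num / den


if __name__ == "__main__":
    def count_inversions(xs):
        # Deliberately O(n^2): compare every pair with a double loop.
        return sum(1 for i in range(len(xs)) for j in range(i + 1, len(xs)) if xs[i] > xs[j])

    sizes = [100, 200, 400, 800]
    print(round(log_log_slope(sizes, measure_times(count_inversions, sizes)), 2))
```

Timings are noisy, so treat the slope as a rough check against the claimed bound rather than a proof. Very small inputs are dominated by constant overhead.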

Target Processes

correctness-proof-testing
complexity-optimization
upsolving
algorithm-implementation

Comparison Modes

Correctness: Compare outputs against a known-correct solution
Performance: Benchmark execution time across solutions
Stress Testing: Run with large random inputs to find discrepancies
Minimal Counter-example: Use binary search to find the smallest failing case
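The stress-testing mode can be sketched as a simple loop. This sketch assumes in-process Python callables and inputs that are lists of integers; stress_test and its parameters are hypothetical. Seeding the generator makes any discovered failure reproducible, in line with the deterministic-input advice under Best Practices.

```python
import random
from typing import Callable, List, Optional


def stress_test(
    candidate: Callable[[List[int]], object],
    oracle: Callable[[List[int]], object],
    trials: int = 1000,
    max_len: int = 50,
    seed: int = 0,
) -> Optional[List[int]]:
    """Return the first random input on which candidate and oracle disagree, or None."""
    rng = random.Random(seed)  # fixed seed: any failure found can be replayed exactly
    for _ in range(trials):
        n = rng.randint(0, max_len)  # raise max_len to push toward large inputs
        data = [rng.randint(-100, 100) for _ in range(n)]
        # Each side gets its own copy so an in-place solution cannot affect the other.
        if candidate(list(data)) != oracle(list(data)):
            return data
    return None


if __name__ == "__main__":
    def dedup_sort(xs):
        return sorted(set(xs))  # buggy: silently drops duplicate values

    print(stress_test(dedup_sort, sorted))
```

The returned input can then be passed to the minimal-counter-example search to shrink it.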

Input Schema

{
  "type": "object",
  "properties": {
    "solutions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "code": { "type": "string" },
          "language": { "type": "string" }
        }
      }
    },
    "testCases": { "type": "array" },
    "mode": {
      "type": "string",
      "enum": ["correctness", "performance", "stress", "minimal"]
    },
    "oracleSolution": { "type": "string" },
    "timeout": { "type": "integer", "default": 5000 }
  },
  "required": ["solutions", "mode"]
}
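Below is a hypothetical request that follows the input schema, with a hand-rolled check of its required fields, mode enum and field types. This is not a full JSON Schema validator. The schema does not specify several details, so the sketch makes these assumptions: the shape of testCases items, whether oracleSolution holds source code or names a solution (assumed here to hold source code), and the unit of timeout (presumably milliseconds). check_request is an illustrative name.

```python
from typing import Any, Dict, List

VALID_MODES = {"correctness", "performance", "stress", "minimal"}


def check_request(request: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the request looks valid."""
    errors = []
    for field in ("solutions", "mode"):
        if field not in request:
            errors.append(f"missing required field: {field}")
    if "mode" in request and request["mode"] not in VALID_MODES:
        errors.append(f"invalid mode: {request['mode']!r}")
    solutions = request.get("solutions", [])
    if not isinstance(solutions, list):
        errors.append("solutions must be an array")
        solutions = []
    for i, sol in enumerate(solutions):
        if not isinstance(sol, dict):
            errors.append(f"solutions[{i}] must be an object")
            continue
        for key in ("name", "code", "language"):
            if key in sol and not isinstance(sol[key], str):
                errors.append(f"solutions[{i}].{key} must be a string")
    timeout = request.get("timeout", 5000)  # schema default
    if isinstance(timeout, bool) or not isinstance(timeout, int):
        errors.append("timeout must be an integer")
    return errors


# Hypothetical request: testCases item shape and oracleSolution contents are assumptions.
request = {
    "solutions": [
        {"name": "brute-force", "code": "<source code>", "language": "python"},
        {"name": "prefix-sums", "code": "<source code>", "language": "cpp"},
    ],
    "testCases": [{"input": "3\n1 2 3\n"}],
    "mode": "correctness",
    "oracleSolution": "<source code>",
    "timeout": 2000,
}

if __name__ == "__main__":
    print(check_request(request))  # []
```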

Output Schema

{
  "type": "object",
  "properties": {
    "success": { "type": "boolean" },
    "results": { "type": "array" },
    "discrepancies": { "type": "array" },
    "performance": { "type": "object" },
    "minimalFailingCase": { "type": "object" }
  },
  "required": ["success"]
}

Source

https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/algorithms-optimization/skills/solution-comparator/SKILL.md


Overview

Solution Comparator analyzes multiple algorithm implementations by running them on identical test cases, verifying correctness against a reference solution, and benchmarking performance. It reports output differences and memory usage and identifies minimal failing cases to speed up debugging.

How This Skill Works

It takes a set of solutions, each with a name, code, and language, runs them on the same testCases, and compares their outputs with the oracle's. It records timing and memory metrics and reports discrepancies together with mode-specific findings (correctness, performance, stress, or minimal counter-examples).
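Here is a minimal in-process Python sketch of that loop. It assumes the oracle and each solution are plain Python callables that take one test case. The skill itself accepts source code in arbitrary languages, so its real runner presumably executes programs separately; that part is not shown, and neither are exception handling or the schema's timeout. compare and the per-entry field names are hypothetical, and treating success as "no discrepancies" is also an assumption. Timing uses time.perf_counter and memory uses tracemalloc. Because tracemalloc adds overhead, these timings are only meaningful relative to each other.

```python
import copy
import time
import tracemalloc
from typing import Any, Callable, Dict, List

Solution = Callable[[Any], Any]


def compare(
    solutions: Dict[str, Solution],
    oracle: Solution,
    test_cases: List[Any],
) -> Dict[str, Any]:
    """Run every solution on every test case and check it against the oracle's output."""
    # Reference outputs come from the oracle, computed once per test case.
    expected = [oracle(copy.deepcopy(tc)) for tc in test_cases]
    results: List[Dict[str, Any]] = []
    discrepancies: List[Dict[str, Any]] = []
    performance: Dict[str, Any] = {}
    for name, solve in solutions.items():
        total_seconds, peak_bytes = 0.0, 0
        for i, tc in enumerate(test_cases):
            arg = copy.deepcopy(tc)  # identical, unmutated input for every solution
            tracemalloc.start()
            try:
                start = time.perf_counter()
                actual = solve(arg)
                total_seconds += time.perf_counter() - start
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
            peak_bytes = max(peak_bytes, peak)
            passed = actual == expected[i]
            results.append({"solution": name, "testCase": i, "passed": passed})
            if not passed:
                discrepancies.append(
                    {"solution": name, "testCase": i, "expected": expected[i], "actual": actual}
                )
        performance[name] = {"totalSeconds": total_seconds, "peakBytes": peak_bytes}
    return {
        "success": not discrepancies,
        "results": results,
        "discrepancies": discrepancies,
        "performance": performance,
    }


if __name__ == "__main__":
    def dedup_sort(xs):
        return sorted(set(xs))  # buggy: drops duplicates

    report = compare({"builtin": sorted, "dedup": dedup_sort}, sorted, [[3, 1, 2], [2, 2, 1]])
    print(report["success"], report["discrepancies"])
    # False [{'solution': 'dedup', 'testCase': 1, 'expected': [1, 2, 2], 'actual': [1, 2]}]
```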

When to Use It

You have multiple implementations of the same algorithm and want to verify correctness against a trusted oracle.
You want to benchmark and compare runtime across solutions on identical inputs.
You need stress testing with large random inputs to reveal discrepancies.
You want to locate the smallest failing input (minimal counter-example) via binary search.
You are validating memory usage and time complexity across variants during optimization.

Quick Start

Step 1: Gather solutions with fields name, code, and language.
Step 2: Provide testCases and an oracleSolution; choose a mode.
Step 3: Run the comparator and inspect the results and discrepancies.

Best Practices

Provide a single trusted oracle solution for reference outputs.
Use identical testCases and deterministic inputs.
Normalize I/O formatting to ensure accurate diffs.
Capture memory and CPU usage alongside wall-clock time.
Run each mode (correctness, performance, stress, minimal) as a separate, isolated pass, then iterate.
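The normalization advice can be sketched as two small helpers. normalize_output canonicalizes text before diffing. outputs_match compares whitespace-separated tokens and allows numeric tokens to differ by a tolerance. Both names are hypothetical. The tolerance value is also an assumption, since competitive-programming problems usually state their own accepted error. outputs_match ignores line structure, which suits most judges but not every problem.

```python
import math


def normalize_output(text: str) -> str:
    """Unify line endings, strip trailing spaces on each line and drop trailing blank lines."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    lines = [line.rstrip() for line in lines]
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines)


def outputs_match(expected: str, actual: str, tol: float = 1e-6) -> bool:
    """Token-wise comparison; numeric tokens may differ by `tol` (relative or absolute)."""
    exp_tokens = expected.split()
    act_tokens = actual.split()
    if len(exp_tokens) != len(act_tokens):
        return False
    for e, a in zip(exp_tokens, act_tokens):
        if e == a:
            continue
        try:
            if not math.isclose(float(e), float(a), rel_tol=tol, abs_tol=tol):
                return False
        except ValueError:  # non-numeric tokens must match exactly
            return False
    return True


if __name__ == "__main__":
    print(repr(normalize_output("1 2 \r\n3\n\n")))  # '1 2\n3'
    print(outputs_match("0.3333333\n", "0.33333331"))  # True
```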

Example Use Cases

Compare multiple sorting algorithms (quick sort, merge sort, heap sort) on the same arrays.
Validate different shortest-path implementations against a reference on graphs.
Benchmark different string-search algorithms on large texts.
Stress-test dynamic programming solutions with random inputs to reveal incorrect optimizations.
Check memory usage across recursion-heavy vs iterative variants.
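For the last use case, here is a sketch that measures peak Python-level allocations with the standard tracemalloc module. fib_memo, fib_iter and peak_bytes are illustrative names. tracemalloc only sees allocations made through Python's allocators, so the result is a relative comparison, not a measurement of total process memory.

```python
import tracemalloc
from typing import Any, Callable


def peak_bytes(fn: Callable[..., Any], *args: Any) -> int:
    """Peak traced Python allocation (in bytes) while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def fib_memo(n: int) -> int:
    """Recursion-heavy variant: memoized top-down recursion keeps n cached values alive."""
    memo = {0: 0, 1: 1}

    def go(k: int) -> int:
        if k not in memo:
            memo[k] = go(k - 1) + go(k - 2)
        return memo[k]

    return go(n)


def fib_iter(n: int) -> int:
    """Iterative variant: only two running values are alive at any time."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    n = 500  # kept well under Python's default recursion limit of 1000
    assert fib_memo(n) == fib_iter(n)  # same answers, different memory profiles
    print("memoized recursion:", peak_bytes(fib_memo, n), "bytes")
    print("iteration:", peak_bytes(fib_iter, n), "bytes")
```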
