bisect-aware-instrumentation
Bisect-Aware Instrumentation
Overview
Instrument code to support efficient git bisect operations by producing deterministic pass/fail signals and concise runtime summaries. This skill helps create robust test scripts that work reliably with git bisect run, handling edge cases like flaky tests, build failures, and non-deterministic behavior.
Core Workflow
1. Understand the Regression
Before instrumenting, clarify:
- What behavior changed? (bug introduced, performance regression, test failure)
- What is the "good" commit? (known working state)
- What is the "bad" commit? (known broken state)
- How to reproduce the issue? (test command, manual steps)
2. Create Bisect Test Script
Generate a test script that returns proper exit codes for git bisect:
Exit Code Convention:
- 0: Good commit (test passes)
- 1-124, 126-127: Bad commit (test fails)
- 125: Skip commit (cannot test - build failure, missing dependencies)
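These rules can be wrapped in a small helper that normalizes a raw exit status into the code `git bisect run` expects. This is a sketch — `classify_exit` is a made-up name — and it reports statuses of 128+ (processes killed by a signal) as bad, since `git bisect run` aborts outright on exit codes of 128 or higher:

```bash
# classify_exit STATUS: print the exit code to hand back to git bisect run
classify_exit() {
  s="$1"
  if [ "$s" -eq 0 ]; then
    echo 0      # test passed: good
  elif [ "$s" -eq 124 ]; then
    echo 125    # GNU timeout's code: untestable here, skip
  elif [ "$s" -eq 125 ]; then
    echo 125    # already the "skip" convention
  elif [ "$s" -ge 128 ]; then
    echo 1      # killed by signal: report bad (128+ would abort the bisect)
  else
    echo 1      # 1-124, 126-127: bad
  fi
}
```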
Template:
```bash
#!/bin/bash
# bisect_test.sh - Test script for git bisect run
set -e  # Exit on unexpected errors

# Build/setup phase
if ! make build 2>/dev/null; then
    echo "SKIP: Build failed"
    exit 125
fi

# Run test with timeout; initialize TEST_RESULT so a passing run
# (which leaves the || branch untaken) still reads as 0
TEST_RESULT=0
timeout 30s ./run_test || TEST_RESULT=$?

# Interpret results
if [ "$TEST_RESULT" -eq 0 ]; then
    echo "GOOD: Test passed"
    exit 0
elif [ "$TEST_RESULT" -eq 124 ]; then
    echo "SKIP: Test timeout"
    exit 125
else
    echo "BAD: Test failed with code $TEST_RESULT"
    exit 1
fi
```
3. Add Determinism Safeguards
Handle non-deterministic behavior:
Retry Logic for Flaky Tests:
```bash
# Run the test multiple times to confirm the verdict
PASS_COUNT=0
for i in {1..3}; do
    if ./run_test; then
        ((PASS_COUNT++))
    fi
done

if [ "$PASS_COUNT" -eq 3 ]; then
    echo "GOOD: All 3 runs passed"
    exit 0
elif [ "$PASS_COUNT" -eq 0 ]; then
    echo "BAD: All 3 runs failed"
    exit 1
else
    echo "SKIP: Flaky test ($PASS_COUNT/3 passed)"
    exit 125
fi
```
Environment Isolation:
```bash
# Clean state before each test
rm -rf /tmp/test_cache
export RANDOM_SEED=42
export TZ=UTC
```
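A slightly fuller isolation preamble pins the other usual sources of nondeterminism. `PYTHONHASHSEED` is honored by Python and `SOURCE_DATE_EPOCH` by reproducible-build-aware tools; whether your tests read them is project-specific:

```bash
# Pin everything the test might observe that varies between runs
export TZ=UTC                # fixed timezone
export LC_ALL=C              # fixed locale: stable sorting and number formats
export PYTHONHASHSEED=0      # stable hash ordering in Python tests
export SOURCE_DATE_EPOCH=0   # stable embedded timestamps in builds that honor it
rm -rf /tmp/test_cache       # drop caches left over from earlier bisect steps
```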
4. Add Logging and Summaries
Generate concise output for each commit:
```bash
#!/bin/bash
COMMIT=$(git rev-parse --short HEAD)
LOG_FILE="bisect_log_${COMMIT}.txt"

echo "Testing commit: $COMMIT" | tee "$LOG_FILE"
echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" | tee -a "$LOG_FILE"

# Run test and capture output
if ./run_test > test_output.txt 2>&1; then
    echo "RESULT: GOOD" | tee -a "$LOG_FILE"
    exit 0
else
    echo "RESULT: BAD" | tee -a "$LOG_FILE"
    echo "Error output:" | tee -a "$LOG_FILE"
    tail -20 test_output.txt | tee -a "$LOG_FILE"
    exit 1
fi
```
5. Run Git Bisect
Execute the bisect workflow:
```bash
# Start bisect
git bisect start

# Mark known good and bad commits
git bisect bad HEAD
git bisect good v1.2.0

# Run automated bisect
chmod +x bisect_test.sh
git bisect run ./bisect_test.sh

# Review results
git bisect log
```
Instrumentation Patterns
Pattern 1: Performance Regression Detection
```bash
#!/bin/bash
# Detect when performance drops below threshold
THRESHOLD=1000  # milliseconds

# Run benchmark
DURATION=$(./benchmark | grep "Duration:" | awk '{print $2}')

if [ -z "$DURATION" ]; then
    echo "SKIP: Benchmark failed to run"
    exit 125
fi

if [ "$DURATION" -lt "$THRESHOLD" ]; then
    echo "GOOD: Performance $DURATION ms (< $THRESHOLD ms)"
    exit 0
else
    echo "BAD: Performance $DURATION ms (>= $THRESHOLD ms)"
    exit 1
fi
```
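One caveat: the integer test `[ -lt ]` breaks if the benchmark prints fractional milliseconds (for example 987.6). awk can do the comparison in floating point instead — a sketch, with a hard-coded DURATION standing in for the value scraped from the benchmark above:

```bash
THRESHOLD=1000
DURATION="987.6"   # stand-in for: $(./benchmark | grep "Duration:" | awk '{print $2}')

# awk compares as floating point; the BEGIN block exits 0 when d < t
if awk -v d="$DURATION" -v t="$THRESHOLD" 'BEGIN { exit !(d < t) }'; then
  echo "GOOD: Performance $DURATION ms (< $THRESHOLD ms)"
else
  echo "BAD: Performance $DURATION ms (>= $THRESHOLD ms)"
fi
```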
Pattern 2: Test Suite Bisection
```bash
#!/bin/bash
# Find commit that broke specific test
TEST_NAME="test_user_authentication"

# Run specific test
if pytest tests/${TEST_NAME}.py -v; then
    echo "GOOD: $TEST_NAME passed"
    exit 0
else
    echo "BAD: $TEST_NAME failed"
    exit 1
fi
```
Pattern 3: Build Failure Detection
```bash
#!/bin/bash
# Find commit that broke the build
if make clean && make all; then
    echo "GOOD: Build succeeded"
    exit 0
else
    echo "BAD: Build failed"
    exit 1
fi
```
Pattern 4: Output Validation
```bash
#!/bin/bash
# Find commit that changed program output
EXPECTED_OUTPUT="Success: 42"
ACTUAL_OUTPUT=$(./program 2>&1)

if [ "$ACTUAL_OUTPUT" = "$EXPECTED_OUTPUT" ]; then
    echo "GOOD: Output matches expected"
    exit 0
else
    echo "BAD: Output mismatch"
    echo "  Expected: $EXPECTED_OUTPUT"
    echo "  Actual:   $ACTUAL_OUTPUT"
    exit 1
fi
```
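When the output contains volatile fields — timestamps, pids, temp paths — normalize them before comparing, so only meaningful changes flip the verdict. The sed patterns below are illustrative and would need adapting to the real output format:

```bash
# Replace volatile fields with stable placeholders before comparison
normalize() {
  sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+Z?/<TIME>/g; s/pid [0-9]+/pid <PID>/g'
}

echo "pid 4242 started at 2024-05-01T12:00:00Z" | normalize
# prints: pid <PID> started at <TIME>
```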
Advanced Techniques
Handling Complex Build Systems
```bash
#!/bin/bash
# Handle projects with complex dependencies

# Check if dependencies are available
if ! command -v node &> /dev/null; then
    echo "SKIP: Node.js not available in this commit"
    exit 125
fi

# Install dependencies (with caching)
if [ -f package.json ]; then
    npm ci --silent || {
        echo "SKIP: Dependency installation failed"
        exit 125
    }
fi

# Run test
npm test
```
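Dependency installation usually dominates each bisect step, so a cache keyed on the lockfile hash avoids repeating identical installs across commits. This is a sketch — the cache path and the `restore_or_install` name are made up:

```bash
# Restore node_modules from a cache keyed by the lockfile hash,
# or install once and seed the cache for later bisect steps
restore_or_install() {
  lock_hash=$(sha256sum package-lock.json | cut -c1-16)
  cache="/tmp/bisect_npm_cache/$lock_hash"
  if [ -d "$cache" ]; then
    rm -rf node_modules
    cp -r "$cache" node_modules
    echo "cache hit"
  else
    npm ci --silent || return 125   # caller translates 125 into a bisect skip
    mkdir -p "$(dirname "$cache")"
    cp -r node_modules "$cache"
    echo "cache miss"
  fi
}
```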
Parallel Test Execution
```bash
#!/bin/bash
# Run multiple independent checks in parallel for faster bisection
parallel --halt soon,fail=1 ::: \
    "pytest tests/unit/" \
    "pytest tests/integration/" \
    "npm run lint"

if [ $? -eq 0 ]; then
    echo "GOOD: All tests passed"
    exit 0
else
    echo "BAD: At least one test failed"
    exit 1
fi
```
State Preservation
```bash
#!/bin/bash
# Preserve state between bisect steps
STATE_DIR=".bisect_state"
mkdir -p "$STATE_DIR"

# Save current commit info
git rev-parse HEAD > "$STATE_DIR/current_commit"

# Run test
./run_test
RESULT=$?

# Log result
echo "$(git rev-parse --short HEAD): $RESULT" >> "$STATE_DIR/results.log"
exit $RESULT
```
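Once the bisect finishes, the accumulated results.log can be tallied in one pass. This sketch assumes the "shorthash: exitcode" line format written above; `summarize_results` is a hypothetical name:

```bash
# Tally good/bad/skip verdicts from a per-commit results log
summarize_results() {
  sort -u "$1" | awk -F': ' '
    $2 == 0              { good++ }
    $2 == 125            { skip++ }
    $2 != 0 && $2 != 125 { bad++ }
    END { printf "good=%d bad=%d skip=%d\n", good, bad, skip }'
}

# e.g.: summarize_results .bisect_state/results.log
```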
Troubleshooting
Issue: Bisect Marks Wrong Commit
Cause: Test script has incorrect exit codes or flaky behavior
Solution: Add verbose logging and retry logic
```bash
set -x  # Enable debug output
# Add retry logic as shown in section 3
```
Issue: Too Many Commits Skipped
Cause: Build failures or missing dependencies across history
Solution: Use broader skip conditions
```bash
# Skip commits with known issues
if git log -1 --format=%s | grep -q "WIP\|broken"; then
    echo "SKIP: Known broken commit"
    exit 125
fi
```
Issue: Bisect Takes Too Long
Cause: Slow test execution
Solution: Optimize test or use binary search hints
```bash
# Fail fast with a timeout; map only the timeout (exit 124) to skip,
# so genuine failures still register as BAD
timeout 10s ./run_test
RC=$?
[ "$RC" -eq 124 ] && exit 125
exit "$RC"

# Or tell bisect to skip commits you know are irrelevant
git bisect skip $(git rev-list --grep="refactor" HEAD~50..HEAD)
```
Best Practices
- Make tests deterministic: Fix random seeds, timestamps, and external dependencies
- Use timeouts: Prevent hanging tests from blocking bisect
- Log everything: Save detailed logs for each tested commit
- Handle build failures gracefully: Use exit code 125 to skip untestable commits
- Test the test script: Verify it works on known good and bad commits before bisecting
- Keep it fast: Optimize tests to run quickly (bisect tests O(log n) commits)
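Because bisect tests O(log n) commits, the expected number of steps is easy to estimate up front. A sketch — the hard-coded count stands in for `git rev-list --count <good>..<bad>`:

```bash
# Estimate bisect steps: roughly log2 of the commits in the range
count=100   # stand-in for: $(git rev-list --count v1.2.0..HEAD)

steps=0
n=$count
while [ "$n" -gt 1 ]; do
  n=$((n / 2))
  steps=$((steps + 1))
done
echo "~$steps bisect steps for $count commits"
```

For 100 commits this reports about 6 steps, which is why even a slow test script is usually tolerable — but a fast one still pays off 6 times over.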
Quick Reference
Start bisect:
```bash
git bisect start
git bisect bad <bad-commit>
git bisect good <good-commit>
```
Run automated bisect:
```bash
git bisect run ./bisect_test.sh
```
Manual bisect:
```bash
git bisect good  # Current commit is good
git bisect bad   # Current commit is bad
git bisect skip  # Cannot test current commit
```
End bisect:
```bash
git bisect reset
```
Resources
- references/git_bisect_guide.md: Comprehensive git bisect documentation
- references/exit_codes.md: Exit code conventions and best practices
- scripts/bisect_template.sh: Template bisect test script
- scripts/bisect_wrapper.py: Python wrapper for complex bisect logic
Source
https://github.com/ArabelaTso/Skills-4-SE/blob/main/skills/bisect-aware-instrumentation/SKILL.md