Behavior Preservation Checker
Overview
Validate that a migrated or refactored codebase preserves the original behavior by automatically comparing runtime behavior, test results, execution traces, and observable outputs between two repository versions.
Core Workflow
1. Setup Repositories
Prepare both repositories for comparison:
# Clone or locate repositories
ORIGINAL_REPO=/path/to/original
MIGRATED_REPO=/path/to/migrated
# Ensure both are at comparable states
cd $ORIGINAL_REPO && git checkout main
cd $MIGRATED_REPO && git checkout main
2. Run Behavior Comparison
Use the comparison script to analyze behavioral differences:
python scripts/behavior_checker.py \
--original $ORIGINAL_REPO \
--migrated $MIGRATED_REPO \
--output behavior_report.json
3. Review Results
Examine the generated report for:
- Test result differences
- Execution trace divergences
- Output mismatches
- Performance regressions
- API contract violations
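Triaging the report can be sketched as follows (a minimal sketch assuming the JSON structure shown in the Report Format section; the inline report dict is illustrative — in practice load behavior_report.json):

```python
def triage_report(report):
    """Group reported differences by severity so critical issues surface first."""
    by_severity = {}
    for diff in report.get("differences", []):
        by_severity.setdefault(diff.get("severity", "unknown"), []).append(diff)
    return by_severity

# Illustrative inline report; in practice: report = json.load(open("behavior_report.json"))
report = {
    "differences": [
        {"test_name": "test_user_authentication", "severity": "critical",
         "guidance": "Check authentication logic in migrated version"},
        {"test_name": "test_pagination", "severity": "medium",
         "guidance": "Compare page-size defaults"},
    ]
}
for severity in ("critical", "high", "medium", "low"):
    for diff in triage_report(report).get(severity, []):
        print(f"[{severity}] {diff['test_name']}: {diff['guidance']}")
```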
4. Fix Deviations
Follow actionable guidance to resolve behavioral differences.
Comparison Methods
Method 1: Test-Based Comparison
Run the same test suite on both repositories and compare results:
Workflow:
- Identify common test suite (or create equivalent tests)
- Run tests on original repository
- Run tests on migrated repository
- Compare pass/fail status, assertions, and outputs
Example:
# Run on original
cd $ORIGINAL_REPO
pytest tests/ --json-report --json-report-file=original_results.json
# Run on migrated
cd $MIGRATED_REPO
pytest tests/ --json-report --json-report-file=migrated_results.json
# Compare
python scripts/compare_test_results.py \
original_results.json \
migrated_results.json
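The core of the comparison step can be sketched like this (assuming pytest-json-report's payload, where each entry in tests carries a nodeid and an outcome; the inline dicts stand in for the two JSON files):

```python
def outcomes(report):
    """Map test nodeid -> outcome from a pytest-json-report payload."""
    return {t["nodeid"]: t["outcome"] for t in report.get("tests", [])}

def diff_results(original, migrated):
    """Return tests whose outcome changed between the two runs."""
    orig, migr = outcomes(original), outcomes(migrated)
    changed = {}
    for nodeid in orig.keys() & migr.keys():
        if orig[nodeid] != migr[nodeid]:
            changed[nodeid] = (orig[nodeid], migr[nodeid])
    return changed

# Illustrative payloads; in practice these are loaded from the two JSON files.
original = {"tests": [{"nodeid": "tests/test_auth.py::test_login", "outcome": "passed"},
                      {"nodeid": "tests/test_calc.py::test_sum", "outcome": "passed"}]}
migrated = {"tests": [{"nodeid": "tests/test_auth.py::test_login", "outcome": "failed"},
                      {"nodeid": "tests/test_calc.py::test_sum", "outcome": "passed"}]}
print(diff_results(original, migrated))
# {'tests/test_auth.py::test_login': ('passed', 'failed')}
```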
Method 2: Execution Trace Comparison
Capture and compare execution traces:
Workflow:
- Instrument code to capture function calls, arguments, and return values
- Run identical inputs through both versions
- Compare execution traces for divergences
Example:
# Trace original
python scripts/trace_execution.py \
--repo $ORIGINAL_REPO \
--input test_inputs.json \
--output original_trace.json
# Trace migrated
python scripts/trace_execution.py \
--repo $MIGRATED_REPO \
--input test_inputs.json \
--output migrated_trace.json
# Compare traces
python scripts/compare_traces.py \
original_trace.json \
migrated_trace.json
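One lightweight way to capture and compare such traces is a decorator-based sketch like the one below (the bundled scripts/trace_execution.py may work differently; the calculate functions and the off-by-one are hypothetical):

```python
import functools

def tracing(trace, name):
    """Decorator factory: append (name, args, result) to `trace` on each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            trace.append((name, args, result))
            return result
        return wrapper
    return decorator

def first_divergence(original_trace, migrated_trace):
    """Index and entries of the first point where two traces disagree, or None."""
    for i, (a, b) in enumerate(zip(original_trace, migrated_trace)):
        if a != b:
            return i, a, b
    return None

orig_trace, migr_trace = [], []

@tracing(orig_trace, "calculate")
def calculate_v1(x):
    return x * 2

@tracing(migr_trace, "calculate")
def calculate_v2(x):
    return x * 2 + 1  # hypothetical off-by-one introduced during migration

calculate_v1(10)
calculate_v2(10)
print(first_divergence(orig_trace, migr_trace))
# (0, ('calculate', (10,), 20), ('calculate', (10,), 21))
```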
Method 3: Observable Output Comparison
Compare program outputs for identical inputs:
Workflow:
- Define test inputs (API requests, CLI commands, function calls)
- Capture outputs from both versions (stdout, files, API responses)
- Compare outputs for differences
Example:
# Test API endpoints
python scripts/compare_api_outputs.py \
--original-url http://localhost:8000 \
--migrated-url http://localhost:8001 \
--test-cases api_test_cases.json
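The underlying comparison can be sketched as a field-by-field diff of two response dicts (a sketch; here the two services are stubbed as literal dicts rather than live HTTP responses):

```python
def compare_responses(original, migrated):
    """Diff two response dicts field by field; return a list of mismatches."""
    mismatches = []
    for field in sorted(set(original) | set(migrated)):
        if original.get(field) != migrated.get(field):
            mismatches.append((field, original.get(field), migrated.get(field)))
    return mismatches

# Stubbed responses; in practice these come from HTTP calls to the two services.
original_resp = {"status": 200, "body": {"user": "alice", "role": "admin"}}
migrated_resp = {"status": 401, "body": {"error": "unauthorized"}}
print(compare_responses(original_resp, migrated_resp))
```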
Method 4: Property-Based Testing
Use property-based testing to find behavioral differences:
Workflow:
- Define behavioral properties (invariants, contracts)
- Generate random inputs
- Verify properties hold for both versions
- Report any property violations
Example:
# Property: sorting should produce the same result
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_equivalence(input_list):
    original_result = original_sort(input_list)
    migrated_result = migrated_sort(input_list)
    assert original_result == migrated_result
Difference Detection
Test Result Differences
What to check:
- Tests that pass in original but fail in migrated
- Tests that fail in original but pass in migrated
- New test failures
- Changed assertion messages
Severity levels:
- Critical: Core functionality tests fail
- High: Integration tests fail
- Medium: Edge case tests fail
- Low: Flaky tests or timing-dependent failures
Execution Trace Differences
What to check:
- Different function call sequences
- Different argument values
- Different return values
- Missing or extra function calls
Example divergence:
Original trace:
calculate(x=10) -> 20
validate(20) -> True
save(20) -> Success
Migrated trace:
calculate(x=10) -> 21  # <- Difference!
validate(21) -> True
save(21) -> Success
Output Differences
What to check:
- Different stdout/stderr
- Different file contents
- Different API response bodies
- Different status codes
- Different error messages
Tolerance levels:
# Exact match required
assert original_output == migrated_output
# Numerical tolerance
assert abs(original_value - migrated_value) < 0.001
# Structural equivalence (ignore formatting)
assert json.loads(original) == json.loads(migrated)
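These three tolerance levels can be combined into a single helper (a sketch, not part of the bundled scripts; the tolerance defaults are illustrative):

```python
import json
import math

def outputs_match(original, migrated, *, rel_tol=0.0, abs_tol=1e-3, structural=False):
    """Compare two outputs under a chosen tolerance policy."""
    if structural:  # ignore formatting: compare parsed JSON structures
        return json.loads(original) == json.loads(migrated)
    if isinstance(original, (int, float)) and isinstance(migrated, (int, float)):
        # numerical tolerance
        return math.isclose(original, migrated, rel_tol=rel_tol, abs_tol=abs_tol)
    return original == migrated  # exact match for everything else

print(outputs_match(1.00000001, 1.0))                         # True: within tolerance
print(outputs_match('{"a": 1}', '{"a":1}', structural=True))  # True: same structure
```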
Actionable Guidance
Pattern 1: Logic Error
Symptom: Different outputs for same inputs
Diagnosis:
python scripts/isolate_difference.py \
--original $ORIGINAL_REPO \
--migrated $MIGRATED_REPO \
--failing-test test_calculation
Guidance:
- Identify the diverging function
- Compare implementations side-by-side
- Check for off-by-one errors, operator changes, or logic inversions
- Add unit test for the specific case
Pattern 2: Missing Functionality
Symptom: Tests pass in the original but fail in the migrated version with NotImplementedError or AttributeError
Diagnosis:
python scripts/find_missing_functions.py \
--original $ORIGINAL_REPO \
--migrated $MIGRATED_REPO
Guidance:
- List all missing functions/methods
- Implement missing functionality
- Verify with targeted tests
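Missing-function detection can be sketched with the standard ast module (scripts/find_missing_functions.py may use a different approach; the source snippets are illustrative):

```python
import ast

def defined_functions(source):
    """Collect names of all functions and methods defined in a source string."""
    return {node.name for node in ast.walk(ast.parse(source))
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}

original_src = """
def login(user): ...
def logout(user): ...
def reset_password(user): ...
"""
migrated_src = """
def login(user): ...
def logout(user): ...
"""
missing = defined_functions(original_src) - defined_functions(migrated_src)
print(sorted(missing))  # ['reset_password']
```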
Pattern 3: API Contract Violation
Symptom: Different response structure or status codes
Diagnosis:
python scripts/compare_api_contracts.py \
--original-spec openapi_original.yaml \
--migrated-spec openapi_migrated.yaml
Guidance:
- Document API contract differences
- Update migrated API to match original contract
- Add contract tests to prevent future violations
Pattern 4: Performance Regression
Symptom: Migrated version is significantly slower
Diagnosis:
python scripts/benchmark_comparison.py \
--original $ORIGINAL_REPO \
--migrated $MIGRATED_REPO \
--iterations 100
Guidance:
- Profile both versions to identify bottlenecks
- Check for algorithmic changes (O(n) -> O(n²))
- Look for missing optimizations or caching
- Verify database query efficiency
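A minimal benchmark sketch with timeit illustrates the kind of algorithmic regression to look for (the lookup functions, sizes, and iteration counts are illustrative; here a set-based membership check was migrated to a per-target list scan):

```python
import timeit

def original_lookup(items, targets):
    items_set = set(items)              # O(n) preprocessing, O(1) membership
    return [t in items_set for t in targets]

def migrated_lookup(items, targets):
    return [t in items for t in targets]  # O(n) scan per target: O(n*m) overall

items = list(range(5000))
targets = list(range(0, 5000, 7))

t_orig = timeit.timeit(lambda: original_lookup(items, targets), number=50)
t_migr = timeit.timeit(lambda: migrated_lookup(items, targets), number=50)
print(f"original: {t_orig:.3f}s  migrated: {t_migr:.3f}s  ratio: {t_migr / t_orig:.1f}x")
```

Both versions return the same results, so test-based comparison alone would miss this; only the benchmark exposes the regression.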
Pattern 5: State Management Issues
Symptom: Tests fail intermittently or depend on execution order
Diagnosis:
python scripts/detect_state_issues.py \
--repo $MIGRATED_REPO \
--test-suite tests/
Guidance:
- Identify shared state between tests
- Add proper setup/teardown
- Ensure test isolation
- Check for global variable usage
Report Format
The behavior checker generates a comprehensive JSON report:
{
  "summary": {
    "total_tests": 150,
    "passed_both": 140,
    "failed_both": 2,
    "passed_original_failed_migrated": 5,
    "failed_original_passed_migrated": 3,
    "behavioral_equivalence": "94.7%"
  },
  "differences": [
    {
      "type": "test_failure",
      "test_name": "test_user_authentication",
      "severity": "critical",
      "original_result": "passed",
      "migrated_result": "failed",
      "error_message": "AssertionError: Expected 200, got 401",
      "guidance": "Check authentication logic in migrated version",
      "affected_files": ["auth/login.py"]
    }
  ],
  "recommendations": [
    "Fix 5 critical test failures before deployment",
    "Review 3 output differences for correctness"
  ]
}
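One way to derive an equivalence figure from the summary counts (a sketch; here equivalence is defined as the share of tests whose pass/fail status agrees across the two versions):

```python
def behavioral_equivalence(summary):
    """Percentage of tests with the same pass/fail status in both versions."""
    agree = summary["passed_both"] + summary["failed_both"]
    return 100.0 * agree / summary["total_tests"]

summary = {"total_tests": 150, "passed_both": 140, "failed_both": 2,
           "passed_original_failed_migrated": 5, "failed_original_passed_migrated": 3}
print(f"{behavioral_equivalence(summary):.1f}%")  # 94.7%
```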
Best Practices
- Start with tests: Ensure comprehensive test coverage before migration
- Incremental validation: Check behavior after each migration step
- Document intentional changes: Mark expected behavioral differences
- Use multiple comparison methods: Combine tests, traces, and outputs
- Automate the process: Integrate into CI/CD pipeline
- Set tolerance thresholds: Define acceptable differences (e.g., timing, formatting)
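For CI integration, a simple gate script can fail the build when the report exceeds deviation thresholds (a sketch assuming the report structure from the Report Format section; the threshold values are illustrative):

```python
def gate(report, max_critical=0, max_high=2):
    """Return a nonzero exit code if the report exceeds deviation thresholds."""
    counts = {}
    for diff in report.get("differences", []):
        sev = diff.get("severity", "unknown")
        counts[sev] = counts.get(sev, 0) + 1
    if counts.get("critical", 0) > max_critical or counts.get("high", 0) > max_high:
        return 1
    return 0

report = {"differences": [{"severity": "critical", "test_name": "test_user_authentication"}]}
print(gate(report))  # 1: one critical difference exceeds the threshold of 0
# In CI: sys.exit(gate(json.load(open("behavior_report.json"))))
```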
Resources
- references/comparison_techniques.md: Detailed comparison methodologies
- references/difference_patterns.md: Common behavioral difference patterns
- scripts/behavior_checker.py: Main comparison orchestrator
- scripts/compare_test_results.py: Test result comparison
- scripts/trace_execution.py: Execution trace capture
- scripts/compare_traces.py: Trace comparison and analysis
Source
https://github.com/ArabelaTso/Skills-4-SE/blob/main/skills/behavior-preservation-checker/SKILL.md
Overview
Validate that a migrated or refactored codebase preserves original behavior by automatically comparing runtime results between two repository versions. It looks at tests, traces, and observable outputs to surface regressions and semantic changes, providing actionable guidance to restore equivalence.
How This Skill Works
Set up both repositories (original and migrated), run the analysis script to generate a behavior_report.json, and the tool will compare test results, execution traces, and observable outputs. It highlights differences and surfaces concrete guidance to fix deviations and restore behavioral equivalence.
When to Use It
- Validating code migrations to ensure behavior is preserved.
- Assessing refactorings that should be behaviorally equivalent.
- Porting code to a new language version or runtime.
- Upgrading frameworks or dependencies with backward-compatibility changes.
- Transformations where observable behavior must remain the same.
Quick Start
- Step 1: Define ORIGINAL_REPO and MIGRATED_REPO paths and ensure both are at comparable states.
- Step 2: Run the checker: python scripts/behavior_checker.py --original $ORIGINAL_REPO --migrated $MIGRATED_REPO --output behavior_report.json
- Step 3: Open behavior_report.json, review the differences, and follow the guidance to fix deviations.
Best Practices
- Define deterministic inputs and fixtures to enable reproducible comparisons.
- Run comparisons in isolated environments with locked dependencies.
- Ensure test suites in both versions cover equivalent functionality.
- Compare not only test results but also execution traces and API responses.
- Automate reporting and integrate the checker into CI with deviation thresholds.
Example Use Cases
- Porting Python 2 code to Python 3 while preserving outputs.
- Migrating a microservice from Flask to FastAPI without changing behavior.
- Upgrading a data processing pipeline with API changes but identical results.
- Refactoring a core module to improve structure while keeping semantics.
- Replacing a REST API client with a vendored version without breaking contracts.