data-migration-validator
npx machina-cli add skill a5c-ai/babysitter/data-migration-validator --openclaw
Data Migration Validator Skill
Validates data integrity throughout the migration process with comprehensive verification checks and reconciliation reporting.
Purpose
Enable data validation for:
- Row count validation
- Checksum verification
- Sample data comparison
- Referential integrity checking
- Business rule validation
Capabilities
1. Row Count Validation
- Compare source/target counts
- Track by table/partition
- Identify discrepancies
- Generate count reports
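A minimal sketch of the row-count check, assuming SQLite connections for illustration; the `compare_row_counts` helper and table names are hypothetical, not part of the skill's API:

```python
import sqlite3

# Hypothetical sketch: compare per-table row counts between a source and a
# target connection, producing entries shaped like the rowCounts section of
# the output schema below.
def compare_row_counts(source_conn, target_conn, tables):
    report = []
    for table in tables:
        src = source_conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        tgt = target_conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        report.append({"name": table, "source": src, "target": tgt,
                       "match": src == tgt})
    return report
```

For partitioned tables, the same loop can be run per partition by adding a WHERE clause on the partition key.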
2. Checksum Verification
- Calculate table checksums
- Compare hash values
- Identify data drift
- Verify data consistency
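One possible checksum scheme, sketched under the assumption that both sides can stream rows with the same column order: hash each row and XOR-combine the digests, so the result is independent of physical row order. The `table_checksum` helper is illustrative, not the skill's actual implementation:

```python
import hashlib
import sqlite3

# Hypothetical sketch: order-independent table checksum. Each row is hashed
# with SHA-256 and the digests are XOR-combined, so two tables with the same
# rows in different physical order produce the same checksum.
def table_checksum(conn, table, columns):
    acc = 0
    cols = ", ".join(columns)
    for row in conn.execute(f"SELECT {cols} FROM {table}"):
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        acc ^= int.from_bytes(digest, "big")
    return f"{acc:064x}"  # 256-bit checksum as 64 hex digits
```

Because XOR is commutative, this works even when source and target return rows in different orders; the column list must still match exactly on both sides.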
3. Sample Data Comparison
- Random sample selection
- Field-by-field comparison
- Statistical sampling
- Confidence scoring
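A sketch of seeded random sampling with field-by-field comparison, assuming rows are available as dictionaries keyed by a primary key; the `sample_compare` function and its output shape are illustrative:

```python
import random

# Hypothetical sketch: draw a reproducible random sample from the source,
# look up each sampled row in the target by primary key, and compare the
# listed fields one by one.
def sample_compare(source_rows, target_rows, key, fields, sample_size, seed=42):
    rng = random.Random(seed)  # fixed seed so audits can reproduce the sample
    target_index = {row[key]: row for row in target_rows}
    sample = rng.sample(source_rows, min(sample_size, len(source_rows)))
    discrepancies = []
    for src in sample:
        tgt = target_index.get(src[key])
        for f in fields:
            if tgt is None or src[f] != tgt[f]:
                discrepancies.append({"key": src[key], "field": f})
    return {"checked": len(sample),
            "matched": len(sample) - len({d["key"] for d in discrepancies}),
            "discrepancies": discrepancies}
```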
4. Referential Integrity Checking
- Verify foreign keys
- Check orphaned records
- Validate relationships
- Report violations
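The orphan check can be expressed as a LEFT JOIN anti-join, sketched here with hypothetical `orders`/`customers` tables on SQLite; substitute the real parent/child tables and key columns:

```python
import sqlite3

# Hypothetical sketch: find child rows whose foreign key has no matching
# parent. The LEFT JOIN keeps every order; WHERE c.id IS NULL keeps only
# those with no customer, i.e. the orphans.
ORPHAN_CHECK = """
SELECT o.id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE c.id IS NULL
"""

def find_orphans(conn):
    return conn.execute(ORPHAN_CHECK).fetchall()
```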
5. Business Rule Validation
- Apply custom rules
- Check data constraints
- Verify transformations
- Validate calculations
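Custom rules can be modeled as named predicates applied per row; this sketch counts passing and failing rows (one interpretation of the schema's passed/failed fields) and records each individual rule violation:

```python
# Hypothetical sketch: apply named rule predicates to every row and tally
# results in the shape of the businessRules section of the output schema.
def validate_rules(rows, rules):
    passed, failures = 0, []
    for i, row in enumerate(rows):
        ok = True
        for name, predicate in rules.items():
            if not predicate(row):
                ok = False
                failures.append({"row": i, "rule": name})
        if ok:
            passed += 1
    return {"passed": passed, "failed": len(rows) - passed, "failures": failures}
```

Rules stay declarative: `{"price_positive": lambda r: r["price"] > 0}` checks a constraint, and a transformation rule can recompute a derived field and compare it to the migrated value.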
6. Reconciliation Reporting
- Generate audit reports
- Track discrepancies
- Document exceptions
- Provide sign-off reports
Tool Integrations
| Tool | Purpose | Integration Method |
|---|---|---|
| Great Expectations | Data validation | Library |
| dbt tests | Transform validation | CLI |
| Custom SQL | Database checks | CLI |
| DataGrip | Manual verification | GUI |
| Apache Griffin | Data quality | API |
Output Schema
{
"validationId": "string",
"timestamp": "ISO8601",
"results": {
"rowCounts": {
"tables": [
{
"name": "string",
"source": "number",
"target": "number",
"match": "boolean"
}
]
},
"checksums": {
"tables": [],
"overall": "string"
},
"samples": {
"checked": "number",
"matched": "number",
"discrepancies": []
},
"referentialIntegrity": {
"valid": "boolean",
"violations": []
},
"businessRules": {
"passed": "number",
"failed": "number",
"failures": []
}
},
"summary": {
"status": "passed|failed|warning",
"score": "number"
}
}
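A sketch of assembling this output from individual check results; the status rules (any count mismatch, integrity violation, or rule failure fails the run, sample discrepancies alone downgrade to a warning) and the sample-based score are assumptions for illustration, not the skill's defined scoring:

```python
from datetime import datetime, timezone

# Hypothetical sketch: build the report in the shape of the output schema
# and derive summary.status and summary.score from the check results.
def build_report(validation_id, results):
    mismatches = sum(1 for t in results["rowCounts"]["tables"] if not t["match"])
    if mismatches or not results["referentialIntegrity"]["valid"] \
            or results["businessRules"]["failed"]:
        status = "failed"
    elif results["samples"]["discrepancies"]:
        status = "warning"
    else:
        status = "passed"
    checked = results["samples"]["checked"] or 1  # avoid division by zero
    score = round(100 * results["samples"]["matched"] / checked, 1)
    return {
        "validationId": validation_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "results": results,
        "summary": {"status": status, "score": score},
    }
```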
Integration with Migration Processes
- database-schema-migration: Post-migration validation
- cloud-migration: Data validation
Related Skills
schema-comparator: Pre-migration comparison
etl-pipeline-builder: Migration execution
Related Agents
data-integrity-validator: Orchestrates validation
database-migration-orchestrator: Uses for verification
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/code-migration-modernization/skills/data-migration-validator/SKILL.md
Overview
Data Migration Validator ensures data integrity throughout the migration by performing row-count validation, checksum verification, sample data comparisons, referential integrity checks, and business rule validation. It produces reconciliation reports and an auditable output schema to document discrepancies and sign-off readiness.
How This Skill Works
The tool compares source and target data per table, calculates row counts and table checksums, performs random or statistical sampling for field-by-field comparisons, validates foreign keys and relationships, and applies custom business rules. Results are compiled into a standardized reconciliation output schema for audit and sign-off.
When to Use It
- Before migration to establish baseline row counts, checksums, and validation rules per table.
- During migration to detect drift and discrepancies in near real-time.
- After migration to certify parity between source and target data across tables and partitions.
- When validating complex transformations and business rules to ensure calculated fields and constraints are correct.
- For regulatory audits and sign-off requiring a comprehensive reconciliation report.
Quick Start
- Step 1: Enumerate target tables and establish baseline row counts, checksums, and business rules.
- Step 2: Run the data-migration-validator workflow to perform row-count, checksum, sample, referential-integrity, and business-rule checks, and to generate the output schema.
- Step 3: Review the reconciliation report, investigate any discrepancies, and obtain sign-off.
Best Practices
- Define per-table baselines (row counts and checksums) before starting migration.
- Use random or statistically significant sampling for data comparisons and track sample size.
- Run checksum verification with consistent hashing and seeding to detect drift reliably.
- Validate referential integrity (foreign keys and orphan checks) during the migration window.
- Automate reconciliation reporting and establish a formal sign-off workflow.
Example Use Cases
- E-commerce order data migration: compare per-table row counts and run checksum verification; sample orders to verify field values and transformations.
- Product catalog migration: validate referential integrity between products and categories; ensure no orphaned reference IDs.
- Customer data transformation: apply business rules on derived fields (e.g., discount eligibility) and verify results match expectations.
- Cloud-based data warehouse migration: leverage Great Expectations and dbt tests for end-to-end validation with automated reports.
- ETL pipeline migration: perform regression checks with historical checksums to detect data drift after pipeline changes.
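The last use case, regression checking against historical checksums, reduces to comparing the current per-table checksums with a stored baseline; this helper and its inputs are a hypothetical sketch:

```python
# Hypothetical sketch: given current per-table checksums and a baseline
# saved from a previous run, report the tables whose checksum changed or
# that have no baseline entry, i.e. candidates for data drift.
def detect_drift(current, baseline):
    return sorted(t for t, h in current.items() if baseline.get(t) != h)
```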