data-quality-frameworks

npx machina-cli add skill karim-bhalwani/agent-skills-collection/data-quality-frameworks --openclaw

Data Quality Frameworks

Expert in embedding quality validation into data pipelines as a dependency, not an afterthought.

When to Use This Skill

Use when:

  • Building comprehensive data quality validation into pipelines
  • Setting up Great Expectations suites and automated checkpoints
  • Creating dbt test suites (schema tests, relationship tests, custom tests)
  • Establishing data contracts between producer and consumer teams
  • Monitoring data quality metrics, SLAs, and anomalies
  • Debugging data quality failures or regressions
  • Implementing layer-based validation (Bronze schema, Silver business rules, Gold aggregations)
  • Blocking bad data from proceeding downstream

Core Capabilities

  1. Great Expectations - Build expectation suites, checkpoints, automated validation
  2. dbt Testing - Schema, relationship, and custom test strategies
  3. Data Contracts - Producer/consumer agreements (ODCS, datacontract-cli)
  4. Quality Monitoring - Continuous validation, metrics tracking, alerting
  5. Debugging - Root-cause analysis of data quality anomalies
  6. Layer-Based Validation - Schema (Bronze), rules (Silver), aggregation checks (Gold)

Framework References

For detailed implementation guidance, see:

Great Expectations

Use when: Building GE validation suites and checkpoints

Covers:

  • Building comprehensive expectation suites
  • Checkpoint configuration and automation
  • Running validations and handling failures
  • Integration patterns and alerting
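
As a minimal sketch, assuming the classic pandas-backed API from pre-1.0 Great Expectations (newer releases use a different entry point); the file path and column names are invented for the example:

```python
import great_expectations as ge
import pandas as pd

# Wrap a pandas DataFrame so expectation methods become available on it
# (the classic PandasDataset API; GE 1.0+ replaced this entry point).
df = ge.from_pandas(pd.read_parquet("data/orders.parquet"))

# Declare the expectation suite directly against the data
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_unique("order_id")
df.expect_column_values_to_be_between("amount", min_value=0, max_value=100_000)
df.expect_column_values_to_be_in_set("status", ["pending", "shipped", "delivered"])

# Run every declared expectation; block downstream steps on failure
result = df.validate()
if not result.success:
    raise ValueError(f"Data quality gate failed: {result.statistics}")
```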

dbt Testing

Use when: Creating dbt test suites

Covers:

  • Schema tests (unique, not_null, accepted_values, relationships)
  • Custom generic tests (reusable across models)
  • Singular tests (specific business rules)
  • Test coverage best practices
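
A minimal sketch of the corresponding tests in a models/schema.yml (model and column names are illustrative; recent dbt versions also accept data_tests: as the key):

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        description: "Primary key for orders"
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["pending", "shipped", "delivered"]
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id
```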

Data Contracts

Use when: Establishing producer/consumer agreements

Covers:

  • Data contract specification format
  • Schema definitions with PII classification
  • Quality expectations and SLA definitions
  • Contract versioning and evolution
  • Validation against contracts
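
Real contracts would live in a versioned spec such as ODCS and be checked with datacontract-cli; purely as an illustration of the idea, here is a hypothetical contract expressed as a dict and a check of a DataFrame against it (every field name here is made up):

```python
import pandas as pd

# Hypothetical contract: a schema with PII classification plus simple
# quality/SLA terms. Real contracts belong in a versioned spec file
# (e.g. ODCS YAML), not in application code.
CONTRACT = {
    "version": "1.2.0",
    "schema": {
        "order_id": {"type": "int64",   "required": True,  "pii": False},
        "email":    {"type": "object",  "required": True,  "pii": True},
        "amount":   {"type": "float64", "required": False, "pii": False},
    },
    "quality": {"max_null_fraction": 0.01},
}

def validate_against_contract(df: pd.DataFrame, contract: dict) -> list[str]:
    """Return a list of human-readable contract violations."""
    violations = []
    for col, spec in contract["schema"].items():
        if col not in df.columns:
            violations.append(f"missing column: {col}")
            continue
        if str(df[col].dtype) != spec["type"]:
            violations.append(f"{col}: expected {spec['type']}, got {df[col].dtype}")
        if spec["required"] and df[col].isna().mean() > contract["quality"]["max_null_fraction"]:
            violations.append(f"{col}: null fraction exceeds contracted SLA")
    return violations
```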

Automated Quality Pipeline

Use when: Orchestrating end-to-end quality validation

Covers:

  • Building orchestrated quality pipelines
  • Multi-table validation workflows
  • Quality reporting and metrics
  • Integration with Airflow/orchestrators
  • Blocking pipelines on failures
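
A sketch of the blocking pattern, assuming Airflow 2.x's TaskFlow API; the DAG id, table names, and run_quality_checks helper are hypothetical stand-ins for real checkpoint or dbt invocations:

```python
from datetime import datetime

from airflow.decorators import dag, task

def run_quality_checks(table: str) -> bool:
    """Hypothetical stand-in for a real gate, e.g. running a Great
    Expectations checkpoint or `dbt test --select <table>`."""
    return True  # replace with an actual validation call

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def quality_gated_pipeline():

    @task
    def validate(table: str) -> None:
        # A raised exception fails this task, which blocks everything downstream.
        if not run_quality_checks(table):
            raise ValueError(f"Quality gate failed for {table}")

    @task
    def publish() -> None:
        """Load validated tables into the reporting layer."""

    # publish() runs only when every per-table validation succeeds.
    gates = [validate.override(task_id=f"validate_{t}")(t) for t in ["orders", "customers"]]
    gates >> publish()

quality_gated_pipeline()
```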

Quick Decision Guide

| Goal | Reference |
| --- | --- |
| Build GE validation suite | Great Expectations |
| Add dbt tests to models | dbt Testing |
| Define producer/consumer contract | Data Contracts |
| Orchestrate multi-table validation | Automated Quality Pipeline |

Quality Strategy

Layer-Based Testing

  • Bronze (Schema): Validate schema, data types, null constraints
  • Silver (Business Rules): Test foreign keys, categorical values, ranges
  • Gold (Aggregations): Verify aggregation logic, metric calculations

Test Pyramid

  • Most tests: Single column validations (fast, focused)
  • Fewer tests: Cross-table relationships (slower, broader)
  • Blocking vs Warning: Block bad data; warn on minor issues
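
One way the layer and severity ideas can be wired together, sketched with invented checks (a real implementation would delegate to GE or dbt rather than raw lambdas):

```python
import logging

import pandas as pd

logger = logging.getLogger("quality")

# Each layer owns its checks; severity decides whether a failure blocks
# the pipeline or only emits a warning. A check returns True on pass.
LAYER_CHECKS = {
    "bronze": [  # schema: types and null constraints
        ("order_id not null", lambda df: df["order_id"].notna().all(), "block"),
    ],
    "silver": [  # business rules: categorical values and ranges
        ("status in allowed set",
         lambda df: df["status"].isin(["pending", "shipped", "delivered"]).all(),
         "block"),
        ("amount non-negative", lambda df: (df["amount"] >= 0).all(), "warn"),
    ],
    "gold": [  # aggregations: metric sanity
        ("daily revenue positive", lambda df: df["revenue"].sum() > 0, "warn"),
    ],
}

def run_layer(layer: str, df: pd.DataFrame) -> None:
    for name, check, severity in LAYER_CHECKS[layer]:
        if check(df):
            continue
        if severity == "block":
            raise ValueError(f"[{layer}] blocking check failed: {name}")
        logger.warning("[%s] non-blocking check failed: %s", layer, name)
```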

Best Practices

Do's:

  • ✅ Test early - Validate source data before transformations
  • ✅ Test incrementally - Add tests as you find issues
  • ✅ Document expectations - Clear descriptions for each test
  • ✅ Alert on failures - Integrate with monitoring
  • ✅ Version contracts - Track schema changes

Don'ts:

  • ❌ Don't test everything - Focus on critical columns
  • ❌ Don't ignore warnings - They often precede failures
  • ❌ Don't skip freshness - Stale data is bad data
  • ❌ Don't hardcode thresholds - Use dynamic baselines (see the sketch after this list)
  • ❌ Don't test in isolation - Test relationships too
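
A sketch of the dynamic-baselines point: derive bounds from recent history instead of a fixed constant (the 28-day window and ±3σ band are arbitrary choices for the example):

```python
import pandas as pd

def dynamic_row_count_bounds(history: pd.Series, k: float = 3.0) -> tuple[float, float]:
    """Derive acceptable row-count bounds from a trailing window of
    daily counts instead of a hardcoded threshold."""
    recent = history.tail(28)  # 28-day window: an arbitrary example choice
    mean, std = recent.mean(), recent.std()
    return max(0.0, mean - k * std), mean + k * std

# Usage: warn or block when today's load falls outside the learned band.
history = pd.Series([10_120, 9_980, 10_340, 10_050, 9_870])  # toy history
low, high = dynamic_row_count_bounds(history)
todays_count = 4_200
if not (low <= todays_count <= high):
    raise ValueError(f"Row count {todays_count} outside baseline [{low:.0f}, {high:.0f}]")
```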

Common Pitfalls & Fixes

| Pitfall | Fix |
| --- | --- |
| Testing only prod | Run tests on dev first: `dbt test --target dev` |
| Generic thresholds | Tailor tests to data characteristics |
| No alerting | Integrate with monitoring; block failures |
| Outdated expectations | Review and refresh expectations quarterly |
| Too many tests | Focus on business-critical quality dimensions |
| Ignoring false positives | Configure expectations to handle edge cases |

Dependencies

  • data-pipeline-engineer - For pipeline orchestration and debugging
  • dbt-transformation-patterns - For dbt project integration
  • ops-manager - For monitoring dashboards and alerting (out of scope for this skill)

Source

git clone https://github.com/karim-bhalwani/agent-skills-collection

The skill file is at skills/data-quality-frameworks/SKILL.md in the repository.

Overview

This skill covers building and enforcing data quality gates using Great Expectations, dbt tests, and data contracts to ensure reliable, trustworthy analytics.

How This Skill Works

You design expectation suites in Great Expectations, implement dbt schema, relationship, and custom tests, and codify producer and consumer contracts. Automated checks run at defined pipeline stages (Bronze, Silver, Gold) with metrics, alerting, and blocking on failure to prevent bad data from advancing.

Quick Start

  1. Define Great Expectations suites for critical pipelines
  2. Add dbt schema/relationship tests to models
  3. Establish data contracts and wire up automated validations (example commands below)
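
Illustrative commands for those steps; CLI syntax varies across tool versions, so treat these as assumptions rather than exact invocations (suite, checkpoint, and selector names are placeholders):

```bash
# Step 1: scaffold and run a Great Expectations checkpoint (pre-1.0 CLI)
great_expectations suite new
great_expectations checkpoint run orders_checkpoint

# Step 2: run dbt tests for the models in question
dbt test --select orders

# Step 3: validate data against a contract with datacontract-cli
datacontract test datacontract.yaml
```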

Example Use Cases

  • A retail data lake implementing GE suites to validate daily sales feeds
  • dbt tests covering model schemas, relationships, and custom validations
  • Data contracts between data producer teams and downstream analytics consumers
  • An automated quality pipeline orchestrated by Airflow with multi-table checks
  • Layer-based validation enforcing Bronze, Silver, and Gold checks before reporting
