
etl-pipeline-builder

npx machina-cli add skill a5c-ai/babysitter/etl-pipeline-builder --openclaw

ETL Pipeline Builder Skill

Builds and manages ETL (Extract, Transform, Load) pipelines for data migration, supporting incremental loads, CDC, and comprehensive monitoring.

Purpose

Enable data pipeline creation for:

  • Source-to-target mapping
  • Transformation definition
  • Incremental load setup
  • CDC configuration
  • Pipeline monitoring

Capabilities

1. Source-to-Target Mapping

  • Define column mappings
  • Handle schema differences
  • Configure data type conversions
  • Manage derived columns
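A mapping with a type conversion and a derived column can be expressed as a plain data structure and rendered to SQL. This is a minimal sketch: the field names (`source_column`, `target_column`, `cast`, `derived`) and the `mapping_to_select` helper are illustrative, not part of the skill's schema.

```python
# Illustrative source-to-target mapping: each entry maps a source column to a
# target column, with an optional type cast or derived-column expression.
mappings = [
    {"source_column": "cust_id", "target_column": "customer_id", "cast": "BIGINT"},
    {"source_column": "created", "target_column": "created_at", "cast": "TIMESTAMP"},
    {"source_column": None, "target_column": "full_name",
     "derived": "CONCAT(first_name, ' ', last_name)"},
]

def mapping_to_select(mappings, source_table):
    """Render the mappings as a SQL SELECT against the source table."""
    exprs = []
    for m in mappings:
        if m.get("derived"):
            exprs.append(f"{m['derived']} AS {m['target_column']}")
        elif m.get("cast"):
            exprs.append(
                f"CAST({m['source_column']} AS {m['cast']}) AS {m['target_column']}")
        else:
            exprs.append(f"{m['source_column']} AS {m['target_column']}")
    return f"SELECT {', '.join(exprs)} FROM {source_table}"
```

Keeping the mapping as data rather than hand-written SQL makes schema differences explicit and lets the same definition drive both the extract query and documentation.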

2. Transformation Definition

  • Data type transformations
  • Value mappings
  • Aggregations
  • Lookups and enrichments
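A row-level transformation combining a value mapping with a lookup enrichment might look like the following sketch; the dictionaries and column names are assumptions for illustration.

```python
# Illustrative value mapping (status codes -> labels) and lookup enrichment
# (country codes -> names) applied to a single row.
STATUS_MAP = {"A": "active", "I": "inactive", "P": "pending"}
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany"}

def transform_row(row):
    out = dict(row)
    out["status"] = STATUS_MAP.get(row.get("status"), "unknown")        # value mapping
    out["country_name"] = COUNTRY_LOOKUP.get(row.get("country"))        # lookup enrichment
    return out
```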

3. Incremental Load Setup

  • Define watermarks
  • Configure incremental columns
  • Handle deletes
  • Manage merge logic
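The watermark pattern above can be sketched in a few lines: extract only rows past the last high-water mark, then advance the mark after the batch is processed. Table and column names here are hypothetical.

```python
# Illustrative watermark-based incremental load: fetch only rows changed since
# the last stored watermark, then advance the watermark.
def build_incremental_query(table, incremental_column, last_watermark):
    return (f"SELECT * FROM {table} "
            f"WHERE {incremental_column} > '{last_watermark}' "
            f"ORDER BY {incremental_column}")

def advance_watermark(rows, incremental_column, last_watermark):
    """Return the new watermark after processing a batch of rows."""
    values = [r[incremental_column] for r in rows]
    return max(values) if values else last_watermark
```

Note that filtering on a modification timestamp alone misses hard deletes, which is why delete handling and merge logic are listed as separate concerns.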

4. CDC Configuration

  • Log-based CDC
  • Trigger-based CDC
  • Timestamp-based CDC
  • Full load comparison
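The three CDC modes differ mainly in where change data is read from, which the following hedged sketch makes concrete (the `updated_at` column and `_audit` table naming are assumptions; log-based CDC is delegated to an external reader such as Debezium rather than polled):

```python
# Illustrative dispatch over the three CDC strategies the skill configures.
def cdc_query(mode, table, since=None):
    if mode == "timestamp":
        # Timestamp-based CDC: filter on a last-modified column.
        return f"SELECT * FROM {table} WHERE updated_at > '{since}'"
    if mode == "trigger":
        # Trigger-based CDC: read an audit table populated by database triggers.
        return f"SELECT * FROM {table}_audit WHERE change_ts > '{since}'"
    if mode == "log":
        # Log-based CDC is handled by an external log reader (e.g. Debezium),
        # so there is no polling query to build here.
        return None
    raise ValueError(f"unknown CDC mode: {mode}")
```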

5. Error Handling

  • Define retry policies
  • Configure dead letter queues
  • Handle data quality issues
  • Implement alerting
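A retry policy backed by a dead-letter queue can be sketched as below: transient failures are retried with exponential backoff, and records that exhaust their retries are parked for later inspection instead of failing the pipeline. The function and parameter names are illustrative.

```python
import time

# Illustrative retry policy with a dead-letter queue: retry transient failures
# with exponential backoff, then park the record if retries are exhausted.
def process_with_retry(record, handler, max_retries=3, base_delay=0.01,
                       dead_letter=None):
    dead_letter = dead_letter if dead_letter is not None else []
    for attempt in range(max_retries):
        try:
            return handler(record)
        except Exception:
            if attempt == max_retries - 1:
                dead_letter.append(record)  # exhausted retries: dead-letter it
                return None
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```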

6. Pipeline Monitoring

  • Track pipeline metrics
  • Monitor data volumes
  • Alert on failures
  • Generate SLA reports

Tool Integrations

| Tool           | Type             | Integration Method |
| -------------- | ---------------- | ------------------ |
| Apache Airflow | Orchestration    | Python             |
| dbt            | Transformation   | CLI                |
| Airbyte        | Data integration | API                |
| Fivetran       | SaaS ETL         | API                |
| AWS DMS        | Cloud migration  | CLI                |
| Debezium       | CDC              | Config             |

Output Schema

{
  "pipelineId": "string",
  "timestamp": "ISO8601",
  "pipeline": {
    "name": "string",
    "source": {},
    "target": {},
    "mappings": [],
    "transformations": [],
    "schedule": "string"
  },
  "artifacts": {
    "dagFile": "string",
    "configFile": "string",
    "sqlFiles": []
  },
  "deployment": {
    "status": "string",
    "url": "string"
  }
}

Integration with Migration Processes

  • database-schema-migration: Data movement
  • cloud-migration: Cloud data pipelines
  • data-format-migration: Format transformation

Related Skills

  • data-migration-validator: Validation
  • schema-comparator: Schema mapping

Related Agents

  • database-migration-orchestrator: Pipeline orchestration
  • data-architect-agent: Pipeline design

Source

git clone https://github.com/a5c-ai/babysitter.git

The skill definition lives in the repository at `plugins/babysitter/skills/babysit/process/specializations/code-migration-modernization/skills/etl-pipeline-builder/SKILL.md`.

Overview

ETL Pipeline Builder creates and manages end-to-end ETL pipelines for data migration, including incremental loads, CDC, and robust monitoring. It supports source-to-target mappings, transformation definitions, and error handling, with integrations to Airflow, dbt, Airbyte, Fivetran, AWS DMS, and Debezium.

How This Skill Works

Users define source-to-target mappings, transformation rules, and incremental load settings. The skill orchestrates pipeline components, configures CDC modes, handles errors with retries and dead-letter queues, and exposes monitoring and SLA reporting through integrated tools.

When to Use It

  • Migrating data from on-premises sources to the cloud with incremental loads
  • Needing near-real-time replication via CDC (log-based, trigger-based, or timestamp-based)
  • Applying transformations and enrichments during migration with lookups and aggregations
  • Setting up end-to-end monitoring, alerts, and SLA reports for data pipelines
  • Handling data quality issues and implementing configurable retry and dead-letter strategies

Quick Start

  1. Define the source and target schemas and mapping rules
  2. Add transformations, set up incremental loads (watermarks, incremental columns), and configure CDC
  3. Enable monitoring and error handling (retry/DLQ), then deploy to your orchestrator (e.g., Airflow) and data integration tools
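The quick-start steps can be sketched end to end in plain Python; in a real deployment each stage would delegate to Airflow, dbt, or Airbyte, and the `updated_at` watermark column is an assumption.

```python
# Minimal end-to-end sketch of the Quick Start steps with stand-in functions:
# extract past the watermark, rename columns per the mapping, transform, and
# report simple monitoring metrics.
def run_pipeline(source_rows, mapping, transform, watermark):
    # Steps 1-2: extract changed rows, apply column mapping and transforms
    batch = [r for r in source_rows if r["updated_at"] > watermark]
    loaded = [transform({mapping.get(k, k): v for k, v in r.items()})
              for r in batch]
    # Step 3: advance the watermark and emit run metrics
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return loaded, {"rows": len(loaded), "watermark": new_watermark}
```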

Best Practices

  • Clearly define source-target mappings and handle schema differences up front
  • Use watermarks and incremental columns for reliable incremental loads
  • Configure retry policies and dead-letter queues to handle transient failures
  • Choose appropriate CDC mode per source and monitor full-load vs incremental behavior
  • Test end-to-end pipelines with representative data and validate outputs against schemas

Example Use Cases

  • Migrate a customer database from on-prem Oracle to Snowflake with Airflow orchestration and dbt transformations
  • CDC-based replication from MySQL to BigQuery using Debezium and Airbyte
  • Incremental ETL for a SaaS product feeding a data lake with lookups and enrichments
  • Full-load migration with delta checks and SLA reporting for regulatory data
  • Data-format migration converting heterogeneous sources into a unified schema with transformation rules

