Get the FREE Ultimate OpenClaw Setup Guide →

dbt-transformation-patterns

Scanned
npx machina-cli add skill karim-bhalwani/agent-skills-collection/dbt-transformation-patterns --openclaw
Files (1)
SKILL.md
5.9 KB
  • sql
  • analytics
  • transformations

dbt Transformation Patterns

Overview

The dbt Transformation Patterns skill focuses on building production-grade analytics transformations with dbt—from project structure through testing and documentation. This is a specialized, deep-focus skill for analytics engineers and data engineers implementing SQL transformations.

Use this skill when:

  • Building or restructuring a dbt project
  • Implementing incremental models for large datasets
  • Setting up testing and documentation
  • Optimizing dbt model organization and materialization
  • Creating reusable macros and transformations

Core Capabilities

  • Project Architecture: Organize models into staging, intermediate, and marts layers
  • Incremental Strategies: Choose delete+insert, merge, or insert-overwrite based on warehouse and data patterns
  • Testing Framework: Design comprehensive test suites (schema, data quality, relationships)
  • Documentation: Generate and maintain dbt docs, column-level lineage, model descriptions
  • Macro Development: Build DRY, reusable SQL macros for common transformations
  • Performance Tuning: Optimize dbt DAG, materialization strategy, and warehouse execution

When to Use

When to Use

  • Building a new dbt project or restructuring an existing one
  • Implementing incremental models for large fact/event tables
  • Setting up automated testing and data quality validation
  • Optimizing dbt performance (compile time, execution time, DAG structure)
  • Creating reusable macros for transformations
  • Establishing documentation and lineage standards

Workflow / Process

Phase 1: Project Structure

  1. Define source system definitions and mappings
  2. Design layer strategy (staging → intermediate → marts)
  3. Establish naming conventions and materialization strategy

Phase 2: Model Development

  1. Build staging models (1:1 with sources, light cleaning)
  2. Create intermediate models (business logic, joins)
  3. Build mart models (final dimension and fact tables)

Phase 3: Testing & Documentation

  1. Add column and model tests
  2. Define data quality expectations
  3. Generate and validate dbt docs

Phase 4: Optimization

  1. Tune materialization (view vs table vs ephemeral vs incremental)
  2. Optimize incremental strategies for full-refresh vs run speed
  3. Monitor dbt run time and warehouse costs

Standards & Best Practices

Project Organization

  • Staging: 1:1 with sources, minimal transformations, documented column lineage
  • Intermediate: Business logic, joins, aggregations; ephemeral unless reused
  • Marts: Final analytics tables (dimensions and facts), fully tested
  • Naming: stg_ (staging), int_ (intermediate), dim_/fct_ (marts)

Testing Strategy

  • Schema Tests: Column-level (not null, unique) on all key columns
  • Relationship Tests: Foreign key integrity to upstream models
  • Custom Tests: Business rule validation (amounts >= 0, statuses in enum)
  • Test Coverage: Goal is 100% on dimensions and facts; 80%+ on staging

Incremental Models

  • Use for: Tables > 1M rows where full refresh becomes expensive
  • Delete+Insert: Default; good for append-only data
  • Merge: Best for tables with updates to existing rows
  • Insert Overwrite: Partition-based; efficient for time-series data

Common Pitfalls

  • Raw → Mart in One Model: Creates tech debt. Always stage data first.
  • Hardcoded Dates: Use {{ var() }} for parameters, not hardcoded filters.
  • Duplicate Logic: Extract repeated SQL to macros, not copy-paste.
  • No Testing: Prevents bugs from propagating downstream. Test aggressively.
  • Ignoring Source Freshness: Data staleness breaks analytics. Monitor it.
  • Over-Materialization: Materialize as table only when necessary (marts, heavy joins).
  • Not Using Incremental: Full refresh becomes expensive; incremental catches late arrivals.

Constraints

Technical Constraints:

  • Cannot modify source system definitions without architect approval
  • All models must pass testing before deployment
  • Incremental strategies must handle late-arriving and updated data correctly

Scope Constraints:

  • In Scope: Model development, SQL transformations, dbt macros, testing, documentation
  • Out of Scope: Data pipeline orchestration (use data-pipeline-engineer), source system setup (use architect)

Integration Points

PhaseInput FromOutput ToContext
Requirementsarchitect, data-pipeline-engineerModel designUnderstanding data domain and staging inputs
Source Definitionsdata-pipeline-engineerStaging modelsRaw data from pipelines becomes transformed
Quality Validationdata-quality-frameworksdbt testsData quality expectations embedded in tests
DocumentationModel metadataAnalytics tools (Tableau, Looker)dbt docs provide lineage and column descriptions
OptimizationPerformance issuessenior-data-engineerComplex tuning or architectural redesign

Reference Examples

See examples/ directory for:

  • Complete project structure (staging/intermediate/marts)
  • Source definitions and freshness monitoring
  • Incremental model implementations
  • Testing suites (schema, relationships, custom)
  • Macro examples (DRY transformations)
  • dbt commands reference

Version History:

  • 1.0 (2026-01-24): dbt-focused analytics engineering skill

Source

git clone https://github.com/karim-bhalwani/agent-skills-collection/blob/main/skills/dbt-transformation-patterns/SKILL.mdView on GitHub

Overview

Specializes in building production-grade dbt analytics transformations, covering project structure, incremental strategies, testing, documentation, and lineage. Helps analytics engineers organize models, improve performance, and maintain data quality across dbt projects.

How This Skill Works

Work proceeds in phases: define a clear project structure (staging, intermediate, marts) with consistent naming and materialization; develop models from staging to marts with business logic in the intermediate layer; then add tests and generate documentation to establish lineage and data quality. Reuse DRY macros to standardize transformations and tune materialization for performance and cost.

When to Use It

  • Building a new dbt project or restructuring an existing one
  • Implementing incremental models for large fact/event tables
  • Setting up automated testing and data quality validation
  • Optimizing dbt performance (compile time, execution time, DAG structure)
  • Creating reusable macros and transformations

Quick Start

  1. Step 1: Define source definitions and establish a 3-layer project structure (staging, intermediate, marts)
  2. Step 2: Build staging models 1:1 with sources, then add intermediate models with core business logic
  3. Step 3: Create mart models, add tests, enable docs generation, and monitor run time and costs

Best Practices

  • Project Organization: structure models into Staging (1:1 with sources), Intermediate (business logic), and Marts (dim/fact tables) with clear lineage
  • Naming and Materialization: use consistent prefixes (stg_, int_, dim_, fct_) and choose appropriate materializations per model
  • Testing Strategy: implement schema, relationship, and custom tests; target high coverage on dimensions/facts
  • Incremental Models: apply delete+insert for append-only data, merge for updates, insert_overwrite for time-series
  • Avoid pitfalls and DRY patterns: don't raw-load from source to mart in one model; build reusable macros for common transformations and use variables for dynamic values

Example Use Cases

  • Phase a dbt project with staging, intermediate, and marts layers and documented lineage
  • Implement an incremental model using delete+insert for a large fact table
  • Add column tests and relationship tests, then generate and validate dbt docs
  • Create a reusable macro for date normalization and join logic across multiple models
  • Tune materialization choice and DAG structure to reduce compute time and warehouse cost

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers