dbt-transformation-patterns
npx machina-cli add skill karim-bhalwani/agent-skills-collection/dbt-transformation-patterns --openclaw
- sql
- analytics
- transformations
dbt Transformation Patterns
Overview
The dbt Transformation Patterns skill focuses on building production-grade analytics transformations with dbt—from project structure through testing and documentation. This is a specialized, deep-focus skill for analytics engineers and data engineers implementing SQL transformations.
Use this skill when:
- Building or restructuring a dbt project
- Implementing incremental models for large datasets
- Setting up testing and documentation
- Optimizing dbt model organization and materialization
- Creating reusable macros and transformations
Core Capabilities
- Project Architecture: Organize models into staging, intermediate, and marts layers
- Incremental Strategies: Choose delete+insert, merge, or insert-overwrite based on warehouse and data patterns
- Testing Framework: Design comprehensive test suites (schema, data quality, relationships)
- Documentation: Generate and maintain dbt docs, column-level lineage, model descriptions
- Macro Development: Build DRY, reusable SQL macros for common transformations
- Performance Tuning: Optimize dbt DAG, materialization strategy, and warehouse execution
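As an illustration of the DRY macro pattern mentioned above, a minimal dbt macro might look like the following sketch (the macro, file, and column names are hypothetical, not from this skill's examples):

```sql
-- macros/cents_to_dollars.sql (hypothetical macro name)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- usage in a model: the same conversion, reused everywhere
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_orders') }}
```

Centralizing the conversion in one macro means a precision change happens in one file rather than in every model that copies the expression.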
When to Use
- Building a new dbt project or restructuring an existing one
- Implementing incremental models for large fact/event tables
- Setting up automated testing and data quality validation
- Optimizing dbt performance (compile time, execution time, DAG structure)
- Creating reusable macros for transformations
- Establishing documentation and lineage standards
Workflow / Process
Phase 1: Project Structure
- Define source system definitions and mappings
- Design layer strategy (staging → intermediate → marts)
- Establish naming conventions and materialization strategy
Phase 2: Model Development
- Build staging models (1:1 with sources, light cleaning)
- Create intermediate models (business logic, joins)
- Build mart models (final dimension and fact tables)
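A staging model in the pattern above is a thin, renaming layer over a single source. A minimal sketch, assuming an illustrative `shop.orders` source and column names:

```sql
-- models/staging/stg_orders.sql (illustrative source and columns)
with source as (

    select * from {{ source('shop', 'orders') }}

),

renamed as (

    select
        id as order_id,
        customer_id,
        status,
        amount_cents,
        created_at
    from source

)

select * from renamed
```

Keeping staging 1:1 with the source (rename, cast, light cleaning only) preserves clear column lineage; joins and business logic belong in the intermediate layer.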
Phase 3: Testing & Documentation
- Add column and model tests
- Define data quality expectations
- Generate and validate dbt docs
Phase 4: Optimization
- Tune materialization (view vs table vs ephemeral vs incremental)
- Optimize incremental strategies for full-refresh vs run speed
- Monitor dbt run time and warehouse costs
Standards & Best Practices
Project Organization
- Staging: 1:1 with sources, minimal transformations, documented column lineage
- Intermediate: Business logic, joins, aggregations; ephemeral unless reused
- Marts: Final analytics tables (dimensions and facts), fully tested
- Naming: `stg_` (staging), `int_` (intermediate), `dim_`/`fct_` (marts)
Testing Strategy
- Schema Tests: Column-level (not null, unique) on all key columns
- Relationship Tests: Foreign key integrity to upstream models
- Custom Tests: Business rule validation (amounts >= 0, statuses in enum)
- Test Coverage: Goal is 100% on dimensions and facts; 80%+ on staging
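A business rule like "amounts >= 0" can be expressed as a singular test: a SQL file that returns the failing rows. The model and column names below are assumptions for illustration:

```sql
-- tests/assert_fct_orders_amount_non_negative.sql
-- dbt marks this test as failed if the query returns any rows
select *
from {{ ref('fct_orders') }}
where amount < 0
```

Singular tests complement the schema tests (not_null, unique, relationships) declared in the model's YAML properties file.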
Incremental Models
- Use for: Tables > 1M rows where full refresh becomes expensive
- Delete+Insert: A simple default on several adapters; deletes rows matching the incoming batch's keys, then inserts
- Merge: Best for tables with updates to existing rows
- Insert Overwrite: Partition-based; efficient for time-series data
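The strategies above can be sketched as an incremental model config with an `is_incremental()` filter. Names are illustrative, strategy support varies by adapter, and the interval syntax differs by warehouse:

```sql
-- models/marts/fct_events.sql (illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'
) }}

select
    event_id,
    user_id,
    event_type,
    event_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- only process rows newer than the current high-water mark,
  -- with a small lookback window for late-arriving data
  -- (interval syntax varies by warehouse)
  where event_at > (select max(event_at) - interval '3 days' from {{ this }})
{% endif %}
```

On the first run (or `--full-refresh`) the filter is skipped and the whole table is built; subsequent runs merge only the recent slice on `event_id`.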
Common Pitfalls
- Raw → Mart in One Model: Creates tech debt. Always stage data first.
- Hardcoded Dates: Use `{{ var() }}` for parameters, not hardcoded filters.
- Duplicate Logic: Extract repeated SQL into macros; don't copy-paste.
- No Testing: Untested models let bugs propagate downstream. Test aggressively.
- Ignoring Source Freshness: Data staleness breaks analytics. Monitor it.
- Over-Materialization: Materialize as table only when necessary (marts, heavy joins).
- Not Using Incremental: Full refreshes of large tables are expensive; use incremental models with a lookback window to catch late-arriving data.
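To avoid the hardcoded-dates pitfall, a date filter can be parameterized with `var()`. A minimal sketch (the variable name and default are assumptions):

```sql
-- in a model: parameterize the cutoff instead of hardcoding a date
select *
from {{ ref('stg_events') }}
where event_at >= '{{ var("start_date", "2024-01-01") }}'
```

The value can then be overridden at run time, e.g. `dbt run --vars '{"start_date": "2024-06-01"}'`, or set per environment in `dbt_project.yml`.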
Constraints
Technical Constraints:
- Cannot modify source system definitions without architect approval
- All models must pass testing before deployment
- Incremental strategies must handle late-arriving and updated data correctly
Scope Constraints:
- In Scope: Model development, SQL transformations, dbt macros, testing, documentation
- Out of Scope: Data pipeline orchestration (use data-pipeline-engineer), source system setup (use architect)
Integration Points
| Phase | Input From | Output To | Context |
|---|---|---|---|
| Requirements | architect, data-pipeline-engineer | Model design | Understanding data domain and staging inputs |
| Source Definitions | data-pipeline-engineer | Staging models | Raw data from pipelines becomes transformed |
| Quality Validation | data-quality-frameworks | dbt tests | Data quality expectations embedded in tests |
| Documentation | Model metadata | Analytics tools (Tableau, Looker) | dbt docs provide lineage and column descriptions |
| Optimization | Performance issues | senior-data-engineer | Complex tuning or architectural redesign |
Reference Examples
See examples/ directory for:
- Complete project structure (staging/intermediate/marts)
- Source definitions and freshness monitoring
- Incremental model implementations
- Testing suites (schema, relationships, custom)
- Macro examples (DRY transformations)
- dbt commands reference
Version History:
- 1.0 (2026-01-24): dbt-focused analytics engineering skill
Source
https://github.com/karim-bhalwani/agent-skills-collection/blob/main/skills/dbt-transformation-patterns/SKILL.md
Overview
Specializes in building production-grade dbt analytics transformations, covering project structure, incremental strategies, testing, documentation, and lineage. Helps analytics engineers organize models, improve performance, and maintain data quality across dbt projects.
How This Skill Works
Work proceeds in phases: define a clear project structure (staging, intermediate, marts) with consistent naming and materialization; develop models from staging to marts with business logic in the intermediate layer; then add tests and generate documentation to establish lineage and data quality. Reuse DRY macros to standardize transformations and tune materialization for performance and cost.
Quick Start
- Step 1: Define source definitions and establish a 3-layer project structure (staging, intermediate, marts)
- Step 2: Build staging models 1:1 with sources, then add intermediate models with core business logic
- Step 3: Create mart models, add tests, enable docs generation, and monitor run time and costs
Best Practices
- Project Organization: structure models into Staging (1:1 with sources), Intermediate (business logic), and Marts (dim/fact tables) with clear lineage
- Naming and Materialization: use consistent prefixes (stg_, int_, dim_, fct_) and choose appropriate materializations per model
- Testing Strategy: implement schema, relationship, and custom tests; target high coverage on dimensions/facts
- Incremental Models: choose the strategy per pattern: delete+insert as a simple default, merge for updates to existing rows, insert_overwrite for partitioned time-series
- Avoid Pitfalls, Stay DRY: don't jump from raw source to mart in one model; build reusable macros for common transformations and use variables for dynamic values
Example Use Cases
- Scaffold a dbt project with staging, intermediate, and marts layers and documented lineage
- Implement an incremental model using delete+insert for a large fact table
- Add column tests and relationship tests, then generate and validate dbt docs
- Create a reusable macro for date normalization and join logic across multiple models
- Tune materialization choice and DAG structure to reduce compute time and warehouse cost