dbt-transformation-patterns
npx machina-cli add skill karim-bhalwani/agent-skills-collection/dbt-transformation-patterns --openclaw
- sql
- analytics
- transformations
dbt Transformation Patterns
Overview
The dbt Transformation Patterns skill focuses on building production-grade analytics transformations with dbt—from project structure through testing and documentation. This is a specialized, deep-focus skill for analytics engineers and data engineers implementing SQL transformations.
Use this skill when:
- Building or restructuring a dbt project
- Implementing incremental models for large datasets
- Setting up testing and documentation
- Optimizing dbt model organization and materialization
- Creating reusable macros and transformations
Core Capabilities
- Project Architecture: Organize models into staging, intermediate, and marts layers
- Incremental Strategies: Choose delete+insert, merge, or insert-overwrite based on warehouse and data patterns
- Testing Framework: Design comprehensive test suites (schema, data quality, relationships)
- Documentation: Generate and maintain dbt docs, column-level lineage, model descriptions
- Macro Development: Build DRY, reusable SQL macros for common transformations
- Performance Tuning: Optimize dbt DAG, materialization strategy, and warehouse execution
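As an illustration of the DRY macro pattern mentioned above, a minimal dbt macro might look like the following sketch (the macro, file, and column names are hypothetical, not from this skill's examples):

```sql
-- macros/cents_to_dollars.sql (hypothetical macro name)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- usage in a model: the same conversion, reused everywhere
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_orders') }}
```

Centralizing the conversion in one macro means a precision change happens in one file rather than in every model that copies the expression.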
When to Use
- Building a new dbt project or restructuring an existing one
- Implementing incremental models for large fact/event tables
- Setting up automated testing and data quality validation
- Optimizing dbt performance (compile time, execution time, DAG structure)
- Creating reusable macros for transformations
- Establishing documentation and lineage standards
Workflow / Process
Phase 1: Project Structure
- Define source system definitions and mappings
- Design layer strategy (staging → intermediate → marts)
- Establish naming conventions and materialization strategy
Phase 2: Model Development
- Build staging models (1:1 with sources, light cleaning)
- Create intermediate models (business logic, joins)
- Build mart models (final dimension and fact tables)
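A staging model in the pattern above is a thin, renaming layer over a single source. A minimal sketch, assuming an illustrative `shop.orders` source and column names:

```sql
-- models/staging/stg_orders.sql (illustrative source and columns)
with source as (

    select * from {{ source('shop', 'orders') }}

),

renamed as (

    select
        id as order_id,
        customer_id,
        status,
        amount_cents,
        created_at
    from source

)

select * from renamed
```

Keeping staging 1:1 with the source (rename, cast, light cleaning only) preserves clear column lineage; joins and business logic belong in the intermediate layer.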
Phase 3: Testing & Documentation
- Add column and model tests
- Define data quality expectations
- Generate and validate dbt docs
Phase 4: Optimization
- Tune materialization (view vs table vs ephemeral vs incremental)
- Optimize incremental strategies for full-refresh vs run speed
- Monitor dbt run time and warehouse costs
Standards & Best Practices
Project Organization
- Staging: 1:1 with sources, minimal transformations, documented column lineage
- Intermediate: Business logic, joins, aggregations; ephemeral unless reused
- Marts: Final analytics tables (dimensions and facts), fully tested
- Naming: `stg_` (staging), `int_` (intermediate), `dim_`/`fct_` (marts)
Testing Strategy
- Schema Tests: Column-level (not null, unique) on all key columns
- Relationship Tests: Foreign key integrity to upstream models
- Custom Tests: Business rule validation (amounts >= 0, statuses in enum)
- Test Coverage: Goal is 100% on dimensions and facts; 80%+ on staging
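A business rule like "amounts >= 0" can be expressed as a singular test: a SQL file that returns the failing rows. The model and column names below are assumptions for illustration:

```sql
-- tests/assert_fct_orders_amount_non_negative.sql
-- dbt marks this test as failed if the query returns any rows
select *
from {{ ref('fct_orders') }}
where amount < 0
```

Singular tests complement the schema tests (not_null, unique, relationships) declared in the model's YAML properties file.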
Incremental Models
- Use for: Tables > 1M rows where full refresh becomes expensive
- Delete+Insert: A simple default on several adapters; deletes rows matching the incoming batch's keys, then inserts
- Merge: Best for tables with updates to existing rows
- Insert Overwrite: Partition-based; efficient for time-series data
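The strategies above can be sketched as an incremental model config with an `is_incremental()` filter. Names are illustrative, strategy support varies by adapter, and the interval syntax differs by warehouse:

```sql
-- models/marts/fct_events.sql (illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'
) }}

select
    event_id,
    user_id,
    event_type,
    event_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- only process rows newer than the current high-water mark,
  -- with a small lookback window for late-arriving data
  -- (interval syntax varies by warehouse)
  where event_at > (select max(event_at) - interval '3 days' from {{ this }})
{% endif %}
```

On the first run (or `--full-refresh`) the filter is skipped and the whole table is built; subsequent runs merge only the recent slice on `event_id`.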
Common Pitfalls
- Raw → Mart in One Model: Creates tech debt. Always stage data first.
- Hardcoded Dates: Use `{{ var() }}` for parameters, not hardcoded filters.
- Duplicate Logic: Extract repeated SQL into macros; don't copy-paste.
- No Testing: Untested models let bugs propagate downstream. Test aggressively.
- Ignoring Source Freshness: Data staleness breaks analytics. Monitor it.
- Over-Materialization: Materialize as table only when necessary (marts, heavy joins).
- Not Using Incremental: Full refreshes of large tables are expensive; use incremental models with a lookback window to catch late-arriving data.
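To avoid the hardcoded-dates pitfall, a date filter can be parameterized with `var()`. A minimal sketch (the variable name and default are assumptions):

```sql
-- in a model: parameterize the cutoff instead of hardcoding a date
select *
from {{ ref('stg_events') }}
where event_at >= '{{ var("start_date", "2024-01-01") }}'
```

The value can then be overridden at run time, e.g. `dbt run --vars '{"start_date": "2024-06-01"}'`, or set per environment in `dbt_project.yml`.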
Constraints
Technical Constraints:
- Cannot modify source system definitions without architect approval
- All models must pass testing before deployment
- Incremental strategies must handle late-arriving and updated data correctly
Scope Constraints:
- In Scope: Model development, SQL transformations, dbt macros, testing, documentation
- Out of Scope: Data pipeline orchestration (use data-pipeline-engineer), source system setup (use architect)
Integration Points
| Phase | Input From | Output To | Context |
|---|---|---|---|
| Requirements | architect, data-pipeline-engineer | Model design | Understanding data domain and staging inputs |
| Source Definitions | data-pipeline-engineer | Staging models | Raw data from pipelines becomes transformed |
| Quality Validation | data-quality-frameworks | dbt tests | Data quality expectations embedded in tests |
| Documentation | Model metadata | Analytics tools (Tableau, Looker) | dbt docs provide lineage and column descriptions |
| Optimization | Performance issues | senior-data-engineer | Complex tuning or architectural redesign |
Reference Examples
See examples/ directory for:
- Complete project structure (staging/intermediate/marts)
- Source definitions and freshness monitoring
- Incremental model implementations
- Testing suites (schema, relationships, custom)
- Macro examples (DRY transformations)
- dbt commands reference
Version History:
- 1.0 (2026-01-24): dbt-focused analytics engineering skill
Source
https://github.com/karim-bhalwani/agent-skills-collection/blob/main/skills/dbt-transformation-patterns/SKILL.md
Overview
Specializes in building production-grade dbt analytics transformations, covering project structure, incremental strategies, testing, documentation, and lineage. Helps analytics engineers organize models, improve performance, and maintain data quality across dbt projects.
How This Skill Works
Work proceeds in phases: define a clear project structure (staging, intermediate, marts) with consistent naming and materialization; develop models from staging to marts with business logic in the intermediate layer; then add tests and generate documentation to establish lineage and data quality. Reuse DRY macros to standardize transformations and tune materialization for performance and cost.
Quick Start
- Step 1: Define source definitions and establish a 3-layer project structure (staging, intermediate, marts)
- Step 2: Build staging models 1:1 with sources, then add intermediate models with core business logic
- Step 3: Create mart models, add tests, enable docs generation, and monitor run time and costs
Best Practices
- Project Organization: structure models into Staging (1:1 with sources), Intermediate (business logic), and Marts (dim/fact tables) with clear lineage
- Naming and Materialization: use consistent prefixes (stg_, int_, dim_, fct_) and choose appropriate materializations per model
- Testing Strategy: implement schema, relationship, and custom tests; target high coverage on dimensions/facts
- Incremental Models: choose the strategy per pattern: delete+insert as a simple default, merge for updates to existing rows, insert_overwrite for partitioned time-series
- Avoid Pitfalls, Stay DRY: don't jump from raw source to mart in one model; build reusable macros for common transformations and use variables for dynamic values
Example Use Cases
- Scaffold a dbt project with staging, intermediate, and marts layers and documented lineage
- Implement an incremental model using delete+insert for a large fact table
- Add column tests and relationship tests, then generate and validate dbt docs
- Create a reusable macro for date normalization and join logic across multiple models
- Tune materialization choice and DAG structure to reduce compute time and warehouse cost