dbt (Data Build Tool)
npx machina-cli add skill G1Joshi/Agent-Skills/dbt --openclaw
dbt manages data transformation in the warehouse using SQL. v2.0 introduces the Fusion Engine (Rust) for performance.
When to Use
- Data Modeling: Converting raw tables into "Gold" tables.
- Testing: not_null and unique tests defined in YAML.
- Documentation: Auto-generating data dictionaries.
Core Concepts
Models (.sql)
Select statements that dbt compiles into CREATE VIEW/TABLE.
Refs ({{ ref('users') }})
Dependency management. dbt builds the DAG automatically.
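As an illustration (model and column names here are hypothetical), a downstream model selects from an upstream one via ref(); dbt resolves the reference to the right relation and orders the builds accordingly:

```sql
-- models/gold_users.sql (hypothetical model name)
-- dbt compiles this SELECT into a CREATE VIEW/TABLE statement
-- and infers that gold_users depends on the 'users' model.
select
    id as user_id,
    email,
    created_at
from {{ ref('users') }}
where email is not null
```

On `dbt run`, `users` is built before `gold_users` because the ref() adds an edge to the DAG.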
Semantic Layer
Defining metrics ("Revenue") in code so all BI tools use the same definition.
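A sketch of what such a metric definition can look like in MetricFlow-style YAML (file path, metric name, and the `order_total` measure are assumptions for illustration):

```yaml
# models/marts/revenue.yml (hypothetical file)
metrics:
  - name: revenue
    label: Revenue
    description: "Total order revenue, shared by all BI tools."
    type: simple
    type_params:
      measure: order_total  # assumes this measure exists on a semantic model
```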
Best Practices (2025)
Do:
- Use Git: Treat data models like software code.
- Use Incremental Models: Only process new data to save cost.
- Use dbt Mesh: For cross-project dependencies in large orgs.
Don't:
- Don't put logic in BI tools: Put it in dbt.
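An incremental model, sketched with hypothetical table and column names, only processes rows newer than what is already materialized:

```sql
-- models/gold_orders.sql (hypothetical)
{{ config(materialized='incremental', unique_key='order_id') }}

select order_id, customer_id, amount, updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than the current table
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on later runs the `is_incremental()` block filters to new rows, which is what saves warehouse cost.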
References
Source: https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/dbt/SKILL.md
Overview
dbt orchestrates SQL-based data transformations inside your data warehouse, turning raw tables into reliable, analysis-ready models. It supports data modeling, testing, and automatic documentation, helping data teams deliver trusted analytics. In v2.0, the Fusion Engine (Rust) boosts performance for large pipelines.
How This Skill Works
Models are SQL files that dbt compiles into CREATE VIEW or TABLE statements. Refs ({{ ref('model') }}) manage dependencies and let dbt build a DAG automatically. The semantic layer lets you define metrics in code so all BI tools share the same definitions.
When to Use It
- Data Modeling: Convert raw tables into Gold tables.
- Testing: Enforce data quality with not_null and unique tests defined in YAML.
- Documentation: Auto-generate a data dictionary for datasets.
- Dependency Management: Use Refs to build and manage the DAG automatically.
- Semantic Layer: Define metrics in code so BI tools use consistent definitions.
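A minimal example of YAML-defined tests (model and column names are placeholders; newer dbt versions also accept the `data_tests:` key):

```yaml
# models/schema.yml (hypothetical model/column names)
version: 2
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
```

`dbt test` then checks that `customer_id` has no NULLs and no duplicates.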
Quick Start
- Step 1: Install dbt and initialize a project (dbt init my_project).
- Step 2: Create .sql models and YAML tests; reference other models with {{ ref('model') }}.
- Step 3: Run dbt: dbt run; dbt test; dbt docs generate && dbt docs serve.
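The steps above as a command sequence (the project name is a placeholder; assumes dbt and a warehouse adapter are installed and a profile is configured):

```
dbt init my_project                    # scaffold a new project
cd my_project
dbt run                                # compile models and build views/tables
dbt test                               # run not_null/unique tests from YAML
dbt docs generate && dbt docs serve    # build and browse the data dictionary
```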
Best Practices
- Use Git: Treat data models like software code.
- Use Incremental Models: Only process new data to save cost.
- Use dbt Mesh: For cross-project dependencies in large orgs.
- Don't put logic in BI tools: Put it in dbt.
- Automate tests and documentation to stay in sync with code.
Example Use Cases
- Transform raw orders into a Gold orders table and incrementally update it.
- Add not_null and unique tests in YAML to validate customer data.
- Auto-generate a data dictionary for all models.
- Define a Revenue metric in the semantic layer and feed it to BI tools.
- Use {{ ref('orders') }} to model dependencies across multiple projects with dbt Mesh.