
airflow

npx machina-cli add skill G1Joshi/Agent-Skills/airflow --openclaw

Airflow

Apache Airflow is the de facto standard for orchestrating data engineering pipelines. Version 3.0 (2025) introduces Event-driven Triggers and a modern React UI.

When to Use

  • ETL/ELT: Scheduling nightly data warehouse loads.
  • ML Ops: Retraining models when new data arrives.
  • Dependency Management: "Run Task B only if Task A succeeds".

Core Concepts

DAGs (Directed Acyclic Graphs)

Defined in Python.

Task SDK

New in v3.0. Allows writing tasks in any language, not just Python.

Edge Executor

Run tasks on remote edge devices.

Best Practices (2025)

Do:

  • Use the TaskFlow API: @task decorators are cleaner than PythonOperator.
  • Use Datasets: Define data-aware scheduling (schedule=[Dataset("s3://bucket/file")]).

Don't:

  • Don't put top-level code in DAG files: it runs every time the scheduler re-parses the file, which happens continuously.

References

Source

View on GitHub: https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/airflow/SKILL.md

Overview

Apache Airflow is the de facto standard for orchestrating data engineering pipelines: it schedules ETL/ELT workflows, ML model retraining, and inter-task dependencies reliably. Airflow 3.0 introduces Event-driven Triggers and a modern React UI.

How This Skill Works

Pipelines are defined as DAGs written in Python. The Task SDK in v3.0 lets you write tasks in languages other than Python, expanding what you can run in a workflow. The Edge Executor enables executing tasks on remote edge devices, broadening where tasks can run.

When to Use It

  • ETL/ELT: Scheduling nightly data warehouse loads.
  • ML Ops: Retraining models when new data arrives.
  • Dependency management: Run Task B only if Task A succeeds.
  • Edge computing: Execute tasks on remote edge devices with the Edge Executor.
  • Event-driven workflows: Trigger pipelines in response to data events.

Quick Start

  1. Install Airflow and initialize the metadata database.
  2. Create a DAG using the TaskFlow API with @task-decorated functions.
  3. Configure a Dataset or an event trigger to enable data-aware or event-driven scheduling.

Best Practices

  • Use the TaskFlow API: @task decorators provide cleaner DAGs than traditional operators.
  • Use Datasets: Define data-aware scheduling to react to data presence.
  • Don't put top-level code in DAG files: it runs every time the scheduler re-parses the file.
  • Explore the Task SDK to write tasks in languages other than Python.
  • Leverage Event-driven Triggers in v3.0 to start jobs when data events occur.

Example Use Cases

  • Nightly ETL for a data warehouse to refresh dashboards.
  • ML model retraining triggered by the arrival of new training data.
  • Coordinate Task B to run only after Task A completes successfully.
  • Running analytics tasks on edge devices using the Edge Executor.
  • Event-driven pipelines that kick off when new data lands in a data lake.
