run-pipeline
npx machina-cli add skill xvirobotics/metaskill/run-pipeline --openclaw

You are executing the full data science pipeline for this project. Run each stage sequentially, verifying success before proceeding to the next stage. Stop immediately if any stage fails and report the error clearly.
Dynamic Context
Current branch: !git branch --show-current
Data directory contents: !ls data/ 2>/dev/null || echo "No data/ directory found"
Available configs: !ls configs/*.yaml 2>/dev/null || ls configs/*.toml 2>/dev/null || echo "No config files found"
Python environment: !which python3 && python3 --version 2>/dev/null || echo "Python not found"
Recent changes: !git diff --stat HEAD~3 2>/dev/null || echo "No recent commits"
Configuration
If the user provided a config file as an argument, use it: $ARGUMENTS
Otherwise, look for the default config at configs/experiment.yaml or configs/experiment.toml.
Pipeline Stages
Execute each stage in order. After each stage, check for errors and verify outputs exist before proceeding.
Stage 1: Environment Check
Verify the Python environment is ready:
python3 -c "import torch; import pandas; import numpy; print(f'PyTorch {torch.__version__}, pandas {pandas.__version__}, NumPy {numpy.__version__}')"
If imports fail, report which packages are missing and suggest pip install -r requirements.txt.
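The environment check above can be sketched as a small script that reports every missing package instead of stopping at the first ImportError. The package list mirrors the Stage 1 command; everything else here is an illustrative assumption, not part of the project's API.

```python
# Minimal sketch: try to import each core package and collect failures,
# so the report names all missing packages at once.
import importlib


def check_environment(packages=("torch", "pandas", "numpy")):
    """Return a list of package names that failed to import."""
    missing = []
    for name in packages:
        try:
            mod = importlib.import_module(name)
            # Not every module defines __version__, so fall back gracefully.
            print(f"{name} {getattr(mod, '__version__', '?')}")
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = check_environment()
    if missing:
        print(f"Missing packages: {', '.join(missing)}")
        print("Try: pip install -r requirements.txt")
```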
Stage 2: Data Validation
Run data validation on the raw data:
python3 -m src.data.validate --data-dir data/raw/
If the validation script does not exist, look for alternative patterns:
- python3 src/data/validate.py
- python3 -m pytest tests/test_data/ -v --tb=short
- Check for pandera schemas in src/data/ and report their status
Verify: validation passes with no critical errors. Log any warnings.
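If no validation script exists at all, a generic check of the kind this stage performs can be sketched with pandas. The required columns and the checks themselves (nulls, duplicates) are illustrative assumptions; a real project would encode these in its own schemas.

```python
# Sketch of a generic data-validation pass: report missing required
# columns, null values, and duplicate rows as human-readable problems.
import pandas as pd


def validate_frame(df, required_columns=()):
    """Return a list of validation problems (empty list means the frame is OK)."""
    problems = []
    for col in required_columns:
        if col not in df.columns:
            problems.append(f"missing required column: {col}")
    for col, n in df.isna().sum().items():
        if n > 0:
            problems.append(f"column {col!r} has {n} null values")
    if df.duplicated().any():
        problems.append(f"{int(df.duplicated().sum())} duplicate rows")
    return problems
```

Treat null-value findings as warnings and missing columns as critical errors, in line with the non-blocking-warnings rule below.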
Stage 3: Preprocessing
Run the preprocessing pipeline:
python3 -m src.data.preprocess --config $CONFIG_FILE
Alternative patterns:
- python3 src/data/preprocess.py --config $CONFIG_FILE
- dvc repro preprocess (if a DVC pipeline is configured)
Verify: processed data files exist in data/processed/ (check for .parquet or .csv files).
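The per-stage output check can be sketched as a single helper that confirms the stage produced at least one .parquet or .csv file. The directory and extension conventions come from the stage descriptions above; the function name is illustrative.

```python
# Sketch: a stage "passed" its output check if its output directory exists
# and contains at least one file matching the expected patterns.
from pathlib import Path


def stage_outputs_exist(directory, patterns=("*.parquet", "*.csv")):
    """True if the directory contains at least one matching output file."""
    root = Path(directory)
    if not root.is_dir():
        return False
    return any(True for pattern in patterns for _ in root.glob(pattern))
```

The same helper works for Stage 4 (data/features/) and Stage 5 (checkpoints/, with a pattern like *.pt).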
Stage 4: Feature Engineering
Run feature engineering:
python3 -m src.features.build_features --config $CONFIG_FILE
Alternative patterns:
- python3 src/features/build_features.py
- dvc repro features
Verify: feature files exist in data/features/ with expected columns.
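The "expected columns" check can be sketched as below. Which columns to expect is project-specific — in practice the list would come from the experiment config, so the names used here are placeholders.

```python
# Sketch: load a feature file and report which expected columns are absent.
import pandas as pd


def verify_feature_columns(path, expected_columns):
    """Return the set of expected columns missing from the feature file."""
    if str(path).endswith(".parquet"):
        df = pd.read_parquet(path)
    else:
        df = pd.read_csv(path)
    return set(expected_columns) - set(df.columns)
```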
Stage 5: Model Training
Run model training:
python3 -m src.models.training.trainer --config $CONFIG_FILE
Alternative patterns:
- python3 src/models/train.py --config $CONFIG_FILE
- python3 train.py --config $CONFIG_FILE
Monitor output for:
- Loss values (should decrease over epochs)
- Validation metrics at each epoch
- Any NaN or Inf values (indicates numerical instability)
- Out-of-memory errors
Verify: model checkpoint exists in checkpoints/ directory.
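The monitoring rules above can be sketched as a scan over the trainer's stdout. The log-line format assumed here ("... loss=0.42 ...") is a guess about the trainer's output, so the regex would need adjusting to the real logs.

```python
# Sketch: extract loss values from training output, flag NaN/Inf, and warn
# if the loss is not trending downward over the run.
import math
import re

LOSS_RE = re.compile(r"loss[=:\s]+([-+0-9.eE]+|nan|inf)", re.IGNORECASE)


def scan_training_log(lines):
    """Return (losses, warnings) extracted from trainer stdout lines."""
    losses, warnings = [], []
    for line in lines:
        m = LOSS_RE.search(line)
        if not m:
            continue
        value = float(m.group(1))
        if math.isnan(value) or math.isinf(value):
            warnings.append(f"numerical instability: {line.strip()}")
        else:
            losses.append(value)
    if len(losses) >= 2 and losses[-1] >= losses[0]:
        warnings.append("loss did not decrease over the run")
    return losses, warnings
```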
Stage 6: Evaluation
Run model evaluation on the test set:
python3 -m src.models.evaluation.evaluate --checkpoint checkpoints/best_model.pt --config $CONFIG_FILE
Alternative patterns:
- python3 src/evaluation/evaluate.py
- python3 evaluate.py --checkpoint checkpoints/best_model.pt
Verify: metrics JSON file exists in reports/ or experiments/.
Stage 7: Summary
After all stages complete, produce a summary:
- Report which stages succeeded and which failed
- Print the final evaluation metrics (read from the metrics JSON)
- List all generated artifacts (checkpoints, processed data, feature files, metrics)
- If any stage failed, provide the error message and suggest a fix
- Report total pipeline execution time
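The summary step can be sketched as a small reporter that prints pass/fail per stage and the final metrics from the evaluation JSON. The metrics file location and key names are assumptions — use whatever path Stage 6 actually wrote.

```python
# Sketch: print a compact pipeline summary from stage results plus the
# metrics JSON produced by the evaluation stage (if it exists).
import json
from pathlib import Path


def summarize(metrics_path, stage_results):
    """Print pass/fail per stage and the final metrics; return the metrics dict."""
    for stage, ok in stage_results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {stage}")
    path = Path(metrics_path)
    if path.is_file():
        metrics = json.loads(path.read_text())
        for key, value in sorted(metrics.items()):
            print(f"  {key}: {value}")
        return metrics
    print(f"  (no metrics file at {metrics_path})")
    return {}
```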
Error Handling
- If a stage fails, do NOT proceed to the next stage (validation warnings are non-blocking and do not stop the pipeline)
- Capture stderr and stdout from each command
- For Python errors, read the traceback and identify the root cause
- For file-not-found errors, check if the expected directory structure exists
- For import errors, report the missing package
- For CUDA out-of-memory, suggest reducing batch size in the config
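The stop-on-failure loop these rules describe can be sketched with subprocess: run each stage, capture stdout and stderr, and abort on the first non-zero exit code. The stage commands passed in would be the ones listed above; the runner itself is an illustrative sketch, not the skill's actual implementation.

```python
# Sketch: run (name, argv) stage pairs in order, capturing output, and stop
# at the first failure so later stages never run on bad inputs.
import subprocess
import sys


def run_stages(stages):
    """Run stages in order; return (name, returncode, stdout, stderr) tuples
    up to and including the first failure."""
    results = []
    for name, argv in stages:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, proc.returncode, proc.stdout, proc.stderr))
        if proc.returncode != 0:
            print(f"stage {name!r} failed (exit {proc.returncode})", file=sys.stderr)
            print(proc.stderr, file=sys.stderr)
            break
    return results
```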
Source
SKILL.md in the repository: https://github.com/xvirobotics/metaskill/blob/main/examples/data-science/.claude/skills/run-pipeline/SKILL.md
Overview
This skill runs the complete pipeline end-to-end, from validating raw data through preprocessing, feature engineering, training, and evaluation. It stops on any failure and reports clear errors, making it easy to re-run after data or code changes. Use a provided config file or the default configs to reproduce experiments.
How This Skill Works
The tool determines the config (argument or defaults), then executes each stage with Python modules (validate, preprocess, build_features, trainer, evaluate). After each stage it checks for required outputs (data/processed, data/features, checkpoints, reports) and aborts with a clear message if a stage fails.
When to Use It
- You need to run the full end-to-end ML pipeline from raw data to evaluation.
- You’ve updated data or code and want to re-run the entire workflow.
- You want to reproduce an experiment using a specific config file.
- You must validate data integrity before training to catch issues early.
- You want a complete audit of artifacts (checkpoints, processed data, features, metrics) after a run.
Quick Start
- Step 1: Choose a config file path as an argument, or rely on configs/experiment.yaml.
- Step 2: Run the pipeline: run-pipeline [config-file].
- Step 3: Validate outputs in data/processed, data/features, checkpoints, and reports/experiments.
Best Practices
- Pin and version-control the config file (configs/experiment.yaml) for reproducibility.
- Use a clean Python environment and install dependencies via requirements.txt.
- Verify outputs after each stage before proceeding to the next.
- Watch for common failures: NaN/Inf, OOM, or missing outputs; address before continuing.
- Store artifacts in stable locations (data/processed, data/features, checkpoints, reports) to simplify tracking.
Example Use Cases
- CI/CD pipeline that automatically validates new data, retrains, and reports metrics.
- Reproducing a paper experiment by rerunning with an updated feature set.
- Hyperparameter exploration by sequentially running the pipeline for different configs.
- Data drift remediation by validating new data, retraining, and reevaluating.
- On-demand retraining after code changes or dependency updates.