run-pipeline
npx machina-cli add skill xvirobotics/metaskill/run-pipeline --openclaw

You are executing the full data science pipeline for this project. Run each stage sequentially, verifying success before proceeding to the next stage. Stop immediately if any stage fails and report the error clearly.
Dynamic Context
Current branch: !git branch --show-current
Data directory contents: !ls data/ 2>/dev/null || echo "No data/ directory found"
Available configs: !ls configs/*.yaml 2>/dev/null || ls configs/*.toml 2>/dev/null || echo "No config files found"
Python environment: !which python3 && python3 --version 2>/dev/null || echo "Python not found"
Recent changes: !git diff --stat HEAD~3 2>/dev/null || echo "No recent commits"
Configuration
If the user provided a config file as an argument, use it: $ARGUMENTS
Otherwise, look for the default config at configs/experiment.yaml or configs/experiment.toml.
Pipeline Stages
Execute each stage in order. After each stage, check for errors and verify outputs exist before proceeding.
Stage 1: Environment Check
Verify the Python environment is ready:
python3 -c "import torch; import pandas; import numpy; print(f'PyTorch {torch.__version__}, pandas {pandas.__version__}, NumPy {numpy.__version__}')"
If imports fail, report which packages are missing and suggest pip install -r requirements.txt.
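The environment check above can be sketched as a small script that reports every missing package instead of stopping at the first ImportError. The package list mirrors the Stage 1 command; everything else here is an illustrative assumption, not part of the project's API.

```python
# Minimal sketch: try to import each core package and collect failures,
# so the report names all missing packages at once.
import importlib


def check_environment(packages=("torch", "pandas", "numpy")):
    """Return a list of package names that failed to import."""
    missing = []
    for name in packages:
        try:
            mod = importlib.import_module(name)
            # Not every module defines __version__, so fall back gracefully.
            print(f"{name} {getattr(mod, '__version__', '?')}")
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = check_environment()
    if missing:
        print(f"Missing packages: {', '.join(missing)}")
        print("Try: pip install -r requirements.txt")
```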
Stage 2: Data Validation
Run data validation on the raw data:
python3 -m src.data.validate --data-dir data/raw/
If the validation script does not exist, look for alternative patterns:
- python3 src/data/validate.py
- python3 -m pytest tests/test_data/ -v --tb=short
- Check for pandera schemas in src/data/ and report their status
Verify: validation passes with no critical errors. Log any warnings.
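If no validation script exists at all, a generic check of the kind this stage performs can be sketched with pandas. The required columns and the checks themselves (nulls, duplicates) are illustrative assumptions; a real project would encode these in its own schemas.

```python
# Sketch of a generic data-validation pass: report missing required
# columns, null values, and duplicate rows as human-readable problems.
import pandas as pd


def validate_frame(df, required_columns=()):
    """Return a list of validation problems (empty list means the frame is OK)."""
    problems = []
    for col in required_columns:
        if col not in df.columns:
            problems.append(f"missing required column: {col}")
    for col, n in df.isna().sum().items():
        if n > 0:
            problems.append(f"column {col!r} has {n} null values")
    if df.duplicated().any():
        problems.append(f"{int(df.duplicated().sum())} duplicate rows")
    return problems
```

Treat null-value findings as warnings and missing columns as critical errors, in line with the non-blocking-warnings rule below.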
Stage 3: Preprocessing
Run the preprocessing pipeline:
python3 -m src.data.preprocess --config $CONFIG_FILE
Alternative patterns:
- python3 src/data/preprocess.py --config $CONFIG_FILE
- dvc repro preprocess (if a DVC pipeline is configured)
Verify: processed data files exist in data/processed/ (check for .parquet or .csv files).
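The per-stage output check can be sketched as a single helper that confirms the stage produced at least one .parquet or .csv file. The directory and extension conventions come from the stage descriptions above; the function name is illustrative.

```python
# Sketch: a stage "passed" its output check if its output directory exists
# and contains at least one file matching the expected patterns.
from pathlib import Path


def stage_outputs_exist(directory, patterns=("*.parquet", "*.csv")):
    """True if the directory contains at least one matching output file."""
    root = Path(directory)
    if not root.is_dir():
        return False
    return any(True for pattern in patterns for _ in root.glob(pattern))
```

The same helper works for Stage 4 (data/features/) and Stage 5 (checkpoints/, with a pattern like *.pt).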
Stage 4: Feature Engineering
Run feature engineering:
python3 -m src.features.build_features --config $CONFIG_FILE
Alternative patterns:
- python3 src/features/build_features.py
- dvc repro features
Verify: feature files exist in data/features/ with expected columns.
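The "expected columns" check can be sketched as below. Which columns to expect is project-specific — in practice the list would come from the experiment config, so the names used here are placeholders.

```python
# Sketch: load a feature file and report which expected columns are absent.
import pandas as pd


def verify_feature_columns(path, expected_columns):
    """Return the set of expected columns missing from the feature file."""
    if str(path).endswith(".parquet"):
        df = pd.read_parquet(path)
    else:
        df = pd.read_csv(path)
    return set(expected_columns) - set(df.columns)
```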
Stage 5: Model Training
Run model training:
python3 -m src.models.training.trainer --config $CONFIG_FILE
Alternative patterns:
- python3 src/models/train.py --config $CONFIG_FILE
- python3 train.py --config $CONFIG_FILE
Monitor output for:
- Loss values (should decrease over epochs)
- Validation metrics at each epoch
- Any NaN or Inf values (indicates numerical instability)
- Out-of-memory errors
Verify: model checkpoint exists in checkpoints/ directory.
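The monitoring rules above can be sketched as a scan over the trainer's stdout. The log-line format assumed here ("... loss=0.42 ...") is a guess about the trainer's output, so the regex would need adjusting to the real logs.

```python
# Sketch: extract loss values from training output, flag NaN/Inf, and warn
# if the loss is not trending downward over the run.
import math
import re

LOSS_RE = re.compile(r"loss[=:\s]+([-+0-9.eE]+|nan|inf)", re.IGNORECASE)


def scan_training_log(lines):
    """Return (losses, warnings) extracted from trainer stdout lines."""
    losses, warnings = [], []
    for line in lines:
        m = LOSS_RE.search(line)
        if not m:
            continue
        value = float(m.group(1))
        if math.isnan(value) or math.isinf(value):
            warnings.append(f"numerical instability: {line.strip()}")
        else:
            losses.append(value)
    if len(losses) >= 2 and losses[-1] >= losses[0]:
        warnings.append("loss did not decrease over the run")
    return losses, warnings
```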
Stage 6: Evaluation
Run model evaluation on the test set:
python3 -m src.models.evaluation.evaluate --checkpoint checkpoints/best_model.pt --config $CONFIG_FILE
Alternative patterns:
- python3 src/evaluation/evaluate.py
- python3 evaluate.py --checkpoint checkpoints/best_model.pt
Verify: metrics JSON file exists in reports/ or experiments/.
Stage 7: Summary
After all stages complete, produce a summary:
- Report which stages succeeded and which failed
- Print the final evaluation metrics (read from the metrics JSON)
- List all generated artifacts (checkpoints, processed data, feature files, metrics)
- If any stage failed, provide the error message and suggest a fix
- Report total pipeline execution time
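The summary step can be sketched as a small reporter that prints pass/fail per stage and the final metrics from the evaluation JSON. The metrics file location and key names are assumptions — use whatever path Stage 6 actually wrote.

```python
# Sketch: print a compact pipeline summary from stage results plus the
# metrics JSON produced by the evaluation stage (if it exists).
import json
from pathlib import Path


def summarize(metrics_path, stage_results):
    """Print pass/fail per stage and the final metrics; return the metrics dict."""
    for stage, ok in stage_results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {stage}")
    path = Path(metrics_path)
    if path.is_file():
        metrics = json.loads(path.read_text())
        for key, value in sorted(metrics.items()):
            print(f"  {key}: {value}")
        return metrics
    print(f"  (no metrics file at {metrics_path})")
    return {}
```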
Error Handling
- If a stage fails, do NOT proceed to the next stage (validation warnings are non-blocking and do not stop the pipeline)
- Capture stderr and stdout from each command
- For Python errors, read the traceback and identify the root cause
- For file-not-found errors, check if the expected directory structure exists
- For import errors, report the missing package
- For CUDA out-of-memory, suggest reducing batch size in the config
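The stop-on-failure loop these rules describe can be sketched with subprocess: run each stage, capture stdout and stderr, and abort on the first non-zero exit code. The stage commands passed in would be the ones listed above; the runner itself is an illustrative sketch, not the skill's actual implementation.

```python
# Sketch: run (name, argv) stage pairs in order, capturing output, and stop
# at the first failure so later stages never run on bad inputs.
import subprocess
import sys


def run_stages(stages):
    """Run stages in order; return (name, returncode, stdout, stderr) tuples
    up to and including the first failure."""
    results = []
    for name, argv in stages:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, proc.returncode, proc.stdout, proc.stderr))
        if proc.returncode != 0:
            print(f"stage {name!r} failed (exit {proc.returncode})", file=sys.stderr)
            print(proc.stderr, file=sys.stderr)
            break
    return results
```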
Source
SKILL.md in the repository: https://github.com/xvirobotics/metaskill/blob/main/examples/data-science/.claude/skills/run-pipeline/SKILL.md
Overview
This skill runs the complete pipeline end-to-end, from validating raw data through preprocessing, feature engineering, training, and evaluation. It stops on any failure and reports clear errors, making it easy to re-run after data or code changes. Use a provided config file or the default configs to reproduce experiments.
How This Skill Works
The tool determines the config (argument or defaults), then executes each stage with Python modules (validate, preprocess, build_features, trainer, evaluate). After each stage it checks for required outputs (data/processed, data/features, checkpoints, reports) and aborts with a clear message if a stage fails.
When to Use It
- You need to run the full end-to-end ML pipeline from raw data to evaluation.
- You’ve updated data or code and want to re-run the entire workflow.
- You want to reproduce an experiment using a specific config file.
- You must validate data integrity before training to catch issues early.
- You want a complete audit of artifacts (checkpoints, processed data, features, metrics) after a run.
Quick Start
- Step 1: Choose a config file path as an argument, or rely on configs/experiment.yaml.
- Step 2: Run the pipeline: run-pipeline [config-file].
- Step 3: Validate outputs in data/processed, data/features, checkpoints, and reports/experiments.
Best Practices
- Pin and version-control the config file (configs/experiment.yaml) for reproducibility.
- Use a clean Python environment and install dependencies via requirements.txt.
- Verify outputs after each stage before proceeding to the next.
- Watch for common failures: NaN/Inf, OOM, or missing outputs; address before continuing.
- Store artifacts in stable locations (data/processed, data/features, checkpoints, reports) to simplify tracking.
Example Use Cases
- CI/CD pipeline that automatically validates new data, retrains, and reports metrics.
- Reproducing a paper experiment by rerunning with an updated feature set.
- Hyperparameter exploration by sequentially running the pipeline for different configs.
- Data drift remediation by validating new data, retraining, and reevaluating.
- On-demand retraining after code changes or dependency updates.