MLOps
Verified · @ivangdavila
`npx machina-cli add skill @ivangdavila/mlops --openclaw`
Quick Reference
| Topic | File | Key Trap |
|---|---|---|
| CI/CD and DAGs | pipelines.md | Coupling training/inference deps |
| Model serving | serving.md | Cold start with large models |
| Drift and alerts | monitoring.md | Only technical metrics |
| Versioning | reproducibility.md | Not versioning preprocessing |
| GPU infrastructure | gpu.md | GPU request = full device |
Critical Traps
Training-Serving Skew:
- Preprocessing in the notebook ≠ preprocessing in the service → silent bugs (see the sketch after this list)
- Pandas-based preprocessing in the notebook → memory leaks in a long-running service (use native types in the serving path)
- Feature store values at training time ≠ values at serving time without point-in-time-correct joins
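One way to close that gap is to ship preprocessing and model as a single artifact. A minimal sketch, assuming scikit-learn and joblib, with hypothetical column names:

```python
# Bundle preprocessing and model into ONE artifact so training and serving
# cannot drift apart. Column names below are hypothetical.
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["amount", "age"]),                  # hypothetical numeric columns
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),  # hypothetical categorical column
])

model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
# model.fit(X_train, y_train)  # training data comes from your pipeline

# Persist a single artifact; the serving process loads the same object,
# so notebook-only preprocessing can never diverge from production.
joblib.dump(model, "model.joblib")
served_model = joblib.load("model.joblib")
# served_model.predict(incoming_dataframe)
```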
GPU Memory:
- `nvidia.com/gpu: 1` in resource requests reserves the ENTIRE GPU, not a slice of its memory (see the sketch after this list)
- MIG/MPS sharing has real limitations (not plug-and-play)
- OOM on the GPU kills the pod with no useful logs
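To make the whole-device behavior concrete, here is a hedged sketch of a GPU request using the Kubernetes Python client; the container image and names are illustrative:

```python
# A whole-GPU request expressed with the Kubernetes Python client.
# Assumes the `kubernetes` package and the NVIDIA device plugin on the cluster.
from kubernetes import client

gpu_container = client.V1Container(
    name="model-server",
    image="registry.example.com/model-server:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        # nvidia.com/gpu is counted in whole devices: "1" reserves an ENTIRE GPU.
        # Fractional requests are invalid; sharing needs MIG profiles or MPS.
        limits={"nvidia.com/gpu": "1"},
    ),
)
print(gpu_container.resources.limits)
```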
Model Versioning ≠ Code Versioning:
- Model artifacts need separate versioning (MLflow, W&B, DVC)
- Training data version + preprocessing version + code version = reproducibility (example after this list)
- Rollback requires keeping old model versions deployable
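A minimal sketch of tying those three versions to one model artifact, assuming MLflow is installed; the tag values and metric are illustrative:

```python
# Tag a model run with data, preprocessing, and code versions, then log the
# artifact. Assumes MLflow is installed; tag values are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression().fit(X, y)

with mlflow.start_run():
    mlflow.set_tags({
        "data_version": "snapshot-2024-06-01",  # hypothetical dataset snapshot id
        "preprocessing_version": "prep-v3",     # hypothetical transform version
        "git_commit": "abc1234",                # code version injected by CI
    })
    mlflow.log_metric("val_auc", 0.91)          # illustrative validation metric
    mlflow.sklearn.log_model(model, artifact_path="model")
    # With a registry-backed tracking server you would also pass
    # registered_model_name="..." so old versions stay deployable for rollback.
```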
Drift Detection Timing:
- Retraining trigger isn't just "drift > threshold" → cost/benefit matters (see the drift check after this list)
- Delayed ground truth makes concept drift detection lag weeks
- Upstream data pipeline changes cause drift without model issues
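As a rough illustration of a drift check that feeds such a decision, here is a population stability index (PSI) sketch in plain NumPy; the 0.2 threshold is a rule of thumb, not a universal trigger:

```python
# Population stability index (PSI) drift check in plain NumPy.
import numpy as np

def psi(expected: np.ndarray, observed: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (training) sample and a live sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    o_pct = np.histogram(observed, bins=edges)[0] / len(observed)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid dividing by or taking log of zero
    o_pct = np.clip(o_pct, 1e-6, None)
    return float(np.sum((o_pct - e_pct) * np.log(o_pct / e_pct)))

rng = np.random.default_rng(0)
training_feature = rng.normal(0.0, 1.0, 10_000)  # reference distribution
live_feature = rng.normal(0.3, 1.0, 10_000)      # shifted live traffic

score = psi(training_feature, live_feature)
print(f"PSI = {score:.3f}")
# PSI > 0.2 is often read as significant drift, but the retraining decision
# should also weigh cost vs. expected benefit, and delayed ground truth means
# concept drift shows up weeks after covariate drift does.
if score > 0.2:
    print("significant drift: evaluate retraining cost/benefit")
```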
Scope
This skill ONLY covers:
- CI/CD pipelines for models
- Model serving and scaling
- Monitoring and drift detection
- Reproducibility practices
- GPU infrastructure patterns
Does NOT cover: ML algorithms, feature engineering, hyperparameter tuning.
Overview
MLOps is the practice of deploying ML models to production using repeatable CI/CD pipelines, scalable model serving, and continuous monitoring. It emphasizes reproducibility, drift detection, and efficient GPU infrastructure, covering versioning, serving, pipelines, and monitoring patterns.
How This Skill Works
MLOps stitches together CI/CD pipelines for training and deployment, using DAGs and containerized artifacts, with separate model versioning (MLflow, W&B, DVC). It then serves models at scale with autoscaling endpoints, while monitoring drift and alerting to trigger retraining when needed.
When to Use It
- You need repeatable pipelines from training to deployment with clear versioning
- You must serve models at scale with autoscaling and reliable endpoints
- You require drift detection and alerts to trigger retraining or rollback
- You want end-to-end reproducibility across training data, preprocessing, and code
- You're optimizing GPU infrastructure and memory usage for model workloads
Quick Start
- Step 1: Define artifact/versioning strategy (MLflow/W&B/DVC) and version data, preprocessing, and code separately
- Step 2: Build CI/CD pipelines and DAGs for training, validation, and deployment; containerize dependencies
- Step 3: Deploy a scalable model serving endpoint, enable drift monitoring and alerts, and configure GPU resource requests (serving sketch below)
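For Step 3, a minimal serving sketch, assuming fastapi, uvicorn, and a joblib-serialized artifact; the artifact path and payload shape are illustrative:

```python
# Minimal serving endpoint that loads one versioned artifact at startup.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # artifact produced by the training pipeline

class PredictRequest(BaseModel):
    features: list[float]  # flat feature vector for a single example

@app.post("/predict")
def predict(req: PredictRequest):
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn serve:app --host 0.0.0.0 --port 8000
```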
Best Practices
- Version model artifacts separately from code, using MLflow, W&B, or DVC
- Version training data, preprocessing, and code to ensure end-to-end reproducibility
- Keep preprocessing consistent between training and serving to avoid training-serving skew
- Build CI/CD pipelines and DAGs with dependency isolation and clear failure modes (see the DAG sketch after this list)
- Plan GPU resource requests and memory management up front: a plain GPU request reserves the whole device, and MIG/MPS sharing comes with real constraints
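A minimal train → validate → deploy DAG sketch, assuming Apache Airflow 2.4+; the task bodies are placeholders:

```python
# Train -> validate -> deploy DAG, assuming Apache Airflow 2.4+.
# In practice each step runs in its own image so training and inference
# dependencies stay isolated.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def train():
    print("train the model and log the artifact to the registry")

def validate():
    print("compare the candidate against the current production model")

def deploy():
    print("promote the artifact and roll out the serving endpoint")

with DAG(
    dag_id="model_release",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # triggered by CI or a drift alert rather than a fixed cron
    catchup=False,
) as dag:
    t_train = PythonOperator(task_id="train", python_callable=train)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_deploy = PythonOperator(task_id="deploy", python_callable=deploy)

    t_train >> t_validate >> t_deploy
```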
Example Use Cases
- A fintech team uses MLflow for artifact versioning, a GitOps-style CI/CD pipeline, and Kubernetes for scalable serving with drift alerts
- A retailer deploys automated retraining triggered by drift alerts rather than fixed schedules, reducing stale models
- A data science team separates training and deployment with DAG-based pipelines to avoid training-serving skew
- An AI team partitions GPUs with MIG so small workloads don't reserve entire devices, working within MIG's configuration constraints
- An organization maintains rollback by keeping old model versions deployable and testable in staging before production