# Machine Learning

Install: `npx machina-cli add skill aiskillstore/marketplace/machine-learning --openclaw`

A comprehensive machine learning skill covering the full ML lifecycle, from experimentation to production deployment.
## When to Use This Skill
- Building machine learning pipelines
- Feature engineering and data preprocessing
- Model training, evaluation, and selection
- Hyperparameter tuning and optimization
- Model deployment and serving
- ML experiment tracking and versioning
- Production ML monitoring and maintenance
## ML Development Lifecycle

### 1. Problem Definition

**Problem Types:**
- Binary classification (spam/not spam)
- Multi-class classification (image categories)
- Multi-label classification (document tags)
- Regression (price prediction)
- Clustering (customer segmentation)
- Ranking (search results)
- Anomaly detection (fraud detection)
**Success Metrics by Problem Type:**
| Problem Type | Primary Metrics | Secondary Metrics |
|---|---|---|
| Binary Classification | AUC-ROC, F1 | Precision, Recall, PR-AUC |
| Multi-class | Macro F1, Accuracy | Per-class metrics |
| Regression | RMSE, MAE | R², MAPE |
| Ranking | NDCG, MAP | MRR |
| Clustering | Silhouette, Calinski-Harabasz | Davies-Bouldin |
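As a quick illustration of the binary-classification row, this sketch computes the primary and secondary metrics with scikit-learn on toy labels (the arrays are made up for the example; note that AUC metrics consume scores, while F1 consumes hard labels):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, average_precision_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.7, 0.6])  # predicted probabilities
y_pred = (y_score >= 0.5).astype(int)                            # default 0.5 cutoff

auc = roc_auc_score(y_true, y_score)               # primary: AUC-ROC (ranking quality)
f1 = f1_score(y_true, y_pred)                      # primary: F1 (threshold-dependent)
pr_auc = average_precision_score(y_true, y_score)  # secondary: PR-AUC
```

Because AUC-ROC and PR-AUC are threshold-free, they can disagree with F1 about which model is better; evaluating both is why the table lists them together.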
### 2. Data Preparation

**Data Quality Checks:**
- Missing value analysis and imputation strategies
- Outlier detection and handling
- Data type validation
- Distribution analysis
- Target leakage detection
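The checks above can be scripted; here is a minimal sketch (the function name and the 0.95 correlation cutoff are illustrative choices, not a standard) covering missing-value analysis and a crude leakage signal:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, target: str) -> dict:
    """Basic quality checks: missingness per column, plus numeric features
    whose correlation with the target is suspiciously close to 1 (a crude
    target-leakage signal worth manual review)."""
    missing = df.isna().mean().to_dict()  # fraction of missing values per column
    numeric = df.select_dtypes("number").drop(columns=[target], errors="ignore")
    leakage_suspects = [
        c for c in numeric.columns
        if abs(numeric[c].corr(df[target])) > 0.95
    ]
    return {"missing": missing, "leakage_suspects": leakage_suspects}

df = pd.DataFrame({"x": [1, 2, 3, 4], "leak": [0, 1, 0, 1], "y": [0, 1, 0, 1]})
report = data_quality_report(df, target="y")
```

A near-perfect correlation does not prove leakage, but it is cheap to flag and expensive to miss.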
**Feature Engineering Patterns:**
- Numerical: scaling, binning, log transforms, polynomial features
- Categorical: one-hot, target encoding, frequency encoding, embeddings
- Temporal: lag features, rolling statistics, cyclical encoding
- Text: TF-IDF, word embeddings, transformer embeddings
- Geospatial: distance features, clustering, grid encoding
**Train/Test Split Strategies:**
- Random split (standard)
- Stratified split (imbalanced classes)
- Time-based split (temporal data)
- Group split (prevent data leakage)
- K-fold cross-validation
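Three of these strategies can be sketched with scikit-learn (the data here is synthetic and the 80/20 class ratio is an assumption for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)          # imbalanced target (20% positives)
groups = np.repeat(np.arange(20), 5)       # e.g. 5 rows per customer

# Stratified split: preserves the class ratio on both sides
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Group split: all rows for one customer land on the same side (no leakage)
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups))

# Time-based split: never shuffle temporal data; train on the past, test on the future
cut = int(len(X) * 0.8)
X_past, X_future = X[:cut], X[cut:]
```

Choosing the wrong splitter is a common source of optimistic offline metrics; the group and time-based variants exist precisely to keep correlated rows out of the test set.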
### 3. Model Selection

**Algorithm Selection Guide:**
| Data Size | Problem | Recommended Models |
|---|---|---|
| Small (<10K) | Classification | Logistic Regression, SVM, Random Forest |
| Small (<10K) | Regression | Linear Regression, Ridge, SVR |
| Medium (10K-1M) | Classification | XGBoost, LightGBM, Neural Networks |
| Medium (10K-1M) | Regression | XGBoost, LightGBM, Neural Networks |
| Large (>1M) | Any | Deep Learning, Distributed training |
| Tabular | Any | Gradient Boosting (XGBoost, LightGBM, CatBoost) |
| Images | Classification | CNN, ResNet, EfficientNet, Vision Transformers |
| Text | NLP | Transformers (BERT, RoBERTa, GPT) |
| Sequential | Time Series | LSTM, Transformer, Prophet |
### 4. Model Training

**Hyperparameter Tuning:**
- Grid Search: exhaustive, good for small spaces
- Random Search: efficient, good for large spaces
- Bayesian Optimization: smart exploration (Optuna, Hyperopt)
- Early stopping: prevent overfitting
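Random search is the easiest of these to sketch; here it is with scikit-learn's `RandomizedSearchCV` on synthetic data (the search space and the 5-trial budget are arbitrary for the example — real budgets are usually far larger):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Random search samples a fixed budget of configurations from distributions,
# rather than enumerating a grid
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 100),
        "max_depth": randint(2, 10),
    },
    n_iter=5,          # tuning budget: 5 sampled configurations
    cv=3,
    scoring="f1",
    random_state=0,
)
search.fit(X, y)
```

Bayesian optimizers such as Optuna follow the same fit/score loop but choose the next configuration from a model of past trials instead of sampling blindly.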
**Common Hyperparameters:**
| Model | Key Parameters |
|---|---|
| XGBoost | learning_rate, max_depth, n_estimators, subsample |
| LightGBM | num_leaves, learning_rate, n_estimators, feature_fraction |
| Random Forest | n_estimators, max_depth, min_samples_split |
| Neural Networks | learning_rate, batch_size, layers, dropout |
### 5. Model Evaluation

**Evaluation Best Practices:**
- Always use held-out test set for final evaluation
- Use cross-validation during development
- Check for overfitting (train vs validation gap)
- Evaluate on multiple metrics
- Analyze errors qualitatively
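The train-vs-validation gap check can be made concrete with `cross_validate(..., return_train_score=True)`; the unconstrained tree here is deliberately chosen to overfit:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# return_train_score=True exposes the train/validation gap per fold
scores = cross_validate(
    DecisionTreeClassifier(max_depth=None, random_state=0),
    X, y, cv=5, return_train_score=True,
)
gap = scores["train_score"].mean() - scores["test_score"].mean()
# An unconstrained tree memorizes its training folds, so a large gap here
# signals overfitting; the remedy is regularization (e.g. limiting max_depth).
```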
**Handling Imbalanced Data:**
- Resampling: SMOTE, undersampling
- Class weights: weighted loss functions
- Threshold tuning: optimize decision threshold
- Evaluation: use PR-AUC over ROC-AUC
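Two of these techniques — class weights and threshold tuning — compose naturally; a minimal sketch on a synthetic 90/10 dataset (the F1 objective for threshold selection is one common choice, not the only one):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, weights=[0.9, 0.1], random_state=0  # ~10% positives
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Class weights: up-weight minority-class errors in the loss
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

# Threshold tuning: pick the cutoff that maximizes F1 on held-out scores
probs = clf.predict_proba(X_te)[:, 1]
prec, rec, thresholds = precision_recall_curve(y_te, probs)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]  # last (prec, rec) point has no threshold
```

The tuned threshold is rarely 0.5 on imbalanced data, which is exactly why threshold tuning is listed as its own technique.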
### 6. Production Deployment

**Model Serving Patterns:**
- REST API (Flask, FastAPI, TF Serving)
- Batch inference (scheduled jobs)
- Streaming (real-time predictions)
- Edge deployment (mobile, IoT)
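The batch-inference pattern is the simplest to show end to end without a web framework: persist an artifact, then score in chunks to bound memory. Everything here (paths, chunk size, the pickled model) is a stand-in for a real model registry and scoring table:

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train and persist a model (stand-in for a registry artifact)
X, y = make_classification(n_samples=200, random_state=0)
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
model_path.write_bytes(pickle.dumps(LogisticRegression().fit(X, y)))

def batch_predict(path: Path, rows: np.ndarray, chunk_size: int = 64) -> np.ndarray:
    """Scheduled-job pattern: load the artifact once, then score the
    input table in fixed-size chunks to bound peak memory."""
    model = pickle.loads(path.read_bytes())
    preds = [model.predict(rows[i:i + chunk_size])
             for i in range(0, len(rows), chunk_size)]
    return np.concatenate(preds)

preds = batch_predict(model_path, X)
```

A REST endpoint (FastAPI, TF Serving) wraps the same load-once/score-many logic behind HTTP instead of a scheduler.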
**Production Considerations:**
- Latency requirements (p50, p95, p99)
- Throughput (requests per second)
- Model size and memory footprint
- Fallback strategies
- A/B testing framework
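The p50/p95/p99 latency requirement can be checked with nothing but the standard library; the toy predictor below stands in for a real model call, and the percentile indices follow `statistics.quantiles(n=100)`:

```python
import statistics
import time

def measure_latency(predict, inputs, warmup=5):
    """Collect per-request latencies (ms) and report the tail percentiles
    that latency SLOs are usually written against."""
    for x in inputs[:warmup]:          # warm caches before timing
        predict(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        samples.append((time.perf_counter() - start) * 1000)  # ms
    q = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# Toy predictor standing in for a real model call
stats = measure_latency(lambda x: sum(i * i for i in range(200)), list(range(300)))
```

Reporting tail percentiles rather than the mean matters because a model that is fast on average can still blow its SLO on the slowest 1% of requests.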
### 7. Monitoring & Maintenance

**What to Monitor:**
- Prediction latency
- Input feature distributions (data drift)
- Prediction distributions (concept drift)
- Model performance metrics
- Error rates and types
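For the data-drift item, one widely used statistic is the Population Stability Index (PSI); a minimal numpy sketch (the usual rule of thumb — below 0.1 stable, 0.1-0.25 moderate, above 0.25 significant — varies by team and is stated here as convention, not a standard):

```python
import numpy as np

def population_stability_index(expected, observed, bins=10):
    """Compare a feature's live distribution ('observed') against its
    training baseline ('expected') over shared histogram bins."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    o_pct = np.histogram(observed, bins=edges)[0] / len(observed)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) on empty bins
    o_pct = np.clip(o_pct, 1e-6, None)
    return float(np.sum((o_pct - e_pct) * np.log(o_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)
psi_same = population_stability_index(baseline, rng.normal(0, 1, 5000))
psi_shifted = population_stability_index(baseline, rng.normal(1.0, 1, 5000))
```

The same statistic applied to the model's output scores gives a cheap first-pass signal for the concept-drift item above.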
**Retraining Triggers:**
- Performance degradation below threshold
- Significant data drift detected
- Scheduled retraining (daily, weekly)
- New training data available
## MLOps Best Practices

### Experiment Tracking

Track for every experiment:
- Code version (git commit)
- Data version (hash or version ID)
- Hyperparameters
- Metrics (train, validation, test)
- Model artifacts
- Environment (packages, versions)
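Dedicated trackers (MLflow, Weights & Biases) do this properly, but the core record is small enough to sketch by hand; in this illustrative version, `code_version` would normally come from `git rev-parse HEAD`, and the data version is a content hash of the training file:

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def log_experiment(run_dir: Path, params: dict, metrics: dict,
                   data_file: Path, code_version: str) -> Path:
    """Write one self-describing JSON record per run."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,                                   # git commit
        "data_version": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "params": params,
        "metrics": metrics,
    }
    out = run_dir / "run.json"
    out.write_text(json.dumps(record, indent=2))
    return out

run_dir = Path(tempfile.mkdtemp())
data = run_dir / "train.csv"
data.write_text("x,y\n1,0\n2,1\n")
record_path = log_experiment(run_dir, {"lr": 0.1}, {"f1": 0.8}, data, "abc123")
```

Hashing the data file rather than trusting its filename is what makes a run reproducible: the record changes whenever the data silently changes.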
### Model Versioning

    models/
    ├── model_v1.0.0/
    │   ├── model.pkl
    │   ├── metadata.json
    │   ├── requirements.txt
    │   └── metrics.json
    ├── model_v1.1.0/
    └── model_v2.0.0/
### CI/CD for ML

**Continuous Integration:**
- Data validation tests
- Model training tests
- Performance regression tests

**Continuous Deployment:**
- Staging environment validation
- Shadow mode testing
- Gradual rollout (canary)
- Automatic rollback
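A performance regression test is the most ML-specific item in the CI list; a minimal sketch, assuming metrics are stored as JSON alongside the released model (the file name, metric, and 0.01 tolerance are illustrative):

```python
import json
import tempfile
from pathlib import Path

def check_no_regression(candidate: dict, baseline_path: Path,
                        tolerance: float = 0.01) -> bool:
    """CI gate: pass only if the candidate's metric is no more than
    `tolerance` below the last released baseline."""
    baseline = json.loads(baseline_path.read_text())
    return candidate["f1"] >= baseline["f1"] - tolerance

baseline_path = Path(tempfile.mkdtemp()) / "baseline_metrics.json"
baseline_path.write_text(json.dumps({"f1": 0.82}))

passes = check_no_regression({"f1": 0.83}, baseline_path)   # improvement passes
fails = check_no_regression({"f1": 0.75}, baseline_path)    # regression blocks merge
```

Wiring this into a `pytest` assertion makes a metric drop fail the pipeline the same way a broken unit test would.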
## Reference Files

For detailed patterns and code examples, load reference files as needed:

- `references/preprocessing.md` - Data preprocessing patterns and feature engineering techniques
- `references/model_patterns.md` - Model architecture patterns and implementation examples
- `references/evaluation.md` - Comprehensive evaluation strategies and metrics
## Integration with Other Skills
- performance - For optimizing inference latency
- testing - For ML-specific testing patterns
- database-optimization - For feature store queries
- debugging - For model debugging and error analysis
## Source

https://github.com/aiskillstore/marketplace/blob/main/skills/89jobrien/machine-learning/SKILL.md

## Overview
Provides a structured approach to building ML pipelines, from problem definition and data preparation to model training, evaluation, and deployment. Emphasizes experiment tracking, versioning, and production monitoring to keep models reliable over time.
## How This Skill Works
Practitioners follow a defined lifecycle: define problem types and success metrics; prepare data with quality checks and feature engineering; select models based on data size and problem; train with tuned hyperparameters and validate on held-out data before deploying. Deployment includes serving and ongoing monitoring to detect drift and trigger retraining when needed.
## When to Use It
- Building ML pipelines
- Feature engineering and data preprocessing
- Model training, evaluation, and selection
- Hyperparameter tuning and optimization
- Model deployment and serving
## Quick Start
- Step 1: Define the problem, data sources, and success metrics
- Step 2: Prepare data, engineer features, and set appropriate train/test splits
- Step 3: Train baseline models, tune hyperparameters, evaluate, and plan deployment
## Best Practices
- Define the problem and success metrics up front, aligned to the problem type
- Implement robust data quality checks and structured feature engineering
- Use appropriate train/test split strategies to avoid leakage (e.g., stratified, time-based, group)
- Track experiments and version datasets and models for reproducibility
- Plan for production monitoring, drift detection, and scheduled retraining
## Example Use Cases
- Email spam classifier using binary classification with AUC-ROC and F1
- Product category image classifier (multi-class) using CNNs
- Customer segmentation via clustering for targeted marketing
- House price predictor (regression) with RMSE/MAE
- Search ranking optimization (ranking) using NDCG