# Machine Learning

Install: `npx machina-cli add skill aiskillstore/marketplace/machine-learning --openclaw`

A comprehensive machine learning skill covering the full ML lifecycle, from experimentation to production deployment.
## When to Use This Skill
- Building machine learning pipelines
- Feature engineering and data preprocessing
- Model training, evaluation, and selection
- Hyperparameter tuning and optimization
- Model deployment and serving
- ML experiment tracking and versioning
- Production ML monitoring and maintenance
## ML Development Lifecycle

### 1. Problem Definition

**Problem Types:**
- Binary classification (spam/not spam)
- Multi-class classification (image categories)
- Multi-label classification (document tags)
- Regression (price prediction)
- Clustering (customer segmentation)
- Ranking (search results)
- Anomaly detection (fraud detection)
**Success Metrics by Problem Type:**
| Problem Type | Primary Metrics | Secondary Metrics |
|---|---|---|
| Binary Classification | AUC-ROC, F1 | Precision, Recall, PR-AUC |
| Multi-class | Macro F1, Accuracy | Per-class metrics |
| Regression | RMSE, MAE | R², MAPE |
| Ranking | NDCG, MAP | MRR |
| Clustering | Silhouette, Calinski-Harabasz | Davies-Bouldin |
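As a quick illustration of the binary-classification row, this sketch computes the primary and secondary metrics with scikit-learn on toy labels (the arrays are made up for the example; note that AUC metrics consume scores, while F1 consumes hard labels):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, average_precision_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.7, 0.6])  # predicted probabilities
y_pred = (y_score >= 0.5).astype(int)                            # default 0.5 cutoff

auc = roc_auc_score(y_true, y_score)               # primary: AUC-ROC (ranking quality)
f1 = f1_score(y_true, y_pred)                      # primary: F1 (threshold-dependent)
pr_auc = average_precision_score(y_true, y_score)  # secondary: PR-AUC
```

Because AUC-ROC and PR-AUC are threshold-free, they can disagree with F1 about which model is better; evaluating both is why the table lists them together.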
### 2. Data Preparation

**Data Quality Checks:**
- Missing value analysis and imputation strategies
- Outlier detection and handling
- Data type validation
- Distribution analysis
- Target leakage detection
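The checks above can be scripted; here is a minimal sketch (the function name and the 0.95 correlation cutoff are illustrative choices, not a standard) covering missing-value analysis and a crude leakage signal:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, target: str) -> dict:
    """Basic quality checks: missingness per column, plus numeric features
    whose correlation with the target is suspiciously close to 1 (a crude
    target-leakage signal worth manual review)."""
    missing = df.isna().mean().to_dict()  # fraction of missing values per column
    numeric = df.select_dtypes("number").drop(columns=[target], errors="ignore")
    leakage_suspects = [
        c for c in numeric.columns
        if abs(numeric[c].corr(df[target])) > 0.95
    ]
    return {"missing": missing, "leakage_suspects": leakage_suspects}

df = pd.DataFrame({"x": [1, 2, 3, 4], "leak": [0, 1, 0, 1], "y": [0, 1, 0, 1]})
report = data_quality_report(df, target="y")
```

A near-perfect correlation does not prove leakage, but it is cheap to flag and expensive to miss.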
**Feature Engineering Patterns:**
- Numerical: scaling, binning, log transforms, polynomial features
- Categorical: one-hot, target encoding, frequency encoding, embeddings
- Temporal: lag features, rolling statistics, cyclical encoding
- Text: TF-IDF, word embeddings, transformer embeddings
- Geospatial: distance features, clustering, grid encoding
**Train/Test Split Strategies:**
- Random split (standard)
- Stratified split (imbalanced classes)
- Time-based split (temporal data)
- Group split (prevent data leakage)
- K-fold cross-validation
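Three of these strategies can be sketched with scikit-learn (the data here is synthetic and the 80/20 class ratio is an assumption for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)          # imbalanced target (20% positives)
groups = np.repeat(np.arange(20), 5)       # e.g. 5 rows per customer

# Stratified split: preserves the class ratio on both sides
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Group split: all rows for one customer land on the same side (no leakage)
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups))

# Time-based split: never shuffle temporal data; train on the past, test on the future
cut = int(len(X) * 0.8)
X_past, X_future = X[:cut], X[cut:]
```

Choosing the wrong splitter is a common source of optimistic offline metrics; the group and time-based variants exist precisely to keep correlated rows out of the test set.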
### 3. Model Selection

**Algorithm Selection Guide:**
| Data Size | Problem | Recommended Models |
|---|---|---|
| Small (<10K) | Classification | Logistic Regression, SVM, Random Forest |
| Small (<10K) | Regression | Linear Regression, Ridge, SVR |
| Medium (10K-1M) | Classification | XGBoost, LightGBM, Neural Networks |
| Medium (10K-1M) | Regression | XGBoost, LightGBM, Neural Networks |
| Large (>1M) | Any | Deep Learning, Distributed training |
| Tabular | Any | Gradient Boosting (XGBoost, LightGBM, CatBoost) |
| Images | Classification | CNN, ResNet, EfficientNet, Vision Transformers |
| Text | NLP | Transformers (BERT, RoBERTa, GPT) |
| Sequential | Time Series | LSTM, Transformer, Prophet |
### 4. Model Training

**Hyperparameter Tuning:**
- Grid Search: exhaustive, good for small spaces
- Random Search: efficient, good for large spaces
- Bayesian Optimization: smart exploration (Optuna, Hyperopt)
- Early stopping: prevent overfitting
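Random search is the easiest of these to sketch; here it is with scikit-learn's `RandomizedSearchCV` on synthetic data (the search space and the 5-trial budget are arbitrary for the example — real budgets are usually far larger):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Random search samples a fixed budget of configurations from distributions,
# rather than enumerating a grid
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 100),
        "max_depth": randint(2, 10),
    },
    n_iter=5,          # tuning budget: 5 sampled configurations
    cv=3,
    scoring="f1",
    random_state=0,
)
search.fit(X, y)
```

Bayesian optimizers such as Optuna follow the same fit/score loop but choose the next configuration from a model of past trials instead of sampling blindly.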
**Common Hyperparameters:**
| Model | Key Parameters |
|---|---|
| XGBoost | learning_rate, max_depth, n_estimators, subsample |
| LightGBM | num_leaves, learning_rate, n_estimators, feature_fraction |
| Random Forest | n_estimators, max_depth, min_samples_split |
| Neural Networks | learning_rate, batch_size, layers, dropout |
### 5. Model Evaluation

**Evaluation Best Practices:**
- Always use held-out test set for final evaluation
- Use cross-validation during development
- Check for overfitting (train vs validation gap)
- Evaluate on multiple metrics
- Analyze errors qualitatively
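The train-vs-validation gap check can be made concrete with `cross_validate(..., return_train_score=True)`; the unconstrained tree here is deliberately chosen to overfit:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# return_train_score=True exposes the train/validation gap per fold
scores = cross_validate(
    DecisionTreeClassifier(max_depth=None, random_state=0),
    X, y, cv=5, return_train_score=True,
)
gap = scores["train_score"].mean() - scores["test_score"].mean()
# An unconstrained tree memorizes its training folds, so a large gap here
# signals overfitting; the remedy is regularization (e.g. limiting max_depth).
```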
**Handling Imbalanced Data:**
- Resampling: SMOTE, undersampling
- Class weights: weighted loss functions
- Threshold tuning: optimize decision threshold
- Evaluation: use PR-AUC over ROC-AUC
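Two of these techniques — class weights and threshold tuning — compose naturally; a minimal sketch on a synthetic 90/10 dataset (the F1 objective for threshold selection is one common choice, not the only one):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, weights=[0.9, 0.1], random_state=0  # ~10% positives
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Class weights: up-weight minority-class errors in the loss
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

# Threshold tuning: pick the cutoff that maximizes F1 on held-out scores
probs = clf.predict_proba(X_te)[:, 1]
prec, rec, thresholds = precision_recall_curve(y_te, probs)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]  # last (prec, rec) point has no threshold
```

The tuned threshold is rarely 0.5 on imbalanced data, which is exactly why threshold tuning is listed as its own technique.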
### 6. Production Deployment

**Model Serving Patterns:**
- REST API (Flask, FastAPI, TF Serving)
- Batch inference (scheduled jobs)
- Streaming (real-time predictions)
- Edge deployment (mobile, IoT)
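The batch-inference pattern is the simplest to show end to end without a web framework: persist an artifact, then score in chunks to bound memory. Everything here (paths, chunk size, the pickled model) is a stand-in for a real model registry and scoring table:

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train and persist a model (stand-in for a registry artifact)
X, y = make_classification(n_samples=200, random_state=0)
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
model_path.write_bytes(pickle.dumps(LogisticRegression().fit(X, y)))

def batch_predict(path: Path, rows: np.ndarray, chunk_size: int = 64) -> np.ndarray:
    """Scheduled-job pattern: load the artifact once, then score the
    input table in fixed-size chunks to bound peak memory."""
    model = pickle.loads(path.read_bytes())
    preds = [model.predict(rows[i:i + chunk_size])
             for i in range(0, len(rows), chunk_size)]
    return np.concatenate(preds)

preds = batch_predict(model_path, X)
```

A REST endpoint (FastAPI, TF Serving) wraps the same load-once/score-many logic behind HTTP instead of a scheduler.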
**Production Considerations:**
- Latency requirements (p50, p95, p99)
- Throughput (requests per second)
- Model size and memory footprint
- Fallback strategies
- A/B testing framework
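The p50/p95/p99 latency requirement can be checked with nothing but the standard library; the toy predictor below stands in for a real model call, and the percentile indices follow `statistics.quantiles(n=100)`:

```python
import statistics
import time

def measure_latency(predict, inputs, warmup=5):
    """Collect per-request latencies (ms) and report the tail percentiles
    that latency SLOs are usually written against."""
    for x in inputs[:warmup]:          # warm caches before timing
        predict(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        samples.append((time.perf_counter() - start) * 1000)  # ms
    q = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# Toy predictor standing in for a real model call
stats = measure_latency(lambda x: sum(i * i for i in range(200)), list(range(300)))
```

Reporting tail percentiles rather than the mean matters because a model that is fast on average can still blow its SLO on the slowest 1% of requests.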
### 7. Monitoring & Maintenance

**What to Monitor:**
- Prediction latency
- Input feature distributions (data drift)
- Prediction distributions (concept drift)
- Model performance metrics
- Error rates and types
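For the data-drift item, one widely used statistic is the Population Stability Index (PSI); a minimal numpy sketch (the usual rule of thumb — below 0.1 stable, 0.1-0.25 moderate, above 0.25 significant — varies by team and is stated here as convention, not a standard):

```python
import numpy as np

def population_stability_index(expected, observed, bins=10):
    """Compare a feature's live distribution ('observed') against its
    training baseline ('expected') over shared histogram bins."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    o_pct = np.histogram(observed, bins=edges)[0] / len(observed)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) on empty bins
    o_pct = np.clip(o_pct, 1e-6, None)
    return float(np.sum((o_pct - e_pct) * np.log(o_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)
psi_same = population_stability_index(baseline, rng.normal(0, 1, 5000))
psi_shifted = population_stability_index(baseline, rng.normal(1.0, 1, 5000))
```

The same statistic applied to the model's output scores gives a cheap first-pass signal for the concept-drift item above.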
**Retraining Triggers:**
- Performance degradation below threshold
- Significant data drift detected
- Scheduled retraining (daily, weekly)
- New training data available
## MLOps Best Practices

### Experiment Tracking

Track for every experiment:
- Code version (git commit)
- Data version (hash or version ID)
- Hyperparameters
- Metrics (train, validation, test)
- Model artifacts
- Environment (packages, versions)
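Dedicated trackers (MLflow, Weights & Biases) do this properly, but the core record is small enough to sketch by hand; in this illustrative version, `code_version` would normally come from `git rev-parse HEAD`, and the data version is a content hash of the training file:

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def log_experiment(run_dir: Path, params: dict, metrics: dict,
                   data_file: Path, code_version: str) -> Path:
    """Write one self-describing JSON record per run."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,                                   # git commit
        "data_version": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "params": params,
        "metrics": metrics,
    }
    out = run_dir / "run.json"
    out.write_text(json.dumps(record, indent=2))
    return out

run_dir = Path(tempfile.mkdtemp())
data = run_dir / "train.csv"
data.write_text("x,y\n1,0\n2,1\n")
record_path = log_experiment(run_dir, {"lr": 0.1}, {"f1": 0.8}, data, "abc123")
```

Hashing the data file rather than trusting its filename is what makes a run reproducible: the record changes whenever the data silently changes.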
### Model Versioning

    models/
    ├── model_v1.0.0/
    │   ├── model.pkl
    │   ├── metadata.json
    │   ├── requirements.txt
    │   └── metrics.json
    ├── model_v1.1.0/
    └── model_v2.0.0/
### CI/CD for ML

**Continuous Integration:**
- Data validation tests
- Model training tests
- Performance regression tests

**Continuous Deployment:**
- Staging environment validation
- Shadow mode testing
- Gradual rollout (canary)
- Automatic rollback
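A performance regression test is the most ML-specific item in the CI list; a minimal sketch, assuming metrics are stored as JSON alongside the released model (the file name, metric, and 0.01 tolerance are illustrative):

```python
import json
import tempfile
from pathlib import Path

def check_no_regression(candidate: dict, baseline_path: Path,
                        tolerance: float = 0.01) -> bool:
    """CI gate: pass only if the candidate's metric is no more than
    `tolerance` below the last released baseline."""
    baseline = json.loads(baseline_path.read_text())
    return candidate["f1"] >= baseline["f1"] - tolerance

baseline_path = Path(tempfile.mkdtemp()) / "baseline_metrics.json"
baseline_path.write_text(json.dumps({"f1": 0.82}))

passes = check_no_regression({"f1": 0.83}, baseline_path)   # improvement passes
fails = check_no_regression({"f1": 0.75}, baseline_path)    # regression blocks merge
```

Wiring this into a `pytest` assertion makes a metric drop fail the pipeline the same way a broken unit test would.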
## Reference Files

For detailed patterns and code examples, load reference files as needed:

- `references/preprocessing.md` - Data preprocessing patterns and feature engineering techniques
- `references/model_patterns.md` - Model architecture patterns and implementation examples
- `references/evaluation.md` - Comprehensive evaluation strategies and metrics
## Integration with Other Skills
- performance - For optimizing inference latency
- testing - For ML-specific testing patterns
- database-optimization - For feature store queries
- debugging - For model debugging and error analysis
## Source

https://github.com/aiskillstore/marketplace/blob/main/skills/89jobrien/machine-learning/SKILL.md

## Overview
Provides a structured approach to building ML pipelines, from problem definition and data preparation to model training, evaluation, and deployment. Emphasizes experiment tracking, versioning, and production monitoring to keep models reliable over time.
## How This Skill Works
Practitioners follow a defined lifecycle: define problem types and success metrics; prepare data with quality checks and feature engineering; select models based on data size and problem; train with tuned hyperparameters and validate on held-out data before deploying. Deployment includes serving and ongoing monitoring to detect drift and trigger retraining when needed.
## When to Use It
- Building ML pipelines
- Feature engineering and data preprocessing
- Model training, evaluation, and selection
- Hyperparameter tuning and optimization
- Model deployment and serving
## Quick Start
- Step 1: Define the problem, data sources, and success metrics
- Step 2: Prepare data, engineer features, and set appropriate train/test splits
- Step 3: Train baseline models, tune hyperparameters, evaluate, and plan deployment
## Best Practices
- Define the problem and success metrics up front, aligned to the problem type
- Implement robust data quality checks and structured feature engineering
- Use appropriate train/test split strategies to avoid leakage (e.g., stratified, time-based, group)
- Track experiments and version datasets and models for reproducibility
- Plan for production monitoring, drift detection, and scheduled retraining
## Example Use Cases
- Email spam classifier using binary classification with AUC-ROC and F1
- Product category image classifier (multi-class) using CNNs
- Customer segmentation via clustering for targeted marketing
- House price predictor (regression) with RMSE/MAE
- Search ranking optimization (ranking) using NDCG