
machine-learning

npx machina-cli add skill aiskillstore/marketplace/machine-learning --openclaw

Machine Learning

Comprehensive machine learning skill covering the full ML lifecycle from experimentation to production deployment.

When to Use This Skill

  • Building machine learning pipelines
  • Feature engineering and data preprocessing
  • Model training, evaluation, and selection
  • Hyperparameter tuning and optimization
  • Model deployment and serving
  • ML experiment tracking and versioning
  • Production ML monitoring and maintenance

ML Development Lifecycle

1. Problem Definition

Problem Types:

  • Binary classification (spam/not spam)
  • Multi-class classification (image categories)
  • Multi-label classification (document tags)
  • Regression (price prediction)
  • Clustering (customer segmentation)
  • Ranking (search results)
  • Anomaly detection (fraud detection)

Success Metrics by Problem Type:

| Problem Type | Primary Metrics | Secondary Metrics |
|---|---|---|
| Binary Classification | AUC-ROC, F1 | Precision, Recall, PR-AUC |
| Multi-class | Macro F1, Accuracy | Per-class metrics |
| Regression | RMSE, MAE | R², MAPE |
| Ranking | NDCG, MAP | MRR |
| Clustering | Silhouette, Calinski-Harabasz | Davies-Bouldin |

2. Data Preparation

Data Quality Checks:

  • Missing value analysis and imputation strategies
  • Outlier detection and handling
  • Data type validation
  • Distribution analysis
  • Target leakage detection
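A minimal sketch of the first, second, and last checks above, assuming pandas; the column names and thresholds are illustrative:

```python
import numpy as np
import pandas as pd

# Toy dataset; column names are illustrative.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 120],   # 120 is a likely outlier
    "plan": ["free", "pro", "pro", None, "free", "pro"],
    "churned": [0, 1, 0, 1, 0, 1],
})

# Missing-value analysis: fraction of nulls per column.
missing = df.isna().mean().sort_values(ascending=False)

# Simple outlier flag via the IQR rule on a numeric column.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

# Crude target-leakage screen: flag features almost perfectly
# correlated with the target.
leaky = df.select_dtypes("number").corr()["churned"].drop("churned").abs()
print(missing.round(2))
print(len(outliers), "outlier rows")
print(leaky[leaky > 0.95].index.tolist())
```

In practice each flagged column still needs a human decision: impute, cap, drop, or investigate the leak.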

Feature Engineering Patterns:

  • Numerical: scaling, binning, log transforms, polynomial features
  • Categorical: one-hot, target encoding, frequency encoding, embeddings
  • Temporal: lag features, rolling statistics, cyclical encoding
  • Text: TF-IDF, word embeddings, transformer embeddings
  • Geospatial: distance features, clustering, grid encoding
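One way to combine the numerical and categorical patterns above is scikit-learn's `ColumnTransformer`; a small sketch with illustrative column names:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame; "income" and "city" are illustrative column names.
X = pd.DataFrame({
    "income": [40_000, 85_000, 62_000, 51_000],
    "city": ["NYC", "SF", "NYC", "LA"],
})

# Numerical: scaling; categorical: one-hot encoding.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

Xt = pre.fit_transform(X)
print(Xt.shape)  # one scaled column plus one column per city
```

Wrapping preprocessing in a transformer (rather than mutating the frame in place) keeps train and inference pipelines identical, which helps prevent training/serving skew.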

Train/Test Split Strategies:

  • Random split (standard)
  • Stratified split (imbalanced classes)
  • Time-based split (temporal data)
  • Group split (prevent data leakage)
  • K-fold cross-validation
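The split strategies above can be sketched with scikit-learn; the tiny arrays here stand in for real data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit, train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])        # imbalanced target
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])   # e.g. one id per customer

# Stratified split: preserves the 70/30 class ratio in both halves.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Time-based split: each fold trains on the past, tests on the future.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    assert train_idx.max() < test_idx.min()

# Group split: a customer's rows never straddle train and test.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    assert not set(groups[train_idx]) & set(groups[test_idx])
```

The group split is what prevents the leakage mentioned above: rows from the same entity are highly correlated, so letting them appear on both sides inflates validation scores.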

3. Model Selection

Algorithm Selection Guide:

| Data Size / Type | Problem | Recommended Models |
|---|---|---|
| Small (<10K) | Classification | Logistic Regression, SVM, Random Forest |
| Small (<10K) | Regression | Linear Regression, Ridge, SVR |
| Medium (10K-1M) | Classification | XGBoost, LightGBM, Neural Networks |
| Medium (10K-1M) | Regression | XGBoost, LightGBM, Neural Networks |
| Large (>1M) | Any | Deep Learning, distributed training |
| Tabular | Any | Gradient Boosting (XGBoost, LightGBM, CatBoost) |
| Images | Classification | CNN, ResNet, EfficientNet, Vision Transformers |
| Text | NLP | Transformers (BERT, RoBERTa, GPT) |
| Sequential | Time Series | LSTM, Transformer, Prophet |

4. Model Training

Hyperparameter Tuning:

  • Grid Search: exhaustive, good for small spaces
  • Random Search: efficient, good for large spaces
  • Bayesian Optimization: smart exploration (Optuna, Hyperopt)
  • Early stopping: prevent overfitting
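As one concrete instance of random search, a sketch using scikit-learn's `RandomizedSearchCV` with a random forest; the parameter ranges and search budget are illustrative:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Random search samples a fixed budget of configurations from
# distributions instead of enumerating a full grid.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 200),
        "max_depth": randint(2, 10),
        "min_samples_split": randint(2, 10),
    },
    n_iter=10,          # the search budget
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For larger spaces or expensive models, Bayesian tools such as Optuna follow the same fit/score loop but choose each next configuration from the results so far.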

Common Hyperparameters:

| Model | Key Parameters |
|---|---|
| XGBoost | learning_rate, max_depth, n_estimators, subsample |
| LightGBM | num_leaves, learning_rate, n_estimators, feature_fraction |
| Random Forest | n_estimators, max_depth, min_samples_split |
| Neural Networks | learning_rate, batch_size, layers, dropout |

5. Model Evaluation

Evaluation Best Practices:

  • Always use a held-out test set for final evaluation
  • Use cross-validation during development
  • Check for overfitting (train vs validation gap)
  • Evaluate on multiple metrics
  • Analyze errors qualitatively
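A sketch of cross-validating on multiple metrics at once and checking the train/validation gap, using scikit-learn's `cross_validate` on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Score several metrics per fold and keep train scores so we can
# compare them against validation scores.
cv = cross_validate(
    LogisticRegression(max_iter=1000), X, y,
    cv=5,
    scoring=["accuracy", "f1", "roc_auc"],
    return_train_score=True,
)

for metric in ["accuracy", "f1", "roc_auc"]:
    gap = cv[f"train_{metric}"].mean() - cv[f"test_{metric}"].mean()
    print(f"{metric}: test={cv[f'test_{metric}'].mean():.3f} gap={gap:.3f}")
```

A large positive gap on any metric is the overfitting signal called out above; the final test set stays untouched until this loop is done.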

Handling Imbalanced Data:

  • Resampling: SMOTE, undersampling
  • Class weights: weighted loss functions
  • Threshold tuning: optimize decision threshold
  • Evaluation: use PR-AUC over ROC-AUC
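Three of these techniques (class weights, PR-AUC evaluation, threshold tuning) can be combined in one short scikit-learn sketch; the 95/5 class split is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic data with roughly a 5% positive class.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Class weights: the loss up-weights the rare class.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Evaluate with PR-AUC, then tune the decision threshold for best F1.
pr_auc = average_precision_score(y_te, proba)
prec, rec, thresh = precision_recall_curve(y_te, proba)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best = thresh[f1[:-1].argmax()]   # thresholds align with all but the last point
print(f"PR-AUC={pr_auc:.3f}, best threshold={best:.2f}")
```

The default 0.5 threshold is rarely optimal on imbalanced data, so the tuned threshold should be stored alongside the model and applied at inference time.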

6. Production Deployment

Model Serving Patterns:

  • REST API (Flask, FastAPI, TF Serving)
  • Batch inference (scheduled jobs)
  • Streaming (real-time predictions)
  • Edge deployment (mobile, IoT)

Production Considerations:

  • Latency requirements (p50, p95, p99)
  • Throughput (requests per second)
  • Model size and memory footprint
  • Fallback strategies
  • A/B testing framework
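A small sketch of measuring the latency percentiles mentioned above; `predict` is a hypothetical stand-in for a real model call, and the 1 ms sleep simulates its cost:

```python
import time
import numpy as np

def predict(x):
    # Stand-in for a real model call.
    time.sleep(0.001)
    return 0

# Measure per-request latency, then report the percentiles that matter
# for an SLO: p50 (typical request), p95/p99 (tail latency).
latencies = []
for _ in range(200):
    t0 = time.perf_counter()
    predict(None)
    latencies.append((time.perf_counter() - t0) * 1000)  # milliseconds

p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
print(f"p50={p50:.1f}ms p95={p95:.1f}ms p99={p99:.1f}ms")
```

Averages hide tail behavior, which is why the list above tracks p95/p99 rather than a mean: a few slow requests can dominate user experience even when the median looks fine.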

7. Monitoring & Maintenance

What to Monitor:

  • Prediction latency
  • Input feature distributions (data drift)
  • Prediction distributions (concept drift)
  • Model performance metrics
  • Error rates and types
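One common way to quantify the data drift in the first monitoring item is the Population Stability Index (PSI); a minimal NumPy sketch, with a commonly cited rule of thumb that PSI above roughly 0.2 signals significant drift:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) on empty bins.
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)     # training-time feature distribution
drifted = rng.normal(0.5, 1, 10_000)    # live traffic has shifted

print(round(psi(baseline, baseline[:5000]), 3),  # near zero: no drift
      round(psi(baseline, drifted), 3))          # clearly elevated
```

The same index applied to the model's score distribution gives a cheap proxy for the concept-drift check in the list above.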

Retraining Triggers:

  • Performance degradation below threshold
  • Significant data drift detected
  • Scheduled retraining (daily, weekly)
  • New training data available

MLOps Best Practices

Experiment Tracking

Track for every experiment:

  • Code version (git commit)
  • Data version (hash or version ID)
  • Hyperparameters
  • Metrics (train, validation, test)
  • Model artifacts
  • Environment (packages, versions)
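A dependency-free sketch of recording the items above as one JSON line per experiment; the commit hash here is a placeholder (in practice it would come from `git rev-parse HEAD`), and the file name is illustrative:

```python
import hashlib
import json
import sys
from datetime import datetime, timezone

def log_experiment(path, params, metrics, data_bytes=b""):
    """Append one experiment record covering the tracked items above."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "code_version": "abc1234",  # placeholder for the real git commit
        "data_version": hashlib.sha256(data_bytes).hexdigest()[:12],
        "params": params,
        "metrics": metrics,
        "environment": {"python": sys.version.split()[0]},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = log_experiment(
    "experiments.jsonl",
    params={"learning_rate": 0.1, "max_depth": 6},
    metrics={"val_auc": 0.91},
    data_bytes=b"train.csv contents",
)
print(rec["data_version"])
```

Dedicated trackers (MLflow, Weights & Biases) capture the same fields automatically; the point is that every row is enough to reproduce the run.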

Model Versioning

models/
├── model_v1.0.0/
│   ├── model.pkl
│   ├── metadata.json
│   ├── requirements.txt
│   └── metrics.json
├── model_v1.1.0/
└── model_v2.0.0/
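The layout above can be produced by a small stdlib-only helper; the model object and pinned requirement are illustrative:

```python
import json
import pickle
from pathlib import Path

def save_model_version(model, version, metrics, base="models"):
    """Write one versioned model directory matching the layout above."""
    d = Path(base) / f"model_v{version}"
    d.mkdir(parents=True, exist_ok=True)
    (d / "model.pkl").write_bytes(pickle.dumps(model))
    (d / "metadata.json").write_text(json.dumps({"version": version}))
    (d / "metrics.json").write_text(json.dumps(metrics))
    (d / "requirements.txt").write_text("scikit-learn==1.4.0\n")  # illustrative pin
    return d

# A dict stands in for a real fitted model here.
path = save_model_version({"coef": [0.5]}, "1.0.0", {"val_auc": 0.91})
print(sorted(p.name for p in path.iterdir()))
```

Keeping metrics and environment pins next to the artifact is what makes rollback to `model_v1.0.0` a copy operation rather than a retraining job.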

CI/CD for ML

  1. Continuous Integration:

    • Data validation tests
    • Model training tests
    • Performance regression tests
  2. Continuous Deployment:

    • Staging environment validation
    • Shadow mode testing
    • Gradual rollout (canary)
    • Automatic rollback
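As a minimal example of the performance-regression test in the CI list, a check of the kind a pipeline might run before promoting a model; the scores and tolerance are illustrative:

```python
def check_no_regression(candidate_auc, production_auc, tolerance=0.01):
    """Fail the pipeline if the candidate regresses beyond tolerance."""
    if candidate_auc < production_auc - tolerance:
        raise AssertionError(
            f"regression: {candidate_auc:.3f} < {production_auc:.3f} - {tolerance}")
    return True

# Small dip within tolerance: deploy proceeds.
assert check_no_regression(0.912, 0.910)

# Clear regression: deploy is blocked.
try:
    check_no_regression(0.850, 0.910)
except AssertionError as e:
    print("blocked deploy:", e)
```

The same gate, evaluated in shadow mode on live traffic, is what makes the automatic-rollback step above safe to trigger without a human in the loop.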

Reference Files

For detailed patterns and code examples, load reference files as needed:

  • references/preprocessing.md - Data preprocessing patterns and feature engineering techniques
  • references/model_patterns.md - Model architecture patterns and implementation examples
  • references/evaluation.md - Comprehensive evaluation strategies and metrics

Integration with Other Skills

  • performance - For optimizing inference latency
  • testing - For ML-specific testing patterns
  • database-optimization - For feature store queries
  • debugging - For model debugging and error analysis

Source

View the skill source on GitHub: https://github.com/aiskillstore/marketplace/blob/main/skills/89jobrien/machine-learning/SKILL.md

Overview

Provides a structured approach to building ML pipelines, from problem definition and data preparation to model training, evaluation, and deployment. Emphasizes experiment tracking, versioning, and production monitoring to keep models reliable over time.

How This Skill Works

Practitioners follow a defined lifecycle: define problem types and success metrics; prepare data with quality checks and feature engineering; select models based on data size and problem; train with tuned hyperparameters and validate on held-out data before deploying. Deployment includes serving and ongoing monitoring to detect drift and trigger retraining when needed.

When to Use It

  • Building ML pipelines
  • Feature engineering and data preprocessing
  • Model training, evaluation, and selection
  • Hyperparameter tuning and optimization
  • Model deployment and serving

Quick Start

  1. Step 1: Define the problem, data sources, and success metrics
  2. Step 2: Prepare data, engineer features, and set appropriate train/test splits
  3. Step 3: Train baseline models, tune hyperparameters, evaluate, and plan deployment

Best Practices

  • Define the problem and success metrics up front, aligned to the problem type
  • Implement robust data quality checks and structured feature engineering
  • Use appropriate train/test split strategies to avoid leakage (e.g., stratified, time-based, group)
  • Track experiments and version datasets and models for reproducibility
  • Plan for production monitoring, drift detection, and scheduled retraining

Example Use Cases

  • Email spam classifier using binary classification with AUC-ROC and F1
  • Product category image classifier (multi-class) using CNNs
  • Customer segmentation via clustering for targeted marketing
  • House price predictor (regression) with RMSE/MAE
  • Search ranking optimization (ranking) using NDCG
