machine-learning-lite
npx machina-cli add skill pablodiegoo/Data-Pro-Skill/machine-learning-lite --openclaw
Machine Learning Lite
This skill restricts the agent to using simple, highly interpretable Machine Learning methods focused on inference rather than production deployment. Deep learning or black-box predictions are strictly out of scope.
Core Capabilities
1. Interpretability & Testing
permutation_feature_importance.py: Estimates each variable's importance to the model by randomly shuffling it and measuring the drop in performance.
permutation_test_utilities.py: Rigorous statistical testing without assuming underlying data distributions.
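The idea behind permutation_feature_importance.py can be sketched with scikit-learn's built-in `permutation_importance`; this is an illustration of the technique, not the script's actual interface, and the toy dataset here is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data: 6 features, of which only 3 are informative.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in held-out accuracy;
# a large drop means the model genuinely relied on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:+.3f}")
```

Because the score drop is measured on held-out data, this reflects what the model actually uses, unlike impurity-based importances, which can be biased toward high-cardinality features.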
Guidelines
- Always prefer Random Forest or Logistic Regression for feature extractability.
- If classes are highly skewed, refer to references/imbalanced_data_strategies.md.
Source
git clone https://github.com/pablodiegoo/Data-Pro-Skill
Skill file: src/datapro/data/skills/machine-learning-lite/SKILL.md
Overview
This skill focuses on simple, highly interpretable machine learning methods geared toward inference rather than production deployment. It prioritizes Random Forest and Logistic Regression for clear feature extraction, and provides tools to measure feature importance and perform permutation-based statistical tests. It also offers guidance on handling imbalanced data using SMOTE when needed.
How This Skill Works
The approach relies on transparent models like Random Forest or Logistic Regression to produce interpretable feature signals. It computes feature importance with permutation_feature_importance.py by shuffling each feature and observing the impact on model performance, while permutation_test_utilities.py delivers rigorous, distribution-free statistical tests. When data are imbalanced, follow SMOTE-based strategies outlined in the references to improve inference.
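The distribution-free testing described above can be sketched as a classic two-sample permutation test; the function name `permutation_pvalue` and the synthetic data are assumptions for illustration, not the API of permutation_test_utilities.py:

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Pools both samples, repeatedly reshuffles the group labels, and
    counts how often the shuffled difference is at least as extreme as
    the observed one. No assumption about the data's distribution.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p == 0

rng = np.random.default_rng(1)
same = permutation_pvalue(rng.normal(0, 1, 50), rng.normal(0, 1, 50))
shift = permutation_pvalue(rng.normal(0, 1, 50), rng.normal(1, 1, 50))
print(f"same distribution: p = {same:.4f}")
print(f"mean shifted by 1: p = {shift:.4f}")
```

SciPy ships an equivalent, more general routine as `scipy.stats.permutation_test` if you prefer not to hand-roll the loop.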
When to Use It
- You need interpretable feature importance using Random Forest or Logistic Regression.
- You want to measure a variable's real contribution to the model by permuting individual features.
- You require rigorous permutation-based statistical tests without assuming underlying data distributions.
- Your dataset is imbalanced and you want guidance on SMOTE or other imbalance strategies.
- You want to keep ML simple and inference-focused, avoiding deep learning or black-box models.
Quick Start
- Step 1: Choose an interpretable model (Random Forest or Logistic Regression) and prepare your dataset.
- Step 2: Run permutation_feature_importance.py to compute permutation-based feature importances.
- Step 3: Run permutation_test_utilities.py to test feature significance; apply SMOTE if data are imbalanced per guidelines.
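For Step 3's imbalance handling, the standard tool is `SMOTE` from the imbalanced-learn package (as the references suggest); the numpy sketch below only illustrates the core interpolation idea, and the function name `smote_like_oversample` is an assumption:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style sketch: synthesize new minority samples by
    interpolating between a minority point and one of its k nearest
    minority neighbors. For real use, prefer imblearn's SMOTE."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from the chosen point to every minority point.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

minority = np.random.default_rng(2).normal(size=(20, 3))
new_points = smote_like_oversample(minority, n_new=30)
print(new_points.shape)  # (30, 3)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies; apply it to the training split only, never before the train/test split.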
Best Practices
- Prefer Random Forest or Logistic Regression for transparent feature extraction.
- Use permutation_feature_importance.py to quantify how much the model actually relies on each variable.
- Run permutation_test_utilities.py for rigorous statistical tests without distribution assumptions.
- If classes are skewed, apply SMOTE or other imbalanced-data strategies as outlined in references.
- Avoid deep learning and black-box models to maintain interpretability and inference focus.
Example Use Cases
- Healthcare: Identify key predictors of readmission using Random Forest feature importance.
- Finance: Determine which features drive credit-risk predictions with permutation tests.
- Marketing: Compare feature sets and validate significance before campaign decisions.
- Fraud detection: Use simple models with SMOTE to balance classes before inference.
- A/B decision support: Present interpretable models with clear feature contributions to stakeholders.