machine-learning-lite
npx machina-cli add skill pablodiegoo/Data-Pro-Skill/machine-learning-lite --openclaw
Machine Learning Lite
This skill restricts the agent to using simple, highly interpretable Machine Learning methods focused on inference rather than production deployment. Deep learning or black-box predictions are strictly out of scope.
Core Capabilities
1. Interpretability & Testing
permutation_feature_importance.py: Estimates each variable's importance to the model by randomly shuffling it and measuring the drop in performance.
permutation_test_utilities.py: Rigorous statistical testing without assuming underlying data distributions.
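The idea behind permutation_feature_importance.py can be sketched with scikit-learn's built-in `permutation_importance`; this is an illustration of the technique, not the script's actual interface, and the toy dataset here is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data: 6 features, of which only 3 are informative.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in held-out accuracy;
# a large drop means the model genuinely relied on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:+.3f}")
```

Because the score drop is measured on held-out data, this reflects what the model actually uses, unlike impurity-based importances, which can be biased toward high-cardinality features.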
Guidelines
- Always prefer Random Forest or Logistic Regression for feature extractability.
- If classes are highly skewed, refer to references/imbalanced_data_strategies.md.
Source
git clone https://github.com/pablodiegoo/Data-Pro-Skill
Skill file: src/datapro/data/skills/machine-learning-lite/SKILL.md
Overview
This skill focuses on simple, highly interpretable machine learning methods geared toward inference rather than production deployment. It prioritizes Random Forest and Logistic Regression for clear feature extraction, and provides tools to measure feature importance and perform permutation-based statistical tests. It also offers guidance on handling imbalanced data using SMOTE when needed.
How This Skill Works
The approach relies on transparent models like Random Forest or Logistic Regression to produce interpretable feature signals. It computes feature importance with permutation_feature_importance.py by shuffling each feature and observing the impact on model performance, while permutation_test_utilities.py delivers rigorous, distribution-free statistical tests. When data are imbalanced, follow SMOTE-based strategies outlined in the references to improve inference.
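The distribution-free testing described above can be sketched as a classic two-sample permutation test; the function name `permutation_pvalue` and the synthetic data are assumptions for illustration, not the API of permutation_test_utilities.py:

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Pools both samples, repeatedly reshuffles the group labels, and
    counts how often the shuffled difference is at least as extreme as
    the observed one. No assumption about the data's distribution.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p == 0

rng = np.random.default_rng(1)
same = permutation_pvalue(rng.normal(0, 1, 50), rng.normal(0, 1, 50))
shift = permutation_pvalue(rng.normal(0, 1, 50), rng.normal(1, 1, 50))
print(f"same distribution: p = {same:.4f}")
print(f"mean shifted by 1: p = {shift:.4f}")
```

SciPy ships an equivalent, more general routine as `scipy.stats.permutation_test` if you prefer not to hand-roll the loop.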
When to Use It
- You need interpretable feature importance using Random Forest or Logistic Regression.
- You want to measure a variable's real contribution to the model by permuting individual features.
- You require rigorous permutation-based statistical tests without assuming underlying data distributions.
- Your dataset is imbalanced and you want guidance on SMOTE or other imbalance strategies.
- You want to keep ML simple and inference-focused, avoiding deep learning or black-box models.
Quick Start
- Step 1: Choose an interpretable model (Random Forest or Logistic Regression) and prepare your dataset.
- Step 2: Run permutation_feature_importance.py to compute permutation-based feature importances.
- Step 3: Run permutation_test_utilities.py to test feature significance; apply SMOTE if data are imbalanced per guidelines.
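For Step 3's imbalance handling, the standard tool is `SMOTE` from the imbalanced-learn package (as the references suggest); the numpy sketch below only illustrates the core interpolation idea, and the function name `smote_like_oversample` is an assumption:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style sketch: synthesize new minority samples by
    interpolating between a minority point and one of its k nearest
    minority neighbors. For real use, prefer imblearn's SMOTE."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from the chosen point to every minority point.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

minority = np.random.default_rng(2).normal(size=(20, 3))
new_points = smote_like_oversample(minority, n_new=30)
print(new_points.shape)  # (30, 3)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies; apply it to the training split only, never before the train/test split.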
Best Practices
- Prefer Random Forest or Logistic Regression for transparent feature extraction.
- Use permutation_feature_importance.py to quantify how much the model actually relies on each variable.
- Run permutation_test_utilities.py for rigorous statistical tests without distribution assumptions.
- If classes are skewed, apply SMOTE or other imbalanced-data strategies as outlined in references.
- Avoid deep learning and black-box models to maintain interpretability and inference focus.
Example Use Cases
- Healthcare: Identify key predictors of readmission using Random Forest feature importance.
- Finance: Determine which features drive credit-risk predictions with permutation tests.
- Marketing: Compare feature sets and validate significance before campaign decisions.
- Fraud detection: Use simple models with SMOTE to balance classes before inference.
- A/B decision support: Present interpretable models with clear feature contributions to stakeholders.