ml-experiment-tracker
npx machina-cli add skill 0x-Professor/Agent-Skills-Hub/ml-experiment-tracker --openclaw
ML Experiment Tracker
Overview
Generate structured experiment plans that can be logged consistently in experiment tracking systems.
Workflow
- Define dataset, target task, model family, and parameter search space.
- Define metrics and acceptance thresholds before training.
- Produce run plan with version and artifact expectations.
- Export the run plan for execution in tracking tools.
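The workflow above can be sketched as a minimal run-plan structure. The field names here are illustrative only; the actual schema is defined by scripts/build_experiment_plan.py.

```python
import json

# Illustrative run plan covering the workflow fields: dataset, task,
# model family, search space, metrics with acceptance thresholds,
# version, and expected artifacts.
plan = {
    "dataset": "customers_v3.parquet",
    "task": "binary_classification",
    "model_family": "gradient_boosting",
    "search_space": {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1]},
    "metrics": {"auc": {"threshold": 0.85}, "f1": {"threshold": 0.70}},
    "version": "1.0.0",
    "artifacts": ["model.pkl", "metrics.json"],
}

# The plan serializes cleanly, so it stays machine-readable end to end.
print(json.dumps(plan, indent=2))
```

Keeping the whole plan in one serializable structure is what makes the later export and guardrail steps straightforward.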
Use Bundled Resources
- Run scripts/build_experiment_plan.py to generate consistent run plans.
- Read references/tracking-guide.md for the reproducibility checklist.
Guardrails
- Keep inputs explicit and machine-readable.
- Always include metrics and baseline criteria.
Source
git clone https://github.com/0x-Professor/Agent-Skills-Hub
Skill file: skills/ml-experiment-tracker/SKILL.md
Overview
ML Experiment Tracker generates structured, tracking-ready plans that specify dataset, task, model family, parameter space, metrics, and artifacts. These plans are designed to be logged consistently in experiment-tracking systems, enabling reproducibility and auditability before training begins.
How This Skill Works
Define dataset, target task, model family, and parameter search space; then specify metrics and acceptance thresholds before training. Run scripts/build_experiment_plan.py to generate a versioned run plan with artifact expectations and baselines, then export it to your tracking tool. Consult references/tracking-guide.md for the reproducibility checklist.
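The export step can be sketched as a JSON file hand-off, assuming the tracking tool imports plans from disk; the script's real output format and function names may differ.

```python
import json
from pathlib import Path

def export_plan(plan: dict, out_dir: str = "runs") -> Path:
    """Write a versioned run plan to disk for import into a tracking tool.

    Hypothetical helper: the bundled script may export differently.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"plan_v{plan['version']}.json"
    # sort_keys makes the file diff-friendly across re-exports.
    path.write_text(json.dumps(plan, indent=2, sort_keys=True))
    return path

p = export_plan({"version": "1.0.0", "metrics": {"auc": {"threshold": 0.85}}})
print(p.name)  # plan_v1.0.0.json
```

Embedding the version in the filename keeps successive exports side by side instead of silently overwriting each other.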
When to Use It
- Before starting a training run to standardize tracking-ready definitions
- When comparing model families with explicit parameter search spaces
- To lock metrics, thresholds, and baselines prior to training
- When exporting run plans to tracking tools for team collaboration
- For reproducibility audits referencing versioned experiment plans
Quick Start
- Step 1: Define dataset, target task, model family, and parameter search space
- Step 2: Define metrics and acceptance thresholds; set version and artifact expectations
- Step 3: Run scripts/build_experiment_plan.py to generate and export the plan to your tracking tool
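Before Step 3, a plan can be sanity-checked for the fields the workflow requires. This validator is illustrative, not part of the bundled script:

```python
REQUIRED = ("dataset", "task", "model_family", "search_space", "metrics", "version")

def validate_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan is export-ready."""
    problems = [f"missing field: {k}" for k in REQUIRED if k not in plan]
    # Every metric must carry an acceptance threshold, set before training.
    for name, spec in plan.get("metrics", {}).items():
        if "threshold" not in spec:
            problems.append(f"metric '{name}' has no acceptance threshold")
    return problems

issues = validate_plan({"dataset": "d.csv", "metrics": {"auc": {}}})
print(issues)
```

Failing fast here enforces the guardrail that metrics and baseline criteria are fixed before any training run starts.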
Best Practices
- Keep inputs explicit and machine-readable in the run plan
- Define dataset, target task, and model family up front with clear parameter search space
- Include metrics, acceptance thresholds, and baseline criteria in every plan
- Version your run plans and artifact expectations to preserve auditability
- Use the bundled script (scripts/build_experiment_plan.py) to generate consistent plans
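One way to support the versioning and auditability practice (an illustration, not the skill's own mechanism) is to fingerprint each plan with a content hash:

```python
import hashlib
import json

def plan_fingerprint(plan: dict) -> str:
    """Deterministic SHA-256 prefix over the canonical JSON form of the plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

a = plan_fingerprint({"version": "1.0.0", "metrics": {"auc": 0.85}})
b = plan_fingerprint({"metrics": {"auc": 0.85}, "version": "1.0.0"})
print(a == b)  # key order does not affect the fingerprint
```

Logging the fingerprint alongside each run lets an audit confirm exactly which plan a result came from, even if the plan file was later edited.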
Example Use Cases
- Churn prediction: plan includes dataset, binary target, logistic regression family, grid search over C and class_weight, with AUC and F1 thresholds
- Image classification: CNN family with learning rate and batch size sweeps, metrics like accuracy and top-5 accuracy, and artifact expectations for checkpoints
- Time-series forecasting: plan for ARIMA/Prophet family with horizon and features, MAE/MAPE thresholds, and forecast artifacts
- NLP sentiment: transformer-based model family with tokenization options, metrics like accuracy and F1, and baseline criteria
- Fraud detection: versioned run plan exporting to a tracking tool with feature stores and model artifacts for audit trails
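The churn-prediction case above might be written out as follows, reusing the hypothetical field names from the earlier sketch (the dataset name is made up):

```python
churn_plan = {
    "dataset": "telco_churn_2024.csv",  # hypothetical dataset name
    "task": "binary_classification",
    "model_family": "logistic_regression",
    "search_space": {"C": [0.01, 0.1, 1.0], "class_weight": [None, "balanced"]},
    "metrics": {"auc": {"threshold": 0.80}, "f1": {"threshold": 0.65}},
    "baseline": "majority_class",
    "version": "1.0.0",
    "artifacts": ["model.pkl", "roc_curve.png"],
}

# A grid search visits every C / class_weight combination.
n_candidates = (len(churn_plan["search_space"]["C"])
                * len(churn_plan["search_space"]["class_weight"]))
print(n_candidates)  # 6
```

Because the search space is explicit in the plan, the number of candidate runs is known before training begins, which helps budget compute in the tracking tool.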