
generate-report

npx machina-cli add skill xvirobotics/metaskill/generate-report --openclaw

You are generating a comprehensive experiment report for this data science project. Your goal is to gather all available metrics, plots, and configuration details from the latest experiment and produce a clear, well-structured report that can be shared with the team.

Dynamic Context

Current branch: !git branch --show-current
Git commit: !git rev-parse --short HEAD 2>/dev/null || echo "unknown"
Recent experiment logs: !ls -lt reports/*.json experiments/*.json 2>/dev/null | head -5 || echo "No experiment logs found"
Available plots: !ls reports/figures/*.png reports/figures/*.svg 2>/dev/null | head -10 || echo "No plots found"
Checkpoints: !ls -lt checkpoints/*.pt checkpoints/*.pth 2>/dev/null | head -3 || echo "No checkpoints"
Config used: !ls configs/*.yaml configs/*.toml 2>/dev/null | head -3 || echo "No configs"

Experiment Name

If the user provided an experiment name, use $ARGUMENTS. Otherwise, derive one from the branch name, the latest config file, or the current date.
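The fallback chain above can be sketched in Python. This is illustrative, not part of the skill itself: the function name, the `configs/` location, and the main/master exclusion are assumptions about a typical project layout.

```python
import datetime
import subprocess
from pathlib import Path

def derive_experiment_name(arg=None):
    """Pick a name: explicit argument > current git branch > newest config stem > today's date."""
    if arg:
        return arg
    try:
        branch = subprocess.run(
            ["git", "branch", "--show-current"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        # A feature-branch name is usually descriptive; main/master is not.
        if branch and branch not in ("main", "master"):
            return branch
    except (OSError, subprocess.CalledProcessError):
        pass
    config_dir = Path("configs")
    if config_dir.is_dir():
        configs = sorted(config_dir.glob("*.y*ml"),
                         key=lambda p: p.stat().st_mtime, reverse=True)
        if configs:
            return configs[0].stem  # e.g. configs/transformer_v2.yaml -> "transformer_v2"
    return datetime.date.today().isoformat()
```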

Report Generation Process

Step 1: Gather Experiment Data

Collect all available information about the latest experiment:

  1. Metrics: Read the latest metrics JSON from reports/ or experiments/
  2. Training logs: Look for training output logs, MLflow run data, or W&B run summaries
  3. Configuration: Read the experiment config file (YAML/TOML)
  4. Checkpoint metadata: Load the best checkpoint and extract epoch, metric, config
  5. Dataset statistics: Look for data profiling outputs or read from data validation logs
# Find and read latest metrics
METRICS_FILE=$(ls -t reports/*.json experiments/*.json 2>/dev/null | head -1)
if [ -n "$METRICS_FILE" ]; then
    echo "=== Latest Metrics ==="
    cat "$METRICS_FILE"
fi

# Find config used
CONFIG_FILE=$(ls -t configs/*.yaml configs/*.toml 2>/dev/null | head -1)
if [ -n "$CONFIG_FILE" ]; then
    echo "=== Configuration ==="
    cat "$CONFIG_FILE"
fi

Step 2: Gather Baseline Data

Look for baseline metrics to compare against:

  1. Check for a reports/baseline_metrics.json or experiments/baseline.json
  2. Check git history for previous metrics files: git log --oneline --all -- reports/*.json
  3. If MLflow is configured, query for the baseline run
  4. If no baseline exists, note this in the report
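A minimal sketch of that lookup order, assuming the two candidate paths named above (the function name is illustrative):

```python
import json
from pathlib import Path

# Checked in order; first hit wins. Paths mirror the layout described above.
BASELINE_CANDIDATES = (
    Path("reports/baseline_metrics.json"),
    Path("experiments/baseline.json"),
)

def load_baseline():
    """Return (path, metrics dict) for the first baseline file found, else (None, None)."""
    for candidate in BASELINE_CANDIDATES:
        if candidate.exists():
            return candidate, json.loads(candidate.read_text())
    return None, None  # caller should note the missing baseline in the report
```

If neither file exists, the report should state explicitly that no baseline comparison was possible rather than silently omitting the section.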

Step 3: Generate Visualizations

If plots do not already exist, generate them:

python3 -c "
import json
from pathlib import Path

# Check if visualization script exists
viz_script = Path('src/evaluation/visualize.py')
if viz_script.exists():
    print('Visualization script found')
else:
    print('No visualization script found -- will generate basic plots')
"

Key visualizations to include:

  • Training curves: loss and metric over epochs (train vs. validation)
  • Confusion matrix: if classification task
  • Metric comparison bar chart: current vs. baseline
  • Feature importance: if available from the model or analysis
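As a fallback when no project visualization script exists, a basic training-curve plot can be produced with matplotlib. The history-dict keys (`train_loss`, `val_loss`) are an assumption about how your training script logs losses; adjust them to your metrics schema.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so this also works in CI
import matplotlib.pyplot as plt

def plot_training_curves(history, out_path="reports/figures/training_loss.png"):
    """Plot per-epoch train/val loss from a dict like
    {"train_loss": [...], "val_loss": [...]} and return the output path."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    epochs = range(1, len(history["train_loss"]) + 1)
    plt.figure(figsize=(6, 4))
    plt.plot(epochs, history["train_loss"], label="train")
    if "val_loss" in history:
        plt.plot(epochs, history["val_loss"], label="validation")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.tight_layout()
    plt.savefig(out)
    plt.close()
    return out
```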

Step 4: Write the Report

Generate the report as a Markdown file at reports/experiment_report.md:

# Experiment Report: [Experiment Name]

**Date:** [current date]
**Branch:** [git branch]
**Commit:** [git commit hash]
**Author:** [generated by /generate-report skill]

---

## Executive Summary

[2-3 sentences: what was the experiment, what was the key result, and is it better than baseline?]

## Experiment Configuration

| Parameter | Value |
|-----------|-------|
| Model architecture | [from config] |
| Learning rate | [from config] |
| Batch size | [from config] |
| Epochs | [from config] |
| Optimizer | [from config] |
| Scheduler | [from config] |
| Random seed | [from config] |
| Dataset version | [from config or DVC] |

## Dataset Summary

| Split | Samples | Features | Classes |
|-------|---------|----------|---------|
| Train | [count] | [count] | [count or N/A] |
| Validation | [count] | [count] | [count or N/A] |
| Test | [count] | [count] | [count or N/A] |

## Results

### Final Metrics

| Metric | Value |
|--------|-------|
| [metric 1] | [value] |
| [metric 2] | [value] |
| ... | ... |

### Comparison with Baseline

| Metric | Baseline | Current | Delta | Improvement? |
|--------|----------|---------|-------|-------------|
| [metric 1] | [value] | [value] | [+/- value] | [Yes/No] |
| ... | ... | ... | ... | ... |

### Training Curves

![Training Loss](figures/training_loss.png)
![Validation Metric](figures/validation_metric.png)

### Confusion Matrix

![Confusion Matrix](figures/confusion_matrix.png)

## Analysis

### Key Findings
- [Finding 1: most important result]
- [Finding 2: notable pattern or observation]
- [Finding 3: any concerning behavior]

### Error Analysis
- [What types of errors does the model make?]
- [Are errors concentrated in specific classes or data subsets?]

### Comparison with Previous Experiments
- [How does this compare to previous runs?]
- [What changed and what impact did it have?]

## Recommendations

### Next Steps
1. [Actionable recommendation 1]
2. [Actionable recommendation 2]
3. [Actionable recommendation 3]

### Potential Improvements
- [Idea for model improvement]
- [Idea for data improvement]
- [Idea for training procedure improvement]

## Artifacts

| Artifact | Path |
|----------|------|
| Best checkpoint | checkpoints/best_model.pt |
| Metrics JSON | reports/metrics.json |
| Config file | configs/experiment.yaml |
| Training logs | experiments/[run-id]/ |
| Figures | reports/figures/ |

---

*Report generated automatically by the /generate-report skill.*
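The baseline-comparison rows in the template above can be filled programmatically. This sketch assumes flat metric dicts and a caller-supplied tuple of higher-is-better metric names; both are assumptions, not part of the skill.

```python
def comparison_rows(baseline, current, higher_is_better=("accuracy", "f1", "auc")):
    """Build markdown table rows comparing baseline vs. current metrics.

    Only metrics present in both dicts are compared; the improvement flag
    flips direction for loss-like metrics not listed in higher_is_better.
    """
    rows = []
    for name in sorted(set(baseline) & set(current)):
        delta = current[name] - baseline[name]
        improved = delta > 0 if name in higher_is_better else delta < 0
        rows.append(
            f"| {name} | {baseline[name]:.4f} | {current[name]:.4f} "
            f"| {delta:+.4f} | {'Yes' if improved else 'No'} |"
        )
    return rows
```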

Step 5: Verify Report Quality

After writing the report:

  1. Read it back and verify all placeholders are filled with actual data
  2. Verify all referenced figure paths exist
  3. Verify metrics values are reasonable (not NaN, not obviously wrong)
  4. Ensure the executive summary accurately reflects the detailed results
  5. Check that recommendations are specific and actionable, not generic
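The placeholder and figure checks can be partially automated. The placeholder patterns below match this template's bracket conventions (`[from config]`, `[value]`, etc.) and would need adjusting for a different template:

```python
import re
from pathlib import Path

def verify_report(report_path="reports/experiment_report.md"):
    """Return a list of quality problems found in the generated report."""
    report = Path(report_path)
    text = report.read_text()
    problems = []
    # Unfilled template placeholders look like [from config], [value], [count], ...
    for placeholder in re.findall(
        r"\[(?:from config|value|count|current date)[^\]]*\]", text
    ):
        problems.append(f"unfilled placeholder: {placeholder}")
    # Every referenced figure should exist relative to the report's directory.
    for fig in re.findall(r"!\[[^\]]*\]\(([^)]+)\)", text):
        if not (report.parent / fig).exists():
            problems.append(f"missing figure: {fig}")
    if "NaN" in text or "nan" in text.split():
        problems.append("possible NaN metric value")
    return problems
```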

Report the path to the generated report file when complete.

Source

View on GitHub: https://github.com/xvirobotics/metaskill/blob/main/examples/data-science/.claude/skills/generate-report/SKILL.md

Overview

This skill compiles a comprehensive summary of the latest experiment, collecting metrics, plots, and configuration details. It compares results against the baseline and outputs a Markdown report that can be shared with the team. Use it after training and evaluation to communicate results clearly.

How This Skill Works

It scans reports and experiments folders for the most recent run, extracts metrics, training logs, and config files, and loads the best checkpoint metadata. It then generates a Markdown report at reports/experiment_report.md, embedding key plots and a baseline comparison, ready for sharing.

When to Use It

  • After finishing a training and evaluation run to summarize results for the team.
  • When you need to compare current performance to the baseline.
  • Before sharing results with stakeholders or reviewers.
  • When multiple experiments exist and you want a consolidated report.
  • To produce a repeatable report format for CI or ML ops.

Quick Start

  1. Provide the experiment name as an argument, e.g. `generate-report transformer-v2-lr-sweep`.
  2. The skill collects metrics, config, and plots from reports/ and experiments/ and builds reports/experiment_report.md.
  3. Open reports/experiment_report.md and share it with your team.

Best Practices

  • Ensure the latest experiment logs exist in reports/ or experiments/.
  • Verify there's a corresponding baseline to compare against.
  • Standardize metric names across experiments for consistent comparisons.
  • Include at least one relevant plot (training curves, confusion matrix) in the report.
  • Review the generated Markdown for accuracy before sharing.

Example Use Cases

  • Transformer-v2-lr-sweep: generate a report to share with the team.
  • CNN-augmentation-warmup: produce a report after evaluation to compare to baseline.
  • bert-finetune-epoch10: create a summary report for leadership.
  • lstm-hyperparam-search: capture metrics, plots, and baseline comparison.
  • random-forest-baselineCheck: generate report highlighting baseline alignment.
