
exploring-data

npx machina-cli add skill oaustegard/claude-skills/exploring-data --openclaw

Exploring Data

Workflow

1. Check if installed (instant)

bash /mnt/skills/user/exploring-data/scripts/check_install.sh

Returns: installed or not_installed

2. Install if needed (one-time, ~19s)

if [ "$(bash /mnt/skills/user/exploring-data/scripts/check_install.sh)" = "not_installed" ]; then
    bash /mnt/skills/user/exploring-data/scripts/install_ydata.sh
fi

3. Run analysis (generates HTML + JSON by default)

bash /mnt/skills/user/exploring-data/scripts/analyze.sh <filepath> [minimal|full] [html|json]

Defaults: minimal + html (also generates JSON)

Output:

  • eda_report.html - Interactive report for user
  • eda_report.json - Machine-readable for Claude analysis

4. If Claude needs to analyze (user asks "what do you think?" etc.)

python /mnt/skills/user/exploring-data/scripts/summarize_insights.py /mnt/user-data/outputs/eda_report.json

Reads: eda_report.json (comprehensive ydata output)
Writes: eda_insights_summary.md (condensed for Claude)
Outputs to stdout: Formatted markdown summary

Claude should read the stdout markdown summary, NOT the full JSON report.

Invocation Examples

# Script names are shortened here; the scripts live in /mnt/skills/user/exploring-data/scripts/
# Standard workflow (user views HTML)
bash analyze.sh /mnt/user-data/uploads/data.csv
# Produces: eda_report.html + eda_report.json
# Link user to: computer:///mnt/user-data/outputs/eda_report.html

# User asks Claude to analyze
bash analyze.sh /mnt/user-data/uploads/data.csv
python summarize_insights.py /mnt/user-data/outputs/eda_report.json
# Claude reads the stdout markdown summary
# Claude can then provide analysis based on patterns/insights

# Full mode for comprehensive analysis
bash analyze.sh /mnt/user-data/uploads/data.csv full

# JSON-only output (skip HTML generation)
bash analyze.sh /mnt/user-data/uploads/data.csv minimal json

Modes

Minimal (default, 5-10s): Dataset overview, variable analysis, correlations, missing values, alerts

Full (10-20s): Everything in minimal + scatter matrices, sample data, character analysis, more visualizations

User Triggers for Full Mode

"comprehensive analysis", "detailed EDA", "full profiling", "deep analysis"

Otherwise use minimal.
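
The trigger rule above can be sketched as a small shell helper. This is only an illustration: choose_mode is a hypothetical function name, not part of the skill, and the case-insensitive substring matching is an assumption about how the trigger phrases should be interpreted.

```shell
# Hypothetical helper (not part of the skill): map a user request to a mode.
# Matches the four trigger phrases listed above, ignoring case.
choose_mode() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *"comprehensive analysis"*|*"detailed eda"*|*"full profiling"*|*"deep analysis"*)
            echo "full" ;;
        *)
            echo "minimal" ;;
    esac
}

choose_mode "Can you run a detailed EDA on this file?"   # prints: full
choose_mode "explore data"                                # prints: minimal
```

The result can be passed as the second argument to analyze.sh.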

Source

https://github.com/oaustegard/claude-skills/blob/main/exploring-data/SKILL.md

Overview

Performs exploratory data analysis (EDA) using ydata-profiling. Use it when a user uploads a file in a common data format (.csv, .xlsx, .json, .parquet) or asks to 'explore data', 'analyze dataset', run 'EDA', or 'profile data'. It produces an interactive HTML report and a machine-readable JSON report with statistics, visualizations, correlations, and data quality alerts.

How This Skill Works

The workflow has three steps: check whether ydata-profiling is installed (instant), install it with the provided script if needed, then run analyze.sh <filepath> [minimal|full] [html|json]. By default the analysis runs in minimal mode with HTML output and writes both eda_report.html and eda_report.json. When Claude needs a summary, run summarize_insights.py on eda_report.json; it writes eda_insights_summary.md and prints a formatted markdown summary to stdout.
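
As a rough sketch, the check, install, and analyze steps could be wrapped in one function. The script names and arguments are the ones described in this document; run_eda and the overridable SCRIPTS_DIR variable are hypothetical conveniences added here, and the input-file check is an extra guard rather than something the skill is known to do.

```shell
# Sketch only: run_eda and SCRIPTS_DIR are assumptions, not part of the skill.
SCRIPTS_DIR="${SCRIPTS_DIR:-/mnt/skills/user/exploring-data/scripts}"

run_eda() {
    filepath="$1"
    mode="${2:-minimal}"     # minimal | full
    format="${3:-html}"      # html | json

    # Fail early if the input file does not exist.
    if [ ! -f "$filepath" ]; then
        echo "error: input file not found: $filepath" >&2
        return 1
    fi

    # Install ydata-profiling only when the check reports it is missing.
    if [ "$(bash "$SCRIPTS_DIR/check_install.sh")" = "not_installed" ]; then
        bash "$SCRIPTS_DIR/install_ydata.sh" || return 1
    fi

    bash "$SCRIPTS_DIR/analyze.sh" "$filepath" "$mode" "$format"
}
```

For example, run_eda /mnt/user-data/uploads/data.csv full would run the full profile with the default HTML output.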

When to Use It

  • User uploads a dataset (.csv/.xlsx/.json/.parquet) and needs a quick profile (stats, missing values, correlations).
  • A user asks for 'explore data', 'analyze dataset', 'EDA', or 'profile data' to trigger a profiling session.
  • You need both an interactive HTML report and a machine-readable JSON report for downstream automation.
  • Data quality issues are suspected; you want automated alerts on missing values, duplicates, or outliers.
  • Claude analysis is required; summarize_insights.py can condense the JSON report into a Claude-friendly markdown summary.

Quick Start

  1. Check the installation and install if needed: bash /mnt/skills/user/exploring-data/scripts/check_install.sh; if it returns not_installed, run bash /mnt/skills/user/exploring-data/scripts/install_ydata.sh.
  2. Run the analysis on your file: bash /mnt/skills/user/exploring-data/scripts/analyze.sh <filepath> [minimal|full] [html|json].
  3. Open eda_report.html (interactive) or read eda_report.json (machine-readable). Optionally, run summarize_insights.py to produce a Claude-friendly summary.

Best Practices

  • Start with Minimal mode to get dataset overview fast.
  • Switch to Full mode when you need scatter matrices, sample data, and deeper visuals.
  • Review the generated alerts to identify data quality or integrity issues.
  • Ensure the file path is correct and accessible by the skill environment.
  • Use the JSON output for automation or for Claude-based analyses (summarize_insights).
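
The file-path advice above could be enforced with a small pre-flight check before calling analyze.sh. This is a minimal sketch: check_input is a hypothetical helper, and the extension list simply mirrors the formats named in this document (matched case-sensitively), not a list confirmed from the skill's scripts.

```shell
# Hypothetical pre-flight check (not part of the skill).
check_input() {
    path="$1"
    # The path must be an existing, readable regular file.
    if [ ! -f "$path" ] || [ ! -r "$path" ]; then
        echo "error: cannot read file: $path" >&2
        return 1
    fi
    # Accept only the formats this document lists as supported.
    case "$path" in
        *.csv|*.xlsx|*.json|*.parquet) return 0 ;;
        *) echo "error: unsupported file type: $path" >&2; return 1 ;;
    esac
}
```

A caller might use it as: check_input "$file" && bash analyze.sh "$file".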

Example Use Cases

  • Upload data.csv and run analyze.sh data.csv; review eda_report.html and eda_report.json for profiling.
  • Run with full mode on a parquet dataset to get visualizations and character analyses.
  • Ask Claude to summarize insights by running summarize_insights.py on eda_report.json.
  • Integrate into a data pipeline by consuming eda_report.json in downstream tooling.
  • Use alerts from the report to drive a data cleaning plan for missing values and outliers.
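
For the pipeline use case, one cautious approach is to gate downstream steps on the report existing and parsing as JSON. The sketch below assumes python3 is on the PATH and uses the standard-library json.tool module purely as a validator; report_is_valid and the REPORT variable are hypothetical names, and nothing here depends on the internal structure of the report.

```shell
# Sketch only: report_is_valid and REPORT are assumptions, not part of the skill.
REPORT="${REPORT:-/mnt/user-data/outputs/eda_report.json}"

report_is_valid() {
    # Non-empty file that python3's json.tool can parse without error.
    [ -s "$1" ] && python3 -m json.tool "$1" > /dev/null 2>&1
}

if report_is_valid "$REPORT"; then
    echo "report ok: $REPORT"
else
    echo "report missing or not valid JSON: $REPORT" >&2
fi
```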
