What file formats are supported?

CSV, XLSX, JSON, and Parquet formats are accepted for analysis.
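As a rough illustration of what format support might look like in practice, the sketch below maps each accepted extension to a pandas reader. The names `READERS`, `reader_for`, and `load` are hypothetical, and the assumption that the skill loads files through pandas is not confirmed by the source; analyze.sh may dispatch differently.

```python
from pathlib import Path

# Hypothetical mapping from supported extension to pandas reader name.
# This is an assumption for illustration, not the skill's actual loader.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".json": "read_json",
    ".parquet": "read_parquet",
}


def reader_for(filepath: str) -> str:
    """Return the pandas reader name for a supported file, else raise ValueError."""
    suffix = Path(filepath).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported format: {suffix or '(none)'}")
    return READERS[suffix]


def load(filepath: str):
    """Load the file into a DataFrame (Excel and Parquet need extra engines installed)."""
    import pandas as pd  # imported lazily so reader_for works without pandas

    return getattr(pd, reader_for(filepath))(filepath)
```

Rejecting unknown extensions up front gives a clearer error than letting a reader fail partway through parsing.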

What outputs are produced?

The skill produces an interactive HTML report (eda_report.html) and a machine-readable JSON report (eda_report.json). For Claude analysis, summarize_insights.py writes a condensed markdown summary (eda_insights_summary.md) and prints it to stdout.

How do I run minimal vs full mode?

Minimal mode (quick profiling) is the default. To switch to full mode, pass 'full' after the filepath (e.g., analyze.sh file.csv full). User phrases such as 'comprehensive analysis' also call for full mode.

exploring-data

npx machina-cli add skill oaustegard/claude-skills/exploring-data --openclaw

Exploring Data

Workflow

1. Check if installed (instant)

bash /mnt/skills/user/exploring-data/scripts/check_install.sh

Returns: installed or not_installed

2. Install if needed (one-time, ~19s)

if [ "$(bash /mnt/skills/user/exploring-data/scripts/check_install.sh)" = "not_installed" ]; then
    bash /mnt/skills/user/exploring-data/scripts/install_ydata.sh
fi
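When the workflow is driven from Python rather than a shell, the same check-then-install logic could be sketched as follows. The function name `ensure_installed` and the injectable `run` parameter are hypothetical; only the two script paths and the `installed` / `not_installed` return values come from the steps above.

```python
import subprocess

SCRIPTS = "/mnt/skills/user/exploring-data/scripts"


def ensure_installed(run=subprocess.run) -> bool:
    """Run check_install.sh and install only when it reports not_installed.

    Returns True if an install was triggered, False otherwise.
    `run` is injectable (a hypothetical convenience) so the logic can be tested.
    """
    check = run(
        ["bash", f"{SCRIPTS}/check_install.sh"],
        capture_output=True,
        text=True,
        check=True,
    )
    if check.stdout.strip() == "not_installed":
        run(["bash", f"{SCRIPTS}/install_ydata.sh"], check=True)
        return True
    return False
```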

3. Run analysis (generates JSON + HTML by default)

bash /mnt/skills/user/exploring-data/scripts/analyze.sh <filepath> [minimal|full] [html|json]

Defaults: minimal + html (also generates JSON)

Output:

eda_report.html - Interactive report for user
eda_report.json - Machine-readable for Claude analysis
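For callers that assemble the analyze.sh invocation programmatically, a small helper along these lines could validate the two optional arguments before anything runs. `build_analyze_command` is a hypothetical name, and the up-front validation is an assumption added for illustration; the script itself may handle bad values differently.

```python
ANALYZE_SCRIPT = "/mnt/skills/user/exploring-data/scripts/analyze.sh"


def build_analyze_command(filepath: str, mode: str = "minimal", output: str = "html") -> list:
    """Build the argv list for analyze.sh <filepath> [minimal|full] [html|json]."""
    if mode not in ("minimal", "full"):
        raise ValueError(f"mode must be 'minimal' or 'full', got {mode!r}")
    if output not in ("html", "json"):
        raise ValueError(f"output must be 'html' or 'json', got {output!r}")
    return ["bash", ANALYZE_SCRIPT, filepath, mode, output]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Passing an argv list rather than a shell string avoids quoting problems with file paths that contain spaces.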

4. If Claude needs to analyze (user asks "what do you think?" etc.)

python /mnt/skills/user/exploring-data/scripts/summarize_insights.py /mnt/user-data/outputs/eda_report.json

Reads: eda_report.json (comprehensive ydata output)
Writes: eda_insights_summary.md (condensed for Claude)
Outputs to stdout: Formatted markdown summary

Claude should read the stdout markdown summary, NOT the full JSON report.
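A minimal sketch of capturing that stdout summary from Python is shown here. The function name `summary_markdown` and its optional `command` override are hypothetical, and `sys.executable` is used in place of the bare `python` shown above so the current interpreter runs the script.

```python
import subprocess
import sys

SUMMARIZER = "/mnt/skills/user/exploring-data/scripts/summarize_insights.py"
REPORT = "/mnt/user-data/outputs/eda_report.json"


def summary_markdown(command=None) -> str:
    """Run the summarizer and return whatever it prints to stdout.

    `command` defaults to the invocation from step 4; it can be overridden
    (a hypothetical convenience) to point at a different script or report.
    """
    if command is None:
        command = [sys.executable, SUMMARIZER, REPORT]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Reading only the captured stdout keeps the large JSON report out of the conversation context, which is the point of the summarization step.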

Invocation Examples

# Standard workflow (user views HTML)
bash analyze.sh /mnt/user-data/uploads/data.csv
# Produces: eda_report.html + eda_report.json
# Link user to: computer:///mnt/user-data/outputs/eda_report.html

# User asks Claude to analyze
bash analyze.sh /mnt/user-data/uploads/data.csv
python summarize_insights.py /mnt/user-data/outputs/eda_report.json
# Claude reads the stdout markdown summary
# Claude can then provide analysis based on patterns/insights

# Full mode for comprehensive analysis
bash analyze.sh /mnt/user-data/uploads/data.csv full

# JSON-only output (skip HTML generation)
bash analyze.sh /mnt/user-data/uploads/data.csv minimal json

Modes

Minimal (default, 5-10s): Dataset overview, variable analysis, correlations, missing values, alerts

Full (10-20s): Everything in minimal + scatter matrices, sample data, character analysis, more visualizations

User Triggers for Full Mode

"comprehensive analysis", "detailed EDA", "full profiling", "deep analysis"

Otherwise use minimal.
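The trigger rule can be sketched as a simple phrase check. The function name `choose_mode` and the case-insensitive substring matching are assumptions for illustration; the source lists the phrases but does not say how they are matched.

```python
# Trigger phrases listed above, lowercased for case-insensitive matching.
FULL_MODE_TRIGGERS = (
    "comprehensive analysis",
    "detailed eda",
    "full profiling",
    "deep analysis",
)


def choose_mode(user_message: str) -> str:
    """Return 'full' if the message contains a trigger phrase, else 'minimal'."""
    text = user_message.lower()
    if any(trigger in text for trigger in FULL_MODE_TRIGGERS):
        return "full"
    return "minimal"
```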

Source

git clone https://github.com/oaustegard/claude-skills

SKILL.md: https://github.com/oaustegard/claude-skills/blob/main/exploring-data/SKILL.md

Overview

Performs exploratory data analysis using ydata-profiling. The skill applies when a user uploads a file in a common data format (.csv, .xlsx, .json, .parquet) or makes a request such as 'explore data', 'analyze dataset', 'EDA', or 'profile data'. It produces an interactive HTML report and a machine-readable JSON report with statistics, visualizations, correlations, and data quality alerts.

How This Skill Works

Workflow: check installation (instant), install if needed with the provided script, then run analysis with analyze.sh <filepath> [minimal|full] [html|json]. By default it runs minimal + html and outputs eda_report.html and eda_report.json. When Claude needs a summary, run summarize_insights.py on eda_report.json; it writes eda_insights_summary.md and prints the same summary to stdout.

When to Use It

User uploads a dataset (.csv/.xlsx/.json/.parquet) and needs a quick profile (stats, missing values, correlations).
A user asks for 'explore data', 'analyze dataset', 'EDA', or 'profile data' to trigger a profiling session.
You need both an interactive HTML report and a machine-readable JSON report for downstream automation.
Data quality issues are suspected; you want automated alerts on missing values, duplicates, or outliers.
Claude analysis is required; summarize_insights.py can condense the JSON report into a Claude-friendly markdown summary.

Quick Start

Step 1: Ensure the skill is installed and install if needed: bash /mnt/skills/user/exploring-data/scripts/check_install.sh; if not installed, bash /mnt/skills/user/exploring-data/scripts/install_ydata.sh.
Step 2: Run analysis with your file: bash /mnt/skills/user/exploring-data/scripts/analyze.sh <filepath> [minimal|full] [html|json].
Step 3: Open eda_report.html (interactive) or read eda_report.json (machine-readable). Optional: run summarize_insights.py to produce a Claude-friendly summary.

Best Practices

Start with Minimal mode to get a dataset overview quickly.
Switch to Full mode when you need scatter matrices, sample data, and deeper visuals.
Review the generated alerts to identify data quality or integrity issues.
Ensure the file path is correct and accessible by the skill environment.
Use the JSON output for automation or for Claude-based analyses (via summarize_insights.py).

Example Use Cases

Upload data.csv and run analyze.sh data.csv; review eda_report.html and eda_report.json for profiling.
Run with full mode on a parquet dataset to get visualizations and character analyses.
Ask Claude to summarize insights by running summarize_insights.py on eda_report.json.
Integrate into a data pipeline by consuming eda_report.json in downstream tooling.
Use alerts from the report to drive a data cleaning plan for missing values and outliers.
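For the pipeline and data-cleaning use cases, downstream tooling might pull a few headline fields out of eda_report.json along these lines. The key names (`table`, `n`, `n_var`, `n_cells_missing`, `alerts`) are assumptions about the ydata-profiling JSON layout and should be checked against an actual report; `quality_snapshot` and `load_report` are hypothetical helpers.

```python
import json


def load_report(path: str = "/mnt/user-data/outputs/eda_report.json") -> dict:
    """Read the JSON report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def quality_snapshot(report: dict) -> dict:
    """Extract row/column counts, missing-cell count, and alerts.

    Uses .get() throughout because the key names are assumed, not confirmed;
    absent keys come back as None or an empty list rather than raising.
    """
    table = report.get("table", {})
    return {
        "rows": table.get("n"),
        "columns": table.get("n_var"),
        "missing_cells": table.get("n_cells_missing"),
        "alerts": list(report.get("alerts", [])),
    }
```

A cleaning step could then branch on `snapshot["alerts"]` or `snapshot["missing_cells"]` without loading the full report into a notebook.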

Frequently Asked Questions