Frequently Asked Questions

What methods does the skill support?

Pearson, Spearman, and Kendall are supported; select via the method parameter.

How can I get p-values for correlations?

Call correlation_with_pvalues or pass --pvalues in the CLI to include p-values.

How do I export results or visuals?

Export results with to_csv or to_json, and save plots with plot_heatmap.

correlation-explorer

npx machina-cli add skill dkyazzentwatwa/chatgpt-skills/correlation-explorer --openclaw

Correlation Explorer

Analyze correlations between variables in CSV/Excel datasets.

Features

Correlation Matrix: Compute all pairwise correlations
Heatmap Visualization: Color-coded correlation display
Significance Testing: P-values for correlations
Multiple Methods: Pearson, Spearman, Kendall
Strong Correlations: Find highly correlated pairs
Target Analysis: Correlations with specific variable

Quick Start

from correlation_explorer import CorrelationExplorer

explorer = CorrelationExplorer()

# Load and analyze
explorer.load_csv("sales_data.csv")
matrix = explorer.correlation_matrix()

# Find strong correlations
strong = explorer.find_strong_correlations(threshold=0.7)
print(strong)

# Generate heatmap
explorer.plot_heatmap("correlation_heatmap.png")

CLI Usage

# Compute correlation matrix
python correlation_explorer.py --input data.csv --output correlations.csv

# Generate heatmap
python correlation_explorer.py --input data.csv --heatmap heatmap.png

# Find strong correlations
python correlation_explorer.py --input data.csv --strong --threshold 0.7

# Correlations with target variable
python correlation_explorer.py --input data.csv --target sales

# Use Spearman correlation
python correlation_explorer.py --input data.csv --method spearman

# Include p-values
python correlation_explorer.py --input data.csv --pvalues

API Reference

CorrelationExplorer Class

class CorrelationExplorer:
    def __init__(self)

    # Data loading
    def load_csv(self, filepath: str, **kwargs) -> 'CorrelationExplorer'
    def load_dataframe(self, df: pd.DataFrame) -> 'CorrelationExplorer'

    # Analysis
    def correlation_matrix(self, method: str = "pearson") -> pd.DataFrame
    def correlation_with_pvalues(self, method: str = "pearson") -> tuple
    def correlate_with_target(self, target: str, method: str = "pearson") -> pd.Series

    # Discovery
    def find_strong_correlations(self, threshold: float = 0.7) -> list
    def find_weak_correlations(self, threshold: float = 0.3) -> list

    # Visualization
    def plot_heatmap(self, output: str, **kwargs) -> str
    def plot_scatter(self, var1: str, var2: str, output: str) -> str

    # Export
    def to_csv(self, output: str) -> str
    def to_json(self, output: str) -> str

Correlation Methods

Method	Best For
`pearson`	Linear relationships, normal data
`spearman`	Monotonic relationships (including non-linear ones), ordinal data
`kendall`	Small samples, ordinal data

# Pearson (default) - parametric
matrix = explorer.correlation_matrix(method="pearson")

# Spearman - rank-based, non-parametric
matrix = explorer.correlation_matrix(method="spearman")

# Kendall - robust to outliers
matrix = explorer.correlation_matrix(method="kendall")
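
To see how the three methods can differ, the sketch below uses pandas' own `DataFrame.corr` rather than the skill's `CorrelationExplorer`. The data are made up: a monotonic but non-linear relationship, y = x³. The column names `x` and `y` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y grows monotonically with x, but not linearly.
x = np.arange(1, 21, dtype=float)
df = pd.DataFrame({"x": x, "y": x ** 3})

# pandas accepts the same three method names directly.
results = {
    method: df.corr(method=method).loc["x", "y"]
    for method in ("pearson", "spearman", "kendall")
}

for method, r in results.items():
    print(f"{method:>8}: {r:.3f}")
```

Because the ordering of `y` matches the ordering of `x` exactly, the two rank-based coefficients equal 1, while Pearson comes out below 1 since the relationship is not a straight line.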

Output Format

Correlation Matrix

           sales  marketing  customers
sales      1.000      0.854      0.723
marketing  0.854      1.000      0.612
customers  0.723      0.612      1.000

Strong Correlations

[
    {"var1": "sales", "var2": "marketing", "correlation": 0.854, "abs_corr": 0.854},
    {"var1": "sales", "var2": "customers", "correlation": 0.723, "abs_corr": 0.723}
]
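
A list in this shape can be derived from a correlation matrix by scanning its upper triangle. The sketch below is one possible way to do that with plain pandas. `strong_pairs` is a hypothetical helper, not the skill's `find_strong_correlations`, and whether the skill treats the threshold as inclusive is an assumption here.

```python
import pandas as pd

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> list:
    """Hypothetical helper: list variable pairs with |r| >= threshold."""
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair appears once and the
    # diagonal (always 1.0) is skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append({"var1": cols[i], "var2": cols[j],
                              "correlation": r, "abs_corr": abs(r)})
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: p["abs_corr"], reverse=True)

# The three-variable example matrix from the Correlation Matrix output.
names = ["sales", "marketing", "customers"]
corr = pd.DataFrame(
    [[1.000, 0.854, 0.723],
     [0.854, 1.000, 0.612],
     [0.723, 0.612, 1.000]],
    index=names, columns=names,
)
pairs = strong_pairs(corr, threshold=0.7)
```

With a threshold of 0.7, the marketing/customers pair (0.612) is filtered out and the two sales pairs remain.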

With P-Values

{
    "correlations": DataFrame,
    "pvalues": DataFrame,
    "significant": [...],  # p < 0.05
}
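
For readers who want to see where such p-values can come from, here is a minimal sketch that pairs each column with every other and calls `scipy.stats.pearsonr`. It is an assumption about one reasonable approach, not the skill's implementation. `corr_with_pvalues` and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

def corr_with_pvalues(df: pd.DataFrame):
    """Hypothetical helper: pairwise Pearson r and two-sided p-values."""
    cols = df.columns
    n = len(cols)
    corr = pd.DataFrame(np.eye(n), index=cols, columns=cols)
    # Diagonal p-values are left at 0.0 (a variable against itself).
    pvals = pd.DataFrame(np.zeros((n, n)), index=cols, columns=cols)
    for i in range(n):
        for j in range(i + 1, n):
            # Drop rows where either value is missing before testing.
            pair = df[[cols[i], cols[j]]].dropna()
            r, p = stats.pearsonr(pair.iloc[:, 0], pair.iloc[:, 1])
            corr.iloc[i, j] = corr.iloc[j, i] = r
            pvals.iloc[i, j] = pvals.iloc[j, i] = p
    return corr, pvals

# Made-up data: y depends on x, noise does not.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "noise": rng.normal(size=200),
})
corr, pvals = corr_with_pvalues(df)

# Pairs whose p-value falls below the conventional 0.05 cutoff.
significant = [
    (a, b)
    for i, a in enumerate(df.columns)
    for b in df.columns[i + 1:]
    if pvals.loc[a, b] < 0.05
]
```

Note that testing many pairs at p < 0.05 will flag some by chance alone, so a multiple-comparison correction may be worth considering on wide datasets.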

Example Workflows

Feature Selection

explorer = CorrelationExplorer()
explorer.load_csv("features.csv")

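# Find features correlated with target (excluding the target itself,
# which would otherwise pass the filter with a correlation of 1.0)
target_corr = explorer.correlate_with_target("target").drop("target", errors="ignore")
important_features = target_corr[target_corr.abs() > 0.3].index.tolist()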
print(f"Important features: {important_features}")

# Find multicollinear features (to potentially drop)
strong = explorer.find_strong_correlations(threshold=0.9)
print("Highly correlated pairs (consider dropping one):")
for pair in strong:
    print(f"  {pair['var1']} <-> {pair['var2']}: {pair['correlation']:.3f}")

Sales Analysis

explorer = CorrelationExplorer()
explorer.load_csv("sales_data.csv")

# What drives sales?
sales_corr = explorer.correlate_with_target("revenue")
print("Factors correlated with revenue:")
for var, corr in sales_corr.sort_values(ascending=False).items():
    if var != "revenue":
        print(f"  {var}: {corr:.3f}")

# Visualize
explorer.plot_heatmap("sales_correlations.png")

Data Exploration

explorer = CorrelationExplorer()
explorer.load_csv("dataset.csv")

# Get full picture
corr, pvals = explorer.correlation_with_pvalues()

# Find all significant correlations
significant = []
for i in range(len(corr.columns)):
    for j in range(i+1, len(corr.columns)):
        if pvals.iloc[i, j] < 0.05:
            significant.append({
                'var1': corr.columns[i],
                'var2': corr.columns[j],
                'r': corr.iloc[i, j],
                'p': pvals.iloc[i, j]
            })

Heatmap Options

explorer.plot_heatmap(
    output="heatmap.png",
    cmap="coolwarm",      # Color scheme
    annot=True,           # Show values
    figsize=(12, 10),     # Figure size
    vmin=-1, vmax=1,      # Color scale
    title="Correlation Matrix"
)
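
The options above resemble common heatmap settings (colormap, annotations, figure size, colour limits). As a rough illustration of what they control, the sketch below draws a comparable figure with plain matplotlib. `save_heatmap` is a hypothetical helper and is not how the skill itself renders its plots.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

def save_heatmap(corr, output, cmap="coolwarm", annot=True,
                 figsize=(6, 5), vmin=-1, vmax=1, title=None):
    """Hypothetical helper: draw a correlation matrix and save it to a file."""
    fig, ax = plt.subplots(figsize=figsize)
    # Fixing vmin/vmax at -1 and 1 keeps colours comparable across datasets.
    image = ax.imshow(corr.values, cmap=cmap, vmin=vmin, vmax=vmax)
    fig.colorbar(image, ax=ax)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    if annot:
        # Write each coefficient into its cell.
        for i in range(corr.shape[0]):
            for j in range(corr.shape[1]):
                ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
    if title:
        ax.set_title(title)
    fig.tight_layout()
    fig.savefig(output)
    plt.close(fig)
    return output

names = ["sales", "marketing", "customers"]
corr = pd.DataFrame(
    [[1.000, 0.854, 0.723],
     [0.854, 1.000, 0.612],
     [0.723, 0.612, 1.000]],
    index=names, columns=names,
)
out_path = save_heatmap(corr, os.path.join(tempfile.mkdtemp(), "heatmap.png"),
                        title="Correlation Matrix")
```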

Dependencies

pandas>=2.0.0
numpy>=1.24.0
scipy>=1.10.0
matplotlib>=3.7.0
seaborn>=0.12.0

Source

git clone https://github.com/dkyazzentwatwa/chatgpt-skills

SKILL.md: https://github.com/dkyazzentwatwa/chatgpt-skills/blob/main/correlation-explorer/SKILL.md

Overview

Correlation Explorer analyzes relationships between variables in CSV and Excel datasets. It provides a correlation matrix, a heatmap visualization, and optional p-values, with support for Pearson, Spearman, and Kendall methods to suit different data types.

How This Skill Works

It loads a dataset (CSV or DataFrame), computes a pairwise correlation matrix using a chosen method (pearson, spearman, kendall), and can compute p-values for significance. It can identify strong correlations, analyze correlations with a target variable, and render visuals like heatmaps or scatter plots.
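
The target-analysis step described above can be approximated with pandas alone. The following sketch assumes numeric columns are correlated against one named column and ranked by absolute strength. The function is a hypothetical stand-in, not the skill's own `correlate_with_target`, and the sample data are invented.

```python
import pandas as pd

def correlate_with_target_sketch(df, target, method="pearson"):
    """Hypothetical stand-in: correlate every numeric column with `target`."""
    numeric = df.select_dtypes(include="number")
    # Exclude the target itself so it does not appear with r = 1.0.
    corr = numeric.drop(columns=[target]).corrwith(numeric[target], method=method)
    # Rank by strength, regardless of sign.
    return corr.sort_values(key=abs, ascending=False)

df = pd.DataFrame({
    "revenue":  [10, 20, 30, 40, 50],
    "ad_spend": [1, 2, 3, 4, 5],            # moves in lockstep with revenue
    "returns":  [5, 3, 4, 1, 2],            # tends to fall as revenue rises
    "region":   ["n", "s", "e", "w", "n"],  # non-numeric, ignored
})
result = correlate_with_target_sketch(df, "revenue")
print(result)
```

Sorting by absolute value keeps strong negative relationships near the top, where a plain descending sort would push them to the bottom.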

When to Use It

During initial data exploration to uncover relationships between features.
When performing feature selection by evaluating correlations with the target variable.
To detect multicollinearity among predictors and decide which features to drop.
When comparing linear (Pearson) against monotonic, rank-based (Spearman, Kendall) measures of association.
When preparing visuals for reports or dashboards to communicate relationships.
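
For the multicollinearity case in the list above, a common pandas idiom is to scan the upper triangle of the absolute correlation matrix and drop the later column of each highly correlated pair. The sketch below is an illustration under that assumption. `drop_collinear`, the 0.9 cutoff, and the sample columns are hypothetical, and which column of a pair to keep is ultimately a modelling decision.

```python
import numpy as np
import pandas as pd

def drop_collinear(df, threshold=0.9):
    """Hypothetical helper: drop the later column of each pair with |r| > threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Made-up data: a_copy is a rescaled near-duplicate of a.
rng = np.random.default_rng(1)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "a_copy": a * 2 + 0.001 * rng.normal(size=100),
    "b": rng.normal(size=100),
})
reduced, dropped = drop_collinear(df)
print("Dropped:", dropped)
```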

Quick Start

Step 1: Initialize and load data: explorer = CorrelationExplorer(); explorer.load_csv("data.csv")
Step 2: Compute the correlation matrix: matrix = explorer.correlation_matrix()
Step 3: Generate a heatmap: explorer.plot_heatmap("heatmap.png")

Best Practices

Start with Pearson for roughly linear relationships in normally distributed data, then cross-check with Spearman or Kendall when the relationship may be monotonic but non-linear, or when the data are ordinal.
Use the heatmap to quickly spot clusters of highly correlated features.
Examine p-values (if enabled) to distinguish statistically significant correlations from noise.
Leverage correlate_with_target to prioritize features most related to the target.
Save results with to_csv/to_json and generate heatmaps for sharing with stakeholders.

Example Use Cases

Marketing analytics: identify features strongly correlated with sales to drive campaigns.
Product analytics: correlate user engagement metrics with retention.
Quality control: discover which sensor readings align with failure events.
Feature engineering: drop or combine highly correlated predictors to reduce redundancy.
Dashboard reporting: produce heatmaps that illustrate feature relationships for stakeholders.
