Install: npx machina-cli add skill dkyazzentwatwa/chatgpt-skills/correlation-explorer --openclaw
Correlation Explorer
Analyze correlations between variables in CSV/Excel datasets.
Features
- Correlation Matrix: Compute all pairwise correlations
- Heatmap Visualization: Color-coded correlation display
- Significance Testing: P-values for correlations
- Multiple Methods: Pearson, Spearman, Kendall
- Strong Correlations: Find highly correlated pairs
- Target Analysis: Correlations with a specific target variable
Quick Start
from correlation_explorer import CorrelationExplorer
explorer = CorrelationExplorer()
# Load and analyze
explorer.load_csv("sales_data.csv")
matrix = explorer.correlation_matrix()
# Find strong correlations
strong = explorer.find_strong_correlations(threshold=0.7)
print(strong)
# Generate heatmap
explorer.plot_heatmap("correlation_heatmap.png")
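For orientation, the matrix step can be reproduced with plain pandas. This is a minimal sketch that assumes `correlation_matrix()` behaves like `pandas.DataFrame.corr` on the numeric columns; the inline CSV is made-up illustrative data standing in for a file such as `sales_data.csv`.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for a real file such as "sales_data.csv".
CSV_TEXT = """sales,marketing,customers,region
100,10,50,north
120,12,55,south
130,15,53,north
160,18,70,east
170,20,72,south
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Text columns such as "region" have no correlation, so only numeric ones are used.
# pandas excludes missing values pair by pair when computing each coefficient.
matrix = df.corr(method="pearson", numeric_only=True)
print(matrix.round(3))
```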
CLI Usage
# Compute correlation matrix
python correlation_explorer.py --input data.csv --output correlations.csv
# Generate heatmap
python correlation_explorer.py --input data.csv --heatmap heatmap.png
# Find strong correlations
python correlation_explorer.py --input data.csv --strong --threshold 0.7
# Correlations with target variable
python correlation_explorer.py --input data.csv --target sales
# Use Spearman correlation
python correlation_explorer.py --input data.csv --method spearman
# Include p-values
python correlation_explorer.py --input data.csv --pvalues
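The flags above map naturally onto `argparse`. The parser below is a hypothetical sketch that mirrors only the documented flags; the defaults (`--threshold 0.7`, `--method pearson`) are assumptions borrowed from the API reference, and the real script's parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; not the script's actual code.
    parser = argparse.ArgumentParser(description="Explore correlations in a CSV file.")
    parser.add_argument("--input", required=True, help="Path to the input CSV file")
    parser.add_argument("--output", help="Write the correlation matrix to this CSV path")
    parser.add_argument("--heatmap", help="Save a heatmap image to this path")
    parser.add_argument("--strong", action="store_true", help="List strongly correlated pairs")
    parser.add_argument("--threshold", type=float, default=0.7, help="Cutoff used with --strong")
    parser.add_argument("--target", help="Report correlations with this column only")
    parser.add_argument("--method", choices=["pearson", "spearman", "kendall"], default="pearson")
    parser.add_argument("--pvalues", action="store_true", help="Also compute p-values")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["--input", "data.csv", "--strong", "--threshold", "0.7"])
print(args)
```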
API Reference
CorrelationExplorer Class
import pandas as pd


class CorrelationExplorer:
    def __init__(self) -> None: ...

    # Data loading
    def load_csv(self, filepath: str, **kwargs) -> 'CorrelationExplorer': ...
    def load_dataframe(self, df: pd.DataFrame) -> 'CorrelationExplorer': ...

    # Analysis
    def correlation_matrix(self, method: str = "pearson") -> pd.DataFrame: ...
    def correlation_with_pvalues(self, method: str = "pearson") -> tuple: ...
    def correlate_with_target(self, target: str, method: str = "pearson") -> pd.Series: ...

    # Discovery
    def find_strong_correlations(self, threshold: float = 0.7) -> list: ...
    def find_weak_correlations(self, threshold: float = 0.3) -> list: ...

    # Visualization
    def plot_heatmap(self, output: str, **kwargs) -> str: ...
    def plot_scatter(self, var1: str, var2: str, output: str) -> str: ...

    # Export
    def to_csv(self, output: str) -> str: ...
    def to_json(self, output: str) -> str: ...
Correlation Methods
| Method | Best For |
|---|---|
| pearson | Linear relationships, normal data |
| spearman | Monotonic (including non-linear) relationships, ordinal data |
| kendall | Small samples, ordinal data |
# Pearson (default) - parametric
matrix = explorer.correlation_matrix(method="pearson")
# Spearman - rank-based, non-parametric
matrix = explorer.correlation_matrix(method="spearman")
# Kendall - robust to outliers
matrix = explorer.correlation_matrix(method="kendall")
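The practical difference between the methods shows up on a relationship that is monotonic but not linear. In this sketch, using pandas directly rather than the explorer, `y = x**3` rises with `x` at every step, so both rank-based methods report a perfect 1.0 while Pearson stays below 1 because the points do not lie on a straight line.

```python
import pandas as pd

# y = x**3 always increases with x (monotonic) but is not a straight line.
x = pd.Series(range(1, 11), dtype=float)
df = pd.DataFrame({"x": x, "y": x**3})

for method in ("pearson", "spearman", "kendall"):
    # Kendall in pandas relies on SciPy, which is already a listed dependency.
    r = df.corr(method=method).loc["x", "y"]
    print(f"{method:>8}: {r:.3f}")
```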
Output Format
Correlation Matrix
sales marketing customers
sales 1.000 0.854 0.723
marketing 0.854 1.000 0.612
customers 0.723 0.612 1.000
Strong Correlations
[
{"var1": "sales", "var2": "marketing", "correlation": 0.854, "abs_corr": 0.854},
{"var1": "sales", "var2": "customers", "correlation": 0.723, "abs_corr": 0.723}
]
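A list in the format above can be built from any correlation matrix by walking its upper triangle, so each pair appears once and self-correlations on the diagonal are skipped. This is a minimal sketch: the standalone `find_strong_correlations` helper takes a precomputed matrix (unlike the documented method), and the inclusive `>=` cutoff and strongest-first ordering are assumptions.

```python
import numpy as np
import pandas as pd


def find_strong_correlations(matrix: pd.DataFrame, threshold: float = 0.7) -> list:
    """Hypothetical helper: pairs with |r| >= threshold, strongest first."""
    cols = matrix.columns
    values = matrix.to_numpy()
    # Upper triangle above the diagonal: every pair once, no self-pairs.
    rows, others = np.triu_indices(len(cols), k=1)
    pairs = []
    for i, j in zip(rows, others):
        r = float(values[i, j])
        if abs(r) >= threshold:  # NaN coefficients never pass this test
            pairs.append({"var1": cols[i], "var2": cols[j], "correlation": r, "abs_corr": abs(r)})
    return sorted(pairs, key=lambda p: p["abs_corr"], reverse=True)


# The example matrix from "Correlation Matrix" above.
names = ["sales", "marketing", "customers"]
matrix = pd.DataFrame(
    [[1.000, 0.854, 0.723], [0.854, 1.000, 0.612], [0.723, 0.612, 1.000]],
    index=names,
    columns=names,
)
strong = find_strong_correlations(matrix, threshold=0.7)
print(strong)
```

Sorting on `abs_corr` keeps strong negative correlations next to strong positive ones instead of pushing them to the end.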
With P-Values
{
"correlations": DataFrame,
"pvalues": DataFrame,
"significant": [...], # p < 0.05
}
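The p-values can be computed pair by pair with SciPy's test functions, each of which returns a coefficient and a two-sided p-value. This sketch assumes that approach and follows the dict layout shown above; the function name, the `alpha` parameter, and setting diagonal p-values to 0 are all assumptions, not the skill's confirmed implementation.

```python
import numpy as np
import pandas as pd
from scipy import stats

# SciPy functions returning (statistic, p-value) for a single pair of columns.
TESTS = {"pearson": stats.pearsonr, "spearman": stats.spearmanr, "kendall": stats.kendalltau}


def correlation_with_pvalues(df: pd.DataFrame, method: str = "pearson", alpha: float = 0.05) -> dict:
    """Hypothetical sketch: coefficients, p-values, and pairs with p < alpha."""
    data = df.select_dtypes(include="number")
    cols = list(data.columns)
    n = len(cols)
    corr = pd.DataFrame(np.eye(n), index=cols, columns=cols)
    # Diagonal p-values are set to 0 by convention; self-correlation is not tested.
    pvals = pd.DataFrame(np.zeros((n, n)), index=cols, columns=cols)
    significant = []
    test = TESTS[method]
    for i in range(n):
        for j in range(i + 1, n):
            # Use only the rows where both columns have values.
            pair = data[[cols[i], cols[j]]].dropna()
            r, p = test(pair[cols[i]], pair[cols[j]])
            corr.iloc[i, j] = corr.iloc[j, i] = r
            pvals.iloc[i, j] = pvals.iloc[j, i] = p
            if p < alpha:
                significant.append({"var1": cols[i], "var2": cols[j], "r": float(r), "p": float(p)})
    return {"correlations": corr, "pvalues": pvals, "significant": significant}


demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [2, 4, 6, 8, 10, 12, 14, 16, 18, 21],
    "label": list("abcdefghij"),  # non-numeric, so it is ignored
})
result = correlation_with_pvalues(demo)
print(result["significant"])
```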
Example Workflows
Feature Selection
explorer = CorrelationExplorer()
explorer.load_csv("features.csv")
# Find features correlated with target
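target_corr = explorer.correlate_with_target("target")
# Exclude the target's correlation with itself (always 1.0), then keep |r| > 0.3
target_corr = target_corr.drop("target", errors="ignore")
important_features = target_corr[target_corr.abs() > 0.3].index.tolist()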
print(f"Important features: {important_features}")
# Find multicollinear features (to potentially drop)
strong = explorer.find_strong_correlations(threshold=0.9)
print("Highly correlated pairs (consider dropping one):")
for pair in strong:
    print(f"  {pair['var1']} <-> {pair['var2']}: {pair['correlation']:.3f}")
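To act on the "consider dropping one" hint, one common heuristic keeps whichever member of each pair is more strongly related to the target. The helper below is a hypothetical sketch of that heuristic, not part of the skill's API: it takes pairs in the dict format from `find_strong_correlations` and a Series like the one `correlate_with_target` returns. The column names and numbers in the usage lines are made up for illustration.

```python
import pandas as pd


def pick_columns_to_drop(strong_pairs: list, target_corr: pd.Series) -> list:
    """Hypothetical heuristic: from each correlated pair, drop the weaker predictor."""
    to_drop = set()
    for pair in strong_pairs:
        a, b = pair["var1"], pair["var2"]
        if a in to_drop or b in to_drop:
            continue  # this pair is already resolved by an earlier drop
        # Drop whichever variable has the weaker absolute link to the target (b on ties).
        weaker = a if abs(target_corr.get(a, 0.0)) < abs(target_corr.get(b, 0.0)) else b
        to_drop.add(weaker)
    return sorted(to_drop)


# Made-up example: the same measurement recorded in two units.
pairs = [{"var1": "height_cm", "var2": "height_in", "correlation": 0.999, "abs_corr": 0.999}]
target_corr = pd.Series({"height_cm": 0.41, "height_in": 0.40, "age": 0.20})
to_drop = pick_columns_to_drop(pairs, target_corr)
print(to_drop)
```

Processing pairs strongest first (the order `find_strong_correlations` is assumed to return) means the most redundant pairs are resolved before weaker ones.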
Sales Analysis
explorer = CorrelationExplorer()
explorer.load_csv("sales_data.csv")
# What drives sales?
sales_corr = explorer.correlate_with_target("revenue")
print("Factors correlated with revenue:")
for var, corr in sales_corr.sort_values(ascending=False).items():
    if var != "revenue":
        print(f"  {var}: {corr:.3f}")
# Visualize
explorer.plot_heatmap("sales_correlations.png")
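A target correlation series like the one above can be produced with `pandas.DataFrame.corrwith`, which correlates every column with a given Series. This is a sketch under two assumptions that differ from the example above: the target column is excluded up front, and results are ordered by absolute strength rather than signed value, so a strong negative driver is listed near the top instead of at the bottom. The column names and values are made up.

```python
import pandas as pd


def correlate_with_target(df: pd.DataFrame, target: str, method: str = "pearson") -> pd.Series:
    """Hypothetical sketch: correlation of each other numeric column with `target`."""
    numeric = df.select_dtypes(include="number")
    others = numeric.drop(columns=[target])
    corr = others.corrwith(numeric[target], method=method)
    # Order by |r| so strong negative relationships are not buried at the end.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


sales = pd.DataFrame({
    "revenue": [10, 20, 30, 40, 50],
    "ads": [1, 2, 3, 4, 5],
    "returns": [9, 7, 6, 3, 1],
    "noise": [3, 1, 4, 1, 5],
})
drivers = correlate_with_target(sales, "revenue")
print(drivers.round(3))
```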
Data Exploration
explorer = CorrelationExplorer()
explorer.load_csv("dataset.csv")
# Get full picture
corr, pvals = explorer.correlation_with_pvalues()
# Find all significant correlations
significant = []
for i in range(len(corr.columns)):
    for j in range(i + 1, len(corr.columns)):
        if pvals.iloc[i, j] < 0.05:
            significant.append({
                'var1': corr.columns[i],
                'var2': corr.columns[j],
                'r': corr.iloc[i, j],
                'p': pvals.iloc[i, j],
            })
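The nested loop above can also be written with NumPy index arrays, which avoids per-cell `.iloc` lookups on wide datasets. This sketch assumes `corr` and `pvals` are square DataFrames with matching labels, as returned in the workflow above; the `significant_pairs` name is hypothetical, and it returns a DataFrame sorted by p-value rather than a list. The small matrices in the usage lines are made up.

```python
import numpy as np
import pandas as pd


def significant_pairs(corr: pd.DataFrame, pvals: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: the same pairs as the loop, as a DataFrame sorted by p."""
    i, j = np.triu_indices(len(corr.columns), k=1)  # each pair once, diagonal skipped
    p = pvals.to_numpy()[i, j]
    keep = p < alpha
    return pd.DataFrame({
        "var1": corr.columns[i[keep]],
        "var2": corr.columns[j[keep]],
        "r": corr.to_numpy()[i, j][keep],
        "p": p[keep],
    }).sort_values("p", ignore_index=True)


cols = ["a", "b", "c"]
corr = pd.DataFrame([[1.0, 0.8, 0.1], [0.8, 1.0, -0.5], [0.1, -0.5, 1.0]], index=cols, columns=cols)
pvals = pd.DataFrame([[0.0, 0.01, 0.20], [0.01, 0.0, 0.03], [0.20, 0.03, 0.0]], index=cols, columns=cols)
found = significant_pairs(corr, pvals)
print(found)
```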
Heatmap Options
explorer.plot_heatmap(
output="heatmap.png",
cmap="coolwarm", # Color scheme
annot=True, # Show values
figsize=(12, 10), # Figure size
vmin=-1, vmax=1, # Color scale
title="Correlation Matrix"
)
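The options above match the parameters of `seaborn.heatmap` plus a figure size and title, so a comparable plot can be drawn directly. This is a sketch assuming `plot_heatmap` forwards its keyword arguments to seaborn; the `fmt=".2f"` default and the `dpi=150` save setting are assumptions added for readable output, and the demo data is made up.

```python
import matplotlib

matplotlib.use("Agg")  # render straight to files; no display needed

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_heatmap(matrix: pd.DataFrame, output: str, title: str = "Correlation Matrix",
                 figsize=(12, 10), **kwargs) -> str:
    """Hypothetical sketch: draw `matrix` as an annotated heatmap and save it."""
    options = {"cmap": "coolwarm", "annot": True, "fmt": ".2f", "vmin": -1, "vmax": 1}
    options.update(kwargs)  # caller-supplied options override these defaults
    fig, ax = plt.subplots(figsize=figsize)
    sns.heatmap(matrix, ax=ax, **options)
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(output, dpi=150)
    plt.close(fig)  # release memory when generating many plots
    return output


demo = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 5, 4, 5], "c": [5, 3, 4, 1, 2]})
saved = plot_heatmap(demo.corr(), "heatmap_demo.png", figsize=(6, 5))
```

Fixing `vmin=-1, vmax=1` keeps colors comparable across heatmaps; without it, each plot rescales its color bar to its own range.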
Dependencies
- pandas>=2.0.0
- numpy>=1.24.0
- scipy>=1.10.0
- matplotlib>=3.7.0
- seaborn>=0.12.0
Source
https://github.com/dkyazzentwatwa/chatgpt-skills/blob/main/correlation-explorer/SKILL.md (view on GitHub)
Overview
Correlation Explorer analyzes relationships between variables in CSV and Excel datasets. It provides a correlation matrix, a heatmap visualization, and optional p-values, with support for Pearson, Spearman, and Kendall methods to suit different data types.
How This Skill Works
It loads a dataset (CSV or DataFrame), computes a pairwise correlation matrix using a chosen method (pearson, spearman, kendall), and can compute p-values for significance. It can identify strong correlations, analyze correlations with a target variable, and render visuals like heatmaps or scatter plots.
When to Use It
- During initial data exploration to uncover relationships between features.
- When performing feature selection by evaluating correlations with the target variable.
- To detect multicollinearity among predictors and decide which features to drop.
- When comparing linear and monotonic (rank-based) relationships using Pearson, Spearman, or Kendall methods.
- When preparing visuals for reports or dashboards to communicate relationships.
Quick Start
- Step 1: Initialize and load data: explorer = CorrelationExplorer(); explorer.load_csv("data.csv")
- Step 2: Compute the correlation matrix: matrix = explorer.correlation_matrix()
- Step 3: Generate a heatmap: explorer.plot_heatmap("heatmap.png")
Best Practices
- Start with Pearson for roughly normal data with linear relationships, then check Spearman or Kendall for monotonic non-linear relationships or ordinal data.
- Use the heatmap to quickly spot clusters of highly correlated features.
- Examine p-values (if enabled) to distinguish statistically significant correlations from noise.
- Leverage correlate_with_target to prioritize features most related to the target.
- Save results with to_csv/to_json and generate heatmaps for sharing with stakeholders.
Example Use Cases
- Marketing analytics: identify features strongly correlated with sales to drive campaigns.
- Product analytics: correlate user engagement metrics with retention.
- Quality control: discover which sensor readings align with failure events.
- Feature engineering: drop or combine highly correlated predictors to reduce redundancy.
- Dashboard reporting: produce heatmaps that illustrate feature relationships for stakeholders.