ab-test-stats
npx machina-cli add skill guia-matthieu/clawfu-skills/ab-test-stats --openclaw
A/B Test Statistics Calculator
Calculate statistical significance for A/B tests - know when your results are real, not random chance.
When to Use This Skill
- Test analysis - Determine if results are statistically significant
- Sample planning - Calculate required sample size before testing
- Duration estimation - Know how long to run experiments
- Power analysis - Ensure tests can detect meaningful differences
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Metric definitions |
| Identifies patterns in data | Business interpretation |
| Creates visualization templates | Dashboard design |
| Suggests optimization areas | Action priorities |
| Calculates statistical measures | Decision thresholds |
Dependencies
pip install scipy numpy click
Commands
Check Significance
python scripts/main.py significance --control 1000,50 --variant 1000,65
python scripts/main.py significance --control 5000,250 --variant 5000,300 --confidence 0.99
Calculate Sample Size
python scripts/main.py sample-size --baseline 0.05 --mde 0.02
python scripts/main.py sample-size --baseline 0.10 --mde 0.01 --power 0.90
Estimate Duration
python scripts/main.py duration --traffic 1000 --baseline 0.05 --mde 0.02
Examples
Example 1: Analyze Test Results
# Control: 1000 visitors, 50 conversions (5%)
# Variant: 1000 visitors, 65 conversions (6.5%)
python scripts/main.py significance --control 1000,50 --variant 1000,65
# Output:
# A/B Test Results
# ─────────────────────────
# Control: 5.00% (50/1000)
# Variant: 6.50% (65/1000)
# Lift: +30.0%
#
# Statistical Analysis
# ─────────────────────────
# p-value: 0.089
# Confidence: 91.1%
# Result: NOT SIGNIFICANT (need 95%)
#
# Recommendation: Continue test for more data
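The significance check can be sketched as a two-proportion z-test. The actual test in scripts/main.py is not shown here, so treat this as an illustrative assumption, not the tool's implementation. The function name `two_proportion_z_test` is hypothetical. For the Example 1 counts, a two-sided pooled z-test gives z ≈ 1.44 and p ≈ 0.15, and a one-sided test gives about 0.075. Neither matches the 0.089 in the output above, so the tool may use a different test or different settings.

```python
from math import sqrt
from scipy.stats import norm


def two_proportion_z_test(n_control, conv_control, n_variant, conv_variant):
    """Hypothetical helper: two-sided pooled z-test for conversion rates.

    Returns (control_rate, variant_rate, relative_lift, z, p_value).
    """
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_control + conv_variant) / (n_control + n_variant)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_variant))
    z = (p_v - p_c) / se
    # Two-sided p-value: chance of |Z| >= |z| if there is no true difference
    p_value = 2 * norm.sf(abs(z))
    lift = (p_v - p_c) / p_c
    return p_c, p_v, lift, z, p_value


# Example 1 counts: visitors, conversions for each arm
p_c, p_v, lift, z, p = two_proportion_z_test(1000, 50, 1000, 65)
print(f"control={p_c:.2%} variant={p_v:.2%} lift={lift:+.1%} z={z:.2f} p={p:.3f}")
```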
Example 2: Plan Sample Size
# Baseline 5% conversion, want to detect 20% relative lift (1% absolute)
python scripts/main.py sample-size --baseline 0.05 --mde 0.01
# Output:
# Sample Size Calculator
# ──────────────────────────────
# Baseline conversion: 5.0%
# Minimum detectable effect: 1.0% (20% relative)
# Target conversion: 6.0%
#
# Required per variant: 3,842 visitors
# Total required: 7,684 visitors
#
# At 1000 daily visitors: ~8 days
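A per-variant sample size is commonly computed with the normal-approximation formula for comparing two proportions. The sketch below uses that textbook formula, assuming a two-sided α = 0.05, power 0.80, and `--mde` as an absolute difference. It is not necessarily what scripts/main.py does, and the function names are hypothetical. With the Example 2 inputs it gives about 8,160 visitors per variant (about 16,300 total, or 17 days at 1,000 daily visitors). That is roughly twice the 3,842 shown above, so check which α, power, and formula the tool assumes before relying on its figure.

```python
from math import ceil, sqrt
from scipy.stats import norm


def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Hypothetical helper: visitors per variant to detect an absolute lift `mde`.

    Normal-approximation formula for a two-sided two-proportion test.
    """
    p1 = baseline
    p2 = baseline + mde  # mde is absolute, e.g. 0.01 = 1 percentage point
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # quantile for the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / mde ** 2)


def duration_days(n_per_variant, daily_traffic, variants=2):
    """Hypothetical helper: days needed when daily traffic is split across variants."""
    return ceil(n_per_variant * variants / daily_traffic)


n = sample_size_per_variant(0.05, 0.01)
print(n, n * 2, duration_days(n, 1000))
```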
Key Concepts
| Term | Definition |
|---|---|
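| p-value | Probability of seeing a difference at least this large if control and variant actually performed the same |
| Confidence | Reported by this tool as 1 - p-value; compare it with your chosen threshold (usually 95%) |
| Power | Probability of detecting a real effect at least as large as the MDE (usually 80%) |
| MDE | Minimum Detectable Effect - smallest difference worth detecting; the commands take it as an absolute difference (0.01 = 1 percentage point) |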
| Lift | Relative improvement (variant - control) / control |
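Lift and MDE use different scales: lift is relative, while the `--mde` flag appears to be absolute. This is inferred from Example 2, where `--mde 0.01` on a 5% baseline targets 6%. The two hypothetical helpers below only illustrate the arithmetic for converting between the two.

```python
def lift(control_rate, variant_rate):
    """Relative improvement: (variant - control) / control."""
    return (variant_rate - control_rate) / control_rate


def relative_to_absolute_mde(baseline, relative_mde):
    """Convert a relative MDE (0.20 = 20%) to an absolute difference."""
    return baseline * relative_mde


print(f"{lift(0.05, 0.065):+.1%}")                    # Example 1 rates
print(f"{relative_to_absolute_mde(0.05, 0.20):.3f}")  # Example 2 MDE
```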
When Results Are Significant
| p-value | Confidence | Verdict |
|---|---|---|
| < 0.01 | > 99% | Highly Significant ✓ |
| < 0.05 | > 95% | Significant ✓ |
| < 0.10 | > 90% | Marginally Significant |
| ≥ 0.10 | < 90% | Not Significant ✗ |
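The table above maps p-values to verdicts using strict "less than" cutoffs, so a p-value of exactly 0.05 falls in the "Marginally Significant" band. A minimal sketch of that mapping follows. The function name `verdict` is hypothetical, and the cutoffs should be fixed before the test starts rather than chosen after seeing the data.

```python
def verdict(p_value: float) -> str:
    """Hypothetical helper: map a p-value to the bands in the table above."""
    if p_value < 0.01:
        return "Highly Significant"
    if p_value < 0.05:
        return "Significant"
    if p_value < 0.10:
        return "Marginally Significant"
    return "Not Significant"


for p in (0.004, 0.03, 0.089, 0.15):
    print(p, verdict(p))
```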
Skill Boundaries
What This Skill Does Well
- Structuring data analysis
- Identifying patterns and trends
- Creating visualization frameworks
- Calculating statistical measures
What This Skill Cannot Do
- Access your actual data
- Replace statistical expertise
- Make business decisions
- Guarantee prediction accuracy
Related Skills
- cohort-analysis - Analyze user cohorts
- funnel-analyzer - Analyze conversion funnels
Skill Metadata
- Mode: centaur
category: analytics
subcategory: statistics
dependencies: [scipy, numpy]
difficulty: intermediate
time_saved: 3+ hours/week
Source
https://github.com/guia-matthieu/clawfu-skills/blob/main/skills/analytics/ab-test-stats/SKILL.md
Overview
Determines if A/B test results are statistically significant and guides planning. It helps you calculate required sample sizes, estimate experiment duration, and perform power analyses to detect meaningful differences.
How This Skill Works
The tool computes p-values, confidence (reported as 1 - p-value), and lift from control and variant data to assess significance. It provides significance, sample-size, and duration commands and uses scipy and numpy for the statistical calculations.
When to Use It
- Determine if test results are statistically significant
- Calculate required sample size before running a test
- Estimate how long an experiment should run
- Perform power analysis to detect meaningful differences
- Analyze conversion experiments to guide decisions
Quick Start
- Step 1: Install dependencies (pip install scipy numpy click)
- Step 2: Run significance with control/variant data, e.g., python scripts/main.py significance --control 1000,50 --variant 1000,65
- Step 3: Interpret the output (p-value, confidence, result) and decide to continue or stop
Best Practices
- Define baseline and minimum detectable effect (MDE) before testing
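- Use adequate sample sizes: underpowered tests tend to miss real effects (false negatives), and stopping as soon as a result looks significant inflates false positives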
- Interpret p-values and confidence levels in the context of your business goals
- Run power analysis to ensure the test can detect the desired lift
- Plan duration with realistic traffic estimates to prevent premature conclusions
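A power analysis asks how likely a test of a given size is to detect the MDE. The sketch below inverts the normal-approximation sample-size formula for two proportions. It assumes a two-sided α = 0.05 and an absolute MDE, and the function name is hypothetical. With the Example 2 inputs (5% baseline, 0.01 MDE), it gives power of about 0.80 at 8,158 visitors per variant but just under 0.50 at 3,842.

```python
from math import sqrt
from scipy.stats import norm


def power_two_proportions(baseline, mde, n_per_variant, alpha=0.05):
    """Hypothetical helper: approximate power of a two-sided two-proportion z-test.

    Ignores the negligible chance of rejecting in the opposite direction.
    """
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    # Standard error of the difference under the null (pooled) and under the alternative
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_variant)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_alpha = norm.ppf(1 - alpha / 2)
    # Probability the observed difference clears the critical value when the true difference is mde
    return norm.cdf((abs(mde) - z_alpha * se_null) / se_alt)


for n in (3842, 8158):
    print(n, round(power_two_proportions(0.05, 0.01, n), 3))
```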
Example Use Cases
- Example 1: Analyze Test Results – compare control vs. variant conversions and assess significance
- Example 2: Plan Sample Size – baseline 5% with 1% absolute MDE to compute required visitors
- Example 3: Estimate Duration – with 1000 daily visitors, baseline 5% and MDE 2%
- Example 4: Output Interpretation – determine if results are not significant and require more data
- Example 5: Power Check – ensure the test design can detect the desired lift before starting