PRISM is a protocol for rigorous investigation of scientific mechanisms that uses Bayesian evidence synthesis to produce posterior probabilities for competing hypotheses.

What priors are used?

PRISM uses empirical base rates with Beta priors: phase2_clinical 15.3% [9.0%, 23.0%], phase3_clinical 35.1% [26.2%, 44.7%], replication 40.1% [30.8%, 49.8%], and general 50.0% (uninformative).

How does it handle multiple hypotheses?

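It updates beliefs in log-odds form and applies a hierarchical correlation correction, so that correlated evidence (for example, several studies from the same lab) does not compound into exponential overconfidence. When more than one hypothesis is compared, it also applies an optimizer's curse correction.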

Claude

npx machina-cli add skill Dr-AneeshJoseph/Prism/Claude --openclaw

PRISM v2.2 - Protocol for Rigorous Investigation of Scientific Mechanisms

Author: Dr. Aneesh Joseph
Implementation: Claude (Anthropic)
Version: 2.2 | December 2025

When to Use This Skill

Use PRISM when the user asks to:

Evaluate scientific hypotheses quantitatively
Compare multiple hypotheses using Bayesian methods
Perform rigorous evidence synthesis with meta-analysis
Calculate posterior probabilities from heterogeneous evidence
Research and prioritize scientific questions
Assess treatment options, drug efficacy, or technology predictions

Trigger phrases:

"Find the best hypothesis for..."
"Compare these hypotheses..."
"What's the probability that..."
"Analyze the evidence for..."
"Which treatment is most likely to work..."

Core Philosophy

Traditional hypothesis testing asks: "Is this result statistically significant?"

PRISM asks: "Given all available evidence, what is the probability this hypothesis is true?"

This shift from binary significance to continuous credence enables:

Comparison across different evidence types
Principled evidence accumulation
Explicit uncertainty quantification
Optimizer's curse correction for multiple comparisons

Mathematical Framework

Bayesian Foundation

PRISM uses Bayes' theorem with log-odds for numerical stability:

log-odds(H|E) = log-odds(H) + log(LR)
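As a minimal sketch of the update rule above, the snippet below converts a prior probability to log-odds, adds the log of a single likelihood ratio, and converts back. The function names are illustrative and are not taken from prism_v2_2.py; the prior (15.3%) and the likelihood ratio (2.3) are borrowed from values quoted elsewhere in this document.

```python
import math


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update(prior: float, likelihood_ratio: float) -> float:
    """Apply log-odds(H|E) = log-odds(H) + log(LR) for one piece of evidence."""
    return inv_logit(logit(prior) + math.log(likelihood_ratio))


# Hypothetical example: a phase 2 style prior updated by one RCT-strength LR.
posterior = update(0.153, 2.3)
print(f"prior=0.153, LR=2.3 -> posterior={posterior:.3f}")
```

Working in log-odds turns repeated multiplication of odds into addition, which avoids underflow and overflow when many pieces of evidence are combined.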

Reference Class Priors

Instead of arbitrary priors, PRISM uses empirical base rates with Beta distributions:

phase2_clinical: 15.3% [9.0%, 23.0%] (based on FDA 2000-2020)
phase3_clinical: 35.1% [26.2%, 44.7%]
replication: 40.1% [30.8%, 49.8%] (based on OSF 2015)
general: 50.0% (uninformative)
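To illustrate how a Beta prior yields both a point estimate and an interval, here is a sketch using only the Python standard library. The shape parameters (alpha=15, beta=83) are hypothetical: they were chosen so the mean lands near 15.3% and are not claimed to be the values PRISM uses. The interval is estimated by seeded Monte Carlo sampling rather than an exact quantile function.

```python
import math
import random


def beta_summary(alpha: float, beta: float, n_draws: int = 20000, seed: int = 0):
    """Return (mean, sd, lower, upper) for a Beta(alpha, beta) prior.

    Mean and sd are exact; the central 95% interval is a Monte Carlo estimate.
    """
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return mean, sd, lower, upper


# Hypothetical shape parameters, for illustration only.
mean, sd, lower, upper = beta_summary(15, 83)
print(f"mean={mean:.3f} sd={sd:.3f} interval=[{lower:.3f}, {upper:.3f}]")
```

A wider interval signals more uncertainty about the base rate itself, which is the "prior" component in the uncertainty decomposition described later.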

Hierarchical Correlation Correction

Critical: Naive multiplication of likelihood ratios causes exponential overconfidence.

PRISM uses hierarchical correlation:

Within-cluster ρ = 0.6 (same lab/method)
Between-cluster ρ = 0.2 (independent labs)

Effective N = Σ(cluster_size / DEFF_within) / √DEFF_between
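The formula above does not define DEFF. The sketch below assumes the standard survey-sampling design effect, DEFF = 1 + (n - 1)ρ, applied to each cluster's size for the within-cluster term and to the number of clusters for the between-cluster term. Both readings are assumptions, and the function names are illustrative rather than taken from the PRISM engine.

```python
import math


def design_effect(n: int, rho: float) -> float:
    """Standard design effect for n units with pairwise correlation rho (assumed form)."""
    return 1.0 + (n - 1) * rho


def effective_n(cluster_sizes, within_rho: float = 0.6, between_rho: float = 0.2) -> float:
    """Effective N = sum(cluster_size / DEFF_within) / sqrt(DEFF_between)."""
    within_sum = sum(size / design_effect(size, within_rho) for size in cluster_sizes)
    deff_between = design_effect(len(cluster_sizes), between_rho)
    return within_sum / math.sqrt(deff_between)


# Six studies grouped into three clusters (e.g. by lab): sizes 3, 2 and 1.
n_eff = effective_n([3, 2, 1])
print(f"6 studies count as roughly {n_eff:.2f} independent ones")
```

Under these assumptions, six correlated studies carry about as much weight as three independent ones, which is what keeps the posterior from running away.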

Key Statistical Methods

REML Meta-Analysis with Hartung-Knapp adjustment
P-Curve Analysis for publication bias detection
Kalman Filtering for temporal evidence integration
Optimizer's Curse Correction when comparing multiple hypotheses
Sobol Sensitivity Analysis to identify critical evidence
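For orientation, the sketch below shows a deliberately simpler stand-in for the first method in this list: fixed-effect inverse-variance pooling with Cochran's Q and the I² heterogeneity statistic. It is not REML and it does not apply the Hartung-Knapp adjustment. It only illustrates how effect_size and effect_var values from several studies can be combined and how I² is computed. The function name is hypothetical.

```python
def pool_fixed_effect(effects, variances):
    """Inverse-variance pooled effect, its variance, and I^2 (as a fraction).

    A simplified stand-in for illustration, not REML with Hartung-Knapp.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_var = 1.0 / total_weight
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_var, i_squared


# Two studies with the effect sizes and variances used in the examples in this guide.
pooled, pooled_var, i_squared = pool_fixed_effect([-0.45, -0.48], [0.0144, 0.0144])
print(f"pooled={pooled:.3f} var={pooled_var:.4f} I2={i_squared:.0%}")
```

A random-effects method such as REML additionally estimates between-study variance, so its intervals are wider than these whenever I² is above zero.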

Implementation Guide

File Structure

/home/claude/
├── prism_v2_2.py          # Core PRISM engine (Dr. Joseph's implementation)
├── prism_session.py       # Session management with checkpointing
└── example_*.py           # Example analyses

/mnt/user-data/outputs/prism_{project}/
├── state.json             # Project state + resume instructions
├── RESUME.md              # Human-readable resume point
├── hypotheses/            # Hypothesis data files
│   ├── h1_*.json
│   └── h2_*.json
└── results/
    ├── comparison.json    # Comparative analysis
    ├── FINAL_REPORT.md    # Complete report
    └── *_results.json     # Individual hypothesis results

Basic Usage Pattern

from prism_session import PRISMSession
from prism_v2_2 import Evidence, Domain

# 1. Create session
session = PRISMSession("project_name")

# 2. Add hypotheses
h1 = session.add_hypothesis(
    hypothesis_id="h1_treatment_a",
    title="Treatment A reduces symptoms by >20%",
    domain=Domain.MEDICAL,
    reference_class="phase2_clinical"
)

# 3. Add evidence (from web_search, papers, etc.)
h1.add_evidence(Evidence(
    id="study1",
    content="RCT shows 25% reduction",
    source="NEJM 2024",
    domain=Domain.MEDICAL,
    study_design="rct",
    sample_size=200,
    supports=True,
    p_value=0.01,
    effect_size=-0.45,
    effect_var=0.0144,  # SE^2
    authors=["Smith"]
))

# Update the saved hypothesis
session._save_hypothesis(h1, 
    session.hypotheses_dir / "h1_treatment_a.json",
    "h1_treatment_a")

# 4. Analyze all hypotheses
session.analyze_all(set_n_compared=True)  # Applies optimizer's curse

# 5. Generate report
session.generate_report()

Workflow for Real Research Questions

When a user asks: "Find the best hypothesis for treating osteoarthritis"

Search for hypotheses:

# Use web_search to find treatment options
web_search("osteoarthritis treatment options 2024 clinical trials")
web_search("osteoarthritis systematic reviews meta-analysis")

Create session and define hypotheses:

session = PRISMSession("osteoarthritis_treatment_2025")

# Add each candidate hypothesis
h1 = session.add_hypothesis(
    "h1_weight_loss",
    "Weight loss reduces knee OA pain",
    Domain.MEDICAL,
    "replication"
)
# ... add more hypotheses

Extract evidence from search results:

# For each relevant study found:
h1.add_evidence(Evidence(
    id="messier_2013",
    content="18-month RCT: 10% weight loss reduced pain 50%",
    source="JAMA 2013;310(12):1263",
    domain=Domain.MEDICAL,
    study_design="rct",
    sample_size=454,
    supports=True,
    p_value=0.0001,
    effect_size=-0.48,
    effect_var=0.0144,
    authors=["Messier", "Mihalko"]
))

Run complete analysis:

session.analyze_all(set_n_compared=True)

Present results:

session.generate_report()
present_files([
    session.results_dir / "FINAL_REPORT.md",
    session.results_dir / "comparison.json"
])

Checkpointing and Resumability

Design Principle

"Build for one Claude, checkpoint for many"

The system is designed to complete in ONE session but checkpoints after each hypothesis for safety.

Automatic Checkpointing

After analyzing each hypothesis, PRISM automatically:

Saves results to JSON
Updates state.json
Writes RESUME.md with instructions
Estimates tokens used
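The sketch below shows one way a checkpoint step like this could be written with the standard library. It is an illustration under assumptions, not the code in prism_session.py: the function names, the JSON keys, and the RESUME.md wording are all hypothetical, and token estimation is omitted.

```python
import json
import tempfile
from pathlib import Path


def write_checkpoint(project_dir: Path, completed: list, pending: list) -> None:
    """Persist progress as state.json plus a human-readable RESUME.md."""
    project_dir.mkdir(parents=True, exist_ok=True)
    state = {"completed": completed, "pending": pending}
    (project_dir / "state.json").write_text(json.dumps(state, indent=2))
    resume_lines = [
        "# Resume point",
        "",
        "Completed: " + ", ".join(completed),
        "Pending: " + ", ".join(pending),
    ]
    (project_dir / "RESUME.md").write_text("\n".join(resume_lines))


def load_pending(project_dir: Path) -> list:
    """Read back the hypotheses that still need analysis."""
    state = json.loads((project_dir / "state.json").read_text())
    return state["pending"]


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "prism_demo"
    write_checkpoint(demo_dir, completed=["h1_weight_loss"], pending=["h2_exercise"])
    print(load_pending(demo_dir))
```

Writing state after every hypothesis means an interrupted run loses at most one hypothesis worth of work.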

Resume Protocol

If analysis is interrupted:

from prism_session import PRISMSession

# Load existing project
session = PRISMSession("project_name")  # Automatically loads state

# Continue analysis
session.resume()  # Or session.analyze_all()

The RESUME.md file tells you exactly where you left off:

Which hypotheses are completed
Which are pending
Python code to continue

Critical Implementation Rules

ALWAYS use web_search when user asks about real-world hypotheses
- Search for recent studies, clinical trials, systematic reviews
- Extract evidence from authoritative sources
- Don't fabricate studies or data
ALWAYS checkpoint after each hypothesis
- session._checkpoint_hypothesis() is called automatically after each hypothesis is analyzed
- Saves to /mnt/user-data/outputs/prism_{project}/
ALWAYS apply optimizer's curse correction when comparing >1 hypothesis
- Set n_compared on each hypothesis
- Or use session.analyze_all(set_n_compared=True)
ALWAYS use hierarchical correlation
- Evidence is automatically clustered by (author, study_design, source)
- within_rho=0.6, between_rho=0.2
ALWAYS decompose uncertainty
- Statistical (from CI width)
- Prior (from reference class uncertainty)
- Model (from correlation assumptions)
Token awareness:
- Typical analysis: 5-10K tokens per hypothesis
- 8 hypotheses ≈ 40-80K tokens
- Well within 190K budget for most analyses

Study Design Strength

Evidence strength by study design (likelihood ratios):

Study Type	LR+	Interpretation
Meta-analysis	5.4	Very strong evidence
Systematic review	5.4	Very strong evidence
RCT	2.3	Strong evidence
Cohort	1.9	Moderate evidence
Case-control	1.5	Weak evidence
Observational	1.2	Minimal evidence
Expert opinion	1.1	Very weak evidence
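The table can be treated as a lookup from study design to likelihood ratio. The sketch below does that and then multiplies the ratios naively, with no correlation correction, to show why such a correction is needed. The dictionary keys other than "rct" are hypothetical spellings, and naive_posterior is an illustrative helper, not a PRISM function.

```python
import math

# Likelihood ratios from the table above; key spellings are assumptions.
STUDY_DESIGN_LR = {
    "meta_analysis": 5.4,
    "systematic_review": 5.4,
    "rct": 2.3,
    "cohort": 1.9,
    "case_control": 1.5,
    "observational": 1.2,
    "expert_opinion": 1.1,
}


def naive_posterior(prior: float, designs) -> float:
    """Combine supporting studies as if they were fully independent."""
    log_odds = math.log(prior / (1.0 - prior))
    for design in designs:
        log_odds += math.log(STUDY_DESIGN_LR[design])
    return 1.0 / (1.0 + math.exp(-log_odds))


# Ten supporting RCTs on a 15.3% prior, treated as independent.
ten_rcts = naive_posterior(0.153, ["rct"] * 10)
print(f"naive posterior after ten RCTs: {ten_rcts:.4f}")
```

Ten moderately strong studies push a 15.3% prior above 99% when treated as independent, which is the overconfidence the hierarchical correlation correction is meant to prevent.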

Interpretation Guidelines

Posterior Probability Scale

Posterior	Interpretation	Action
< 10%	Unlikely	Deprioritize
10-30%	Possible	Gather more evidence
30-70%	Uncertain	Key decision point
70-90%	Probable	Consider acting
> 90%	Highly likely	Act with monitoring
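The scale above maps directly onto a small lookup function, sketched here. The table leaves the exact boundary values ambiguous (30% appears in two rows), so this sketch assigns each boundary to the higher band, except that exactly 90% stays "Probable" because the top row is strictly greater than 90%. That boundary handling and the function name are assumptions.

```python
def interpret(posterior: float):
    """Map a posterior probability to (interpretation, action) per the scale above."""
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must be between 0 and 1")
    if posterior < 0.10:
        return ("Unlikely", "Deprioritize")
    if posterior < 0.30:
        return ("Possible", "Gather more evidence")
    if posterior < 0.70:
        return ("Uncertain", "Key decision point")
    if posterior <= 0.90:
        return ("Probable", "Consider acting")
    return ("Highly likely", "Act with monitoring")


print(interpret(0.888))
```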

Warning Signs

⚠️ High model uncertainty (>25%): Evidence may be highly correlated
⚠️ P-hacking detected: Effect sizes may be inflated
⚠️ High I² (>75%): Substantial heterogeneity; studies may be measuring different things
⚠️ Extreme posterior (>95%): Likely overconfident
⚠️ Wide credible intervals: Need more evidence
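The warning signs with explicit thresholds can be checked mechanically, as in the sketch below. It assumes model uncertainty and I² are expressed as fractions between 0 and 1 and that p-hacking detection arrives as a boolean from a separate analysis. The credible-interval check is left out because the list gives no threshold for "wide". The function name and message wording are illustrative.

```python
def collect_warnings(posterior: float, model_uncertainty: float,
                     i_squared: float, p_hacking_detected: bool = False):
    """Return warning messages for any threshold from the list above that is exceeded."""
    warnings = []
    if model_uncertainty > 0.25:
        warnings.append("High model uncertainty: evidence may be highly correlated")
    if p_hacking_detected:
        warnings.append("P-hacking detected: effect sizes may be inflated")
    if i_squared > 0.75:
        warnings.append("High I2: studies may be measuring different things")
    if posterior > 0.95:
        warnings.append("Extreme posterior: likely overconfident")
    return warnings


# Hypothetical result: very high posterior with heterogeneous studies.
for message in collect_warnings(posterior=0.97, model_uncertainty=0.10, i_squared=0.80):
    print(message)
```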

Example Output Format

When presenting results to users:

🏆 Best Hypothesis: h1_weight_loss
   Weight loss reduces knee OA pain and improves function
   Posterior (corrected): 88.8%

📋 Full Ranking:
   🥇 h1_weight_loss: 88.8%
   🥈 h2_exercise: 88.5%
   🥉 h3_combination: 67.2%
   ...

📄 Full report available: [link to FINAL_REPORT.md]

Limitations

What PRISM Cannot Do

Replace domain expertise - requires judgment for prior selection
Detect fraud - assumes evidence is honestly reported
Handle unknown unknowns - only evaluates provided evidence
Guarantee calibration - model assumptions may be wrong

Appropriate Use

✅ Exploratory analysis and hypothesis generation
✅ Research prioritization and resource allocation
✅ Structured evidence synthesis
✅ Treatment comparison and decision support
✅ Teaching Bayesian reasoning

Inappropriate Use

❌ Regulatory approval decisions (use established methods)
❌ Legal proceedings (requires validated forensic tools)
❌ Fully automated decision-making
❌ Single-study evaluation

Quick Reference

Available Reference Classes

'phase2_clinical'  # ~15% prior (early drug development)
'phase3_clinical'  # ~35% prior (late drug development)
'drug_approval'    # ~10% prior (FDA approval standard)
'replication'      # ~40% prior (scientific replication)
'general'          # ~50% prior (uninformative)

Key Functions

# Session management
session = PRISMSession("project_name")
h = session.add_hypothesis(id, title, domain, ref_class)
session.analyze_all()
session.generate_report()

# Evidence creation
e = Evidence(id, content, source, domain, study_design,
             sample_size, supports, p_value, effect_size, 
             effect_var, authors)

References

Gelman et al. (2013) Bayesian Data Analysis
Patterson & Thompson (1971) Biometrika - REML
Hartung & Knapp (2001) Statistics in Medicine
Simonsohn et al. (2014) J Exp Psych: General - P-curve
Kahneman & Tversky (1979) - Reference class forecasting
Dr. Aneesh Joseph (2025) PRISM v2.2 Scientific Guide

PRISM v2.2 - Rigorous hypothesis evaluation for evidence-based science

Source

https://github.com/Dr-AneeshJoseph/Prism/blob/main/Claude/SKILL.md

Overview

PRISM v2.2 is a protocol for rigorous investigation of scientific mechanisms. It reframes hypothesis testing from binary significance to probabilistic credence, enabling principled evidence combination across diverse data sources and explicit uncertainty quantification.

How This Skill Works

PRISM applies Bayes' theorem in log-odds form to update beliefs as evidence arrives. It uses empirical base-rate priors and models within-cluster and between-cluster correlations to avoid overconfidence, implementing methods like REML meta-analysis, P-Curve bias detection, Kalman filtering, and Sobol sensitivity analysis to synthesize and interrogate evidence.

Quick Start

Step 1: Create a PRISMSession for your project
Step 2: Add hypotheses with domains and reference_class (priors)
Step 3: Add evidence (studies, data) and run analyze_all to update posteriors

Best Practices

Define hypotheses clearly and establish empirical priors from base-rate data
Model within-cluster and between-cluster correlations to avoid overconfidence
Use REML meta-analysis with Hartung-Knapp adjustment for heterogeneity
Incorporate P-Curve analysis to detect publication bias
Report posterior probabilities with uncertainty and conduct sensitivity checks (e.g., Sobol sensitivity analysis)

Example Use Cases

Evaluating a drug's efficacy across phase 2 and phase 3 trials with hierarchical evidence
Comparing competing treatments by updating posteriors as new studies arrive
Prioritizing research questions based on evolving posterior credence
Synthesizing replication studies to confirm effect sizes across labs
Temporal evidence integration for technology performance predictions
