judgment-postmortem-calibration
npx machina-cli add skill evalops/open-associate-skills/judgment-postmortem-calibration --openclaw

Judgment postmortem calibration
When to use
Use this skill when you want to:
- Improve selection judgment (faster learning, fewer repeated mistakes)
- Capture why you said yes/no and how evidence changed your view
- Measure your prediction accuracy and learning rate over time
- Build an internal "decision log" that compounds
- Review investments or passes after outcomes are known
Trigger points:
- After every IC decision (invest or pass)
- After every competitive loss
- Quarterly: review passes that later raised from other investors + calculate calibration metrics
- Annually: review portfolio outcomes vs initial thesis + measure learning rate
Inputs you should request (only if missing)
- Deal name + date of first meeting
- Your initial take (reconstruct honestly if not documented)
- Outcome to date (funded by others? traction? pivot? shut down?)
- Original memo or notes (if available)
- Your probability estimates at decision time (if recorded)
Outputs you must produce
- Decision log entry (structured, one page)
- Calibration scorecard (predictions vs reality with scores)
- Brier score calculation (for probabilistic predictions)
- Updated heuristics (2-5 actionable bullets)
- Pattern library update (what archetype was this?)
- Learning rate metrics (are you getting better?)
- Follow-up list (who to ping, what to track)
Templates:
- assets/decision-log.md
- assets/calibration-tracker.csv (for longitudinal tracking)
Core principle: Measure what you believe, then score it
The value of postmortems comes from:
- Honest recording of what you believed at decision time
- Quantified predictions (probabilities, not just "I thought X")
- Systematic scoring against outcomes
- Tracking improvement over time
Procedure
1) Capture the timeline with predictions
| Date | Event | Your belief | Confidence (%) | Outcome |
|---|---|---|---|---|
| ___ | First meeting | "This will be a $1B+ outcome" | 20% | ___ |
| ___ | First meeting | "Product-market fit within 12 months" | 60% | ___ |
| ___ | Diligence | "They'll close 3 enterprise deals in 6 months" | 40% | ___ |
| ___ | Decision | "Worth investing" | 70% | ___ |
| ___ | +12 months | | | Actual outcome: ___ |
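A prediction timeline like the one above is easier to score later if each row is kept as a structured record rather than free text. A minimal sketch in Python; the field names are illustrative, not a schema defined by this skill's templates:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# One row of the prediction timeline. Field names are illustrative,
# not taken from assets/decision-log.md or the CSV tracker.
@dataclass
class Prediction:
    recorded: date                 # when the belief was stated
    event: str                     # e.g. "First meeting", "Diligence", "Decision"
    claim: str                     # a falsifiable statement
    confidence: float              # stated P(true), between 0.0 and 1.0
    outcome: Optional[int] = None  # 1/0 once known; None until the recheck date

# Example entry matching the "Decision" row above:
p = Prediction(date(2025, 1, 15), "Decision", "Worth investing", 0.70)
```

Keeping `outcome` as `None` until the recheck date makes unresolved predictions easy to filter out of scoring.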
2) Record the initial thesis with probabilities
At first meeting, I believed:
- What would make the company win:
- P(success | investment) estimate: ___%
- P(this raises next round) estimate: ___%
- P(achieves stated 12-month milestones) estimate: ___%
- Top risk and P(risk materializes): ___%
- My recommendation:
At decision point, I believed:
- P(success | investment): ___%
- P(raises next round): ___%
- Top risks with probabilities:
- Risk: ___ | P(materializes): ___%
- Risk: ___ | P(materializes): ___%
- Final recommendation:
3) Document the decision
- Decision: Invest / Pass / Lost competitive
- Stated rationale (at the time):
- Unstated factors (be honest):
- Confidence in decision: ___%
4) Score predictions against reality
Prediction scorecard:
| Prediction | Your P(true) | Actual (1/0) | Brier contribution |
|---|---|---|---|
| "This will raise Series A" | 70% | 1 (yes) | (0.7-1)² = 0.09 |
| "Product-market fit in 12mo" | 60% | 0 (no) | (0.6-0)² = 0.36 |
| "3 enterprise deals in 6mo" | 40% | 0 (no) | (0.4-0)² = 0.16 |
| "Top risk materializes" | 30% | 1 (yes) | (0.3-1)² = 0.49 |
Brier score for this deal: (0.09 + 0.36 + 0.16 + 0.49) / 4 = 0.275
- Perfect = 0.0, Random = 0.25, Always wrong = 1.0
- Good forecaster: < 0.20
- Reasonable forecaster: 0.20 - 0.25
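The per-deal score is mechanical once predictions are stored as (probability, outcome) pairs. A straightforward sketch of the standard Brier formula, using the four scorecard predictions above (not code shipped with the skill):

```python
# Mean Brier score over a set of probabilistic predictions.
# Each prediction is (stated probability, actual outcome as 1 or 0).
def brier_score(predictions):
    return sum((p - outcome) ** 2 for p, outcome in predictions) / len(predictions)

# The four scorecard predictions above:
deal = [(0.70, 1), (0.60, 0), (0.40, 0), (0.30, 1)]
print(round(brier_score(deal), 3))  # 0.275 -- above the 0.25 "random" line
```

A score above 0.25 on a deal, as here, means the stated probabilities did worse than always guessing 50%.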
5) Calculate calibration (are your probabilities accurate?)
Group your historical predictions by confidence level:
| Confidence bucket | Predictions | Outcomes (% true) | Calibration gap |
|---|---|---|---|
| 10-20% | 15 | 18% true | +3% (slightly under-confident) |
| 30-40% | 22 | 28% true | -7% (slightly over-confident) |
| 50-60% | 18 | 52% true | -3% (well calibrated) |
| 70-80% | 12 | 58% true | -17% (over-confident) |
| 90%+ | 5 | 80% true | -12% (over-confident) |
Calibration insight: "I tend to be over-confident in the 70-80% range. When I say 75%, things happen ~60% of the time."
6) Identify what you underweighted or overweighted
For this deal:
- Underweighted: ___
- Overweighted: ___
- Surprise factor: ___
Pattern across deals (update quarterly):
| Factor | Times underweighted | Times overweighted |
|---|---|---|
| Team learning rate | | |
| Distribution advantages | | |
| Timing/market readiness | | |
| Technical moat | | |
| Competition | | |
| Founder-market fit | | |
7) Extract heuristics (portable rules)
Good heuristics are:
- Specific enough to act on
- Falsifiable
- Tied to observed evidence
- Attached to a base rate
Heuristic format: "When [specific condition], [outcome] happens [X%] of the time in my experience."
Examples:
- "When a seed-stage founder can't name a specific buyer trigger event, they fail to hit enterprise sales targets 80% of the time."
- "When we invest in a second-time founder with prior distribution success, they raise Series A 90% of the time."
Write 2-5 heuristics from this postmortem:
1. ___
2. ___
3. ___
8) Update your pattern library with base rates
Archetype performance tracking:
| Archetype | Deals | Success rate | Avg Brier | Notes |
|---|---|---|---|---|
| First-time founder, crowded market | 8 | 25% | 0.28 | Over-confident on differentiation |
| Second-time founder, distribution edge | 5 | 80% | 0.15 | Under-confident on execution |
| Technical founder, no GTM | 6 | 33% | 0.32 | Over-weight technical moat |
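The archetype table can be regenerated from a flat list of scored deals instead of maintained by hand. A minimal sketch, assuming each deal is tagged with one archetype and carries its per-deal Brier score:

```python
# Aggregate per-archetype base rates from scored deals.
# Each deal: (archetype, success as 1/0, per-deal Brier score).
def archetype_stats(deals):
    stats = {}
    for archetype, success, brier in deals:
        s = stats.setdefault(archetype, [0, 0, 0.0])  # [n, wins, brier_sum]
        s[0] += 1
        s[1] += success
        s[2] += brier
    return {a: {"deals": n, "success_rate": wins / n, "avg_brier": bsum / n}
            for a, (n, wins, bsum) in stats.items()}

deals = [("technical, no GTM", 0, 0.30), ("technical, no GTM", 1, 0.20)]
print(archetype_stats(deals)["technical, no GTM"]["success_rate"])  # 0.5
```

Rebuilding the table from raw deals each quarter avoids drift between the summary and the underlying decision log.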
9) Measure learning rate (quarterly)
Rolling Brier score by quarter:
| Quarter | Deals scored | Avg Brier | Calibration gap | Trend |
|---|---|---|---|---|
| Q1 2025 | 12 | 0.28 | 15% over-confident | Baseline |
| Q2 2025 | 15 | 0.24 | 10% over-confident | Improving |
| Q3 2025 | 14 | 0.21 | 8% over-confident | Improving |
| Q4 2025 | 16 | 0.19 | 5% over-confident | Good |
Learning rate = (Brier_t-1 - Brier_t) / Brier_t-1 (positive = improving, since lower Brier is better)
Target: 5-10% improvement per quarter until Brier < 0.20
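With one average Brier per quarter, the learning rate is a one-liner. A sketch using the quarterly averages from the table above; because lower Brier is better, a positive rate means improvement:

```python
# Quarter-over-quarter learning rate: positive = improving (Brier falling).
def learning_rate(prev_brier, curr_brier):
    return (prev_brier - curr_brier) / prev_brier

quarterly_brier = [0.28, 0.24, 0.21, 0.19]  # Q1-Q4 from the table above
rates = [learning_rate(a, b) for a, b in zip(quarterly_brier, quarterly_brier[1:])]
print([f"{r:.0%}" for r in rates])
```

On these numbers each quarter lands at or above the 10% end of the 5-10% target range.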
10) Create follow-up list
| What to track | Signal | Recheck date | Prediction to score |
|---|---|---|---|
| P(___) = ___% | |||
| P(___) = ___% |
| Who to keep warm | Why | Next touch |
|---|---|---|
Quarterly calibration review
Every quarter:
- Score all predictions that reached outcome date
- Calculate Brier scores by deal and overall
- Update calibration table (predictions vs outcomes by confidence bucket)
- Identify systematic biases (over/under-confidence patterns)
- Review passes that raised: were pass reasons validated?
- Update heuristics with new base rates
- Calculate learning rate vs previous quarter
Quarterly output:
- Brier score trend chart
- Calibration curve (predicted % vs actual %)
- Top 3 biases to correct
- Updated heuristic base rates
Annual review: Portfolio outcomes vs initial thesis
For each portfolio company:
- What did we believe at investment?
- What's the current reality?
- Where were we right/wrong?
- Score the original predictions
- Update archetype base rates
Annual output:
- Portfolio Brier score
- Best/worst calibrated predictions
- Archetype performance update
- Learning rate over 4 quarters
- Heuristics validated or invalidated
Salesforce logging (optional)
If Salesforce is your system of record:
- Add predictions as structured fields on Opportunity (or in Notes)
- Record confidence levels at each stage
- Link postmortem Note titled "Postmortem (YYYY-MM-DD)"
- Update outcome fields when known
- Tag with heuristics extracted
Edge cases
- If you have no outcome yet: run a "process postmortem" focused on what you learned and what evidence was missing. Record predictions for future scoring.
- If the outcome is ambiguous: define binary success criteria now, score later.
- If you can't remember your initial thesis: reconstruct as honestly as possible, and start recording predictions with probabilities now.
- If you have few deals: even 10-15 scored predictions start to show calibration patterns.
Source
https://github.com/evalops/open-associate-skills/blob/main/judgment-postmortem-calibration/SKILL.md

Overview
This skill helps you improve venture capital judgment by documenting initial beliefs, outcomes, and evidence, and by quantifying calibration over time. It uses structured postmortems with prediction probabilities, Brier scores, and learning-rate tracking to build a decision log you can review after outcomes.
How This Skill Works
You capture a timeline of predictions and decisions with confidence levels, record outcomes, and then compute a Brier score-based calibration scorecard. You also update heuristics and build a pattern library, tracking learning rate metrics to show improvement.
When to Use It
- After every IC decision (invest or pass)
- After every competitive loss
- Quarterly: review passes that later raised from other investors + calculate calibration metrics
- Annually: review portfolio outcomes vs initial thesis + measure learning rate
- During major diligence sprints to capture evidence shifts
Quick Start
- Step 1: Gather inputs (deal name, date, initial take, confidence, and outcome)
- Step 2: Fill the decision log entry and compute the initial probabilities
- Step 3: Calculate Brier scores, update calibration scorecard, and adjust heuristics
Best Practices
- Log the initial take with explicit probability estimates at each decision point
- Capture the timeline of events and outcomes to anchor predictions
- Compute and review the Brier score for probabilistic predictions against outcomes
- Update 2–5 actionable heuristics based on calibration results
- Maintain a pattern library to categorize archetypes and learning
Example Use Cases
- Seed-stage investment decision logged with an initial 25% probability of raising a Series A; the company was later funded, so the prediction is Brier-scored and the learning rate updated
- Competitive loss scenario where initial beliefs were revised after new diligence, updating risks and probabilities
- Quarterly calibration review aligning passes with external benchmarks and recalibrating thresholds
- Annual portfolio review comparing thesis vs realized outcomes to measure learning rate trends
- Major diligence sprint documenting evidence shifts and updating the decision log template