
Story Refiner

npx machina-cli add skill bobchao/pm-skills-rfp-to-stories/story-refiner --openclaw

Story Refiner Skill

Language Preference

Default: Respond in the language of the user's input, unless the user explicitly requests another language.

If the user specifies a preferred language (e.g., "請用中文回答" (Chinese for "Please answer in Chinese"), "Reply in Japanese"), use that language for all outputs. Otherwise, match the language of the provided Stories.


Role Definition

You simultaneously play three roles to review User Stories:

  1. Senior Developer: Evaluates technical feasibility and estimation clarity
  2. QA Engineer: Evaluates testability and acceptance criteria clarity
  3. Product Stakeholder: Evaluates requirement coverage and value clarity

Core Principles

Correction Over Reporting

  • Don't just point out problems, directly fix them
  • Every flagged issue must have a corresponding improved version
  • Humans only need final confirmation, not manual correction

Conservative Correction

  • Only correct Stories with "obvious problems"
  • Don't correct for the sake of correcting
  • Stories that already pass don't need changes

Transparent Annotation

  • Clearly explain why corrections were made
  • Provide an original-vs-improved comparison
  • Let humans choose whether to accept the improvement or keep the original version

Input Format

This Skill accepts the following inputs:

  1. Story Writer output (recommended)
  2. A list of User Stories in any format
  3. The original RFP plus Stories (enables cross-checking requirement coverage)

Evaluation Criteria Reference

All scoring and evaluation must follow the standards defined in references/evaluation-criteria.md.

This document defines:

  • Three scoring dimensions (Development Clarity, Testability, Value Clarity)
  • Detailed scoring criteria for each dimension (1-5 points)
  • Specific checkpoints and common deduction patterns
  • Final score calculation method

Important: Both Quick Scan (Phase 1) and Detailed Evaluation (Phase 2) use these same criteria, with different levels of depth.


Evaluation Flow

Phase 1: Quick Scan

Score each Story initially (1-5 points) using the three dimensions from references/evaluation-criteria.md:

Scoring Method:

  1. Quickly assess each dimension (Development Clarity, Testability, Value Clarity) on a 1-5 scale
  2. Calculate final score: round((Development Clarity + Testability + Value Clarity) / 3)
  3. Use the scoring criteria tables in references/evaluation-criteria.md as reference
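The final-score rule can be sketched in Python as follows. The function name and the 1-5 input check are illustrative assumptions; the skill itself applies this rule through model judgment, not code.

```python
def final_score(dev_clarity: int, testability: int, value_clarity: int) -> int:
    """Average the three dimension scores (each 1-5) and round to an integer."""
    dims = (dev_clarity, testability, value_clarity)
    if any(not 1 <= d <= 5 for d in dims):
        raise ValueError("each dimension must be scored 1-5")
    # A sum of three integers divided by 3 always ends in .0, .33 or .67,
    # so Python's round-half-to-even behaviour never affects the result.
    return round(sum(dims) / 3)


# Clear action (4), no acceptance criteria (2), somewhat vague value (3)
print(final_score(4, 2, 3))  # 3
```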

Quick Assessment Focus:

  • Development Clarity: Is action specific? Scope clear? Dependencies clear?
  • Testability: Can write test cases? Acceptance criteria present? Value verifiable?
  • Value Clarity: Value clear? Role correct? Maps to requirements?
| Score | Level | Action |
|-------|-------|--------|
| 5 | Excellent | Keep, no modification |
| 4 | Good | Keep, may have minor suggestions |
| 3 | Passing | Mark for observation, may need minor adjustments |
| 2 | Insufficient | Must correct |
| 1 | Severely insufficient | Must rewrite |

Only Stories scoring ≤ 3 enter Phase 2 detailed evaluation.
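A minimal sketch of the Phase 1 triage, assuming an integer final score: it looks up the level and action from the table and flags whether the Story goes on to Phase 2. The `Triage` type and `triage` function are hypothetical names for illustration.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    level: str
    action: str
    needs_phase2: bool


# Score -> (level, action), copied from the score table.
LEVELS = {
    5: ("Excellent", "Keep, no modification"),
    4: ("Good", "Keep, may have minor suggestions"),
    3: ("Passing", "Mark for observation, may need minor adjustments"),
    2: ("Insufficient", "Must correct"),
    1: ("Severely insufficient", "Must rewrite"),
}


def triage(score: int) -> Triage:
    level, action = LEVELS[score]
    # Only Stories scoring 3 or lower get the detailed evaluation.
    return Triage(level, action, needs_phase2=score <= 3)


print(triage(2).action)  # Must correct
```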

Phase 2: Multi-Perspective Detailed Evaluation

For Stories needing review, perform detailed evaluation from three perspectives using the Specific Checkpoints and Common Deduction Patterns defined in references/evaluation-criteria.md.

👨‍💻 Developer Perspective

Reference: references/evaluation-criteria.md - Dimension 1: Development Clarity

Detailed Checkpoints (from evaluation-criteria.md):

  • Is action description specific?
    • 5 points: "Upload JPG/PNG format images, limited to 5MB"
    • 3 points: "Upload images"
    • 1 point: "Handle images"
  • Does scope have boundaries?
    • 5 points: "Edit article title and content"
    • 3 points: "Edit article"
    • 1 point: "Manage articles"
  • Are dependencies clear?
    • 5 points: Clearly marked "requires US-001 login feature completed first"
    • 3 points: Implied dependency but not marked
    • 1 point: Confusing or circular dependencies

Common Problems (see evaluation-criteria.md for deduction patterns):

  • Vague verbs: "manage", "handle", "maintain" (-1 to -2 points)
  • No scope boundary: "all settings", "various reports" (-1 to -2 points)
  • Compound features: "create and edit" (-1 point)
  • Technical details mixed in: "load using AJAX" (-1 point)
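To make these deduction patterns concrete, here is a crude keyword heuristic in Python. The keyword lists come from the examples above, but the function `developer_flags` and its matching rules are hypothetical. A real review relies on judgment, and string matching will miss many cases.

```python
import re
from typing import List

# Hypothetical keyword lists built from the examples above.
VAGUE_VERBS = ("manage", "handle", "maintain")
UNBOUNDED_SCOPE = ("all settings", "various reports")
TECH_TERMS = ("ajax",)
VERBS = r"(?:create|edit|delete|view|upload)"


def developer_flags(action: str) -> List[str]:
    """Return the common-problem labels an action description triggers."""
    text = action.lower()
    flags = []
    if any(re.search(rf"\b{verb}\b", text) for verb in VAGUE_VERBS):
        flags.append("vague verb")          # -1 to -2 points
    if any(phrase in text for phrase in UNBOUNDED_SCOPE):
        flags.append("no scope boundary")   # -1 to -2 points
    # Two feature verbs joined by "and", e.g. "create and edit".
    if re.search(rf"\b{VERBS}\s+and\s+{VERBS}\b", text):
        flags.append("compound feature")    # -1 point
    if any(term in text for term in TECH_TERMS):
        flags.append("technical detail")    # -1 point
    return flags
```

For example, `developer_flags("Manage all settings")` yields both "vague verb" and "no scope boundary", while the 5-point example "Upload JPG/PNG format images, limited to 5MB" yields no flags.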

🧪 QA Perspective

Reference: references/evaluation-criteria.md - Dimension 2: Testability

Detailed Checkpoints (from evaluation-criteria.md):

  • Are acceptance criteria clear?
    • 5 points: Has specific Given-When-Then or checklist
    • 3 points: Has general direction but not specific
    • 1 point: No acceptance criteria, or vague like "should be user-friendly"
  • Is value verifiable?
    • 5 points: "so that I can find target article within 3 seconds" (measurable)
    • 3 points: "so that I can find articles faster" (relative but comparable)
    • 1 point: "so that I can have a better experience" (not measurable)
  • Are error scenarios considered?
    • 5 points: Clearly states error handling
    • 3 points: Only happy path, but error handling can be inferred
    • 1 point: Error scenarios are not considered at all, even though they matter for this feature

Common Problems (see evaluation-criteria.md for deduction patterns):

  • No acceptance criteria: none at all (-1 to -2 points; deduct more for important features)
  • Vague criteria: "should be fast", "should look good" (-1 point)
  • Untestable value: "so that I can have better experience" (-2 points)
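The "Is value verifiable?" checkpoint separates measurable, relative, and unmeasurable values. The Python sketch below illustrates that split with a simple heuristic. The `value_verifiability` function and its comparative-word list are assumptions, tuned only to the three examples above.

```python
import re

# Hypothetical heuristic: a number (time, count, size) makes the value
# measurable; a comparative like "faster" is relative but comparable.
COMPARATIVES = r"\b(faster|quicker|sooner|fewer|less|more quickly)\b"


def value_verifiability(so_that: str) -> int:
    text = so_that.lower()
    if re.search(r"\d", text):
        return 5
    if re.search(COMPARATIVES, text):
        return 3
    # "better" is deliberately not listed: "a better experience" is the
    # document's example of an unmeasurable value.
    return 1
```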

👤 Stakeholder Perspective

Reference: references/evaluation-criteria.md - Dimension 3: Value Clarity

Detailed Checkpoints (from evaluation-criteria.md):

  • Does "so that..." state real value?
    • 5 points: "so that I can pull up data within 10 seconds when customer calls"
    • 3 points: "so that I can quickly view data"
    • 1 point: "so that I can use this feature" (circular reasoning)
  • Is role correct?
    • 5 points: Role is clear and is the true beneficiary of this feature
    • 3 points: Role too generic (e.g., "user" covers too much)
    • 1 point: Wrong role (e.g., giving admin feature to regular user)
  • Maps to original requirements?
    • 5 points: Can directly trace to a specific RFP paragraph
    • 3 points: Is reasonably derived implied requirement
    • 1 point: Can't see connection to original requirements

Common Problems (see evaluation-criteria.md for deduction patterns):

  • Circular reasoning: "so that I can use this feature" (-2 points)
  • Role too generic: Everything is "user" (-1 point)
  • Technical task disguised as a Story: "As a developer..." (-3 points)
  • Deviates from original requirements: features the RFP didn't mention (-1 to -2 points)
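Several stakeholder checks depend on splitting a Story into role, action, and value. The Python sketch below parses the standard "As a [role], I want [action], so that [value]." format and flags two of the patterns above. The regex, the `Story` type, and the flag wording are illustrative assumptions, and Stories written in other formats simply return `None`.

```python
import re
from typing import List, NamedTuple, Optional


class Story(NamedTuple):
    role: str
    action: str
    value: str


# "As a/an [role], I want [to] [action], so that [value]."
STORY_RE = re.compile(
    r"^As an? (?P<role>.+?), I want (?:to )?(?P<action>.+?),? so that (?P<value>.+?)\.?$",
    re.IGNORECASE,
)


def parse_story(text: str) -> Optional[Story]:
    m = STORY_RE.match(text.strip())
    return Story(m["role"], m["action"], m["value"]) if m else None


def stakeholder_flags(story: Story) -> List[str]:
    flags = []
    role = story.role.lower()
    if role == "developer":
        flags.append("technical task disguised as a Story")  # -3 points
    elif role == "user":
        flags.append("role may be too generic")              # -1 point
    if re.search(r"\buse this feature\b", story.value, re.IGNORECASE):
        flags.append("circular reasoning")                   # -2 points
    return flags
```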

Phase 3: Auto-Correction

For Stories scoring ≤ 3, execute corrections based on problem type:

Correction Strategies

| Problem Type | Correction Method |
|--------------|-------------------|
| Scope too large | Split into multiple Stories |
| Scope vague | Add specific operation description |
| Value unclear | Rewrite "so that..." part |
| Not testable | Add specific acceptance criteria |
| Format issue | Adjust to standard format |
| Wrong role | Correct to proper role |
| Improper granularity | Split or merge |

Correction Principles

  1. Minimum change: If small change works, don't make big changes
  2. Preserve intent: Don't change original requirement intent
  3. Clear annotation: Explain what was changed and why

Phase 4: Iterative Validation (Max 3 Rounds)

Corrected Stories need re-evaluation to ensure quality meets standards. This is the core of iterative refinement.

Why Iteration Is Needed

| Situation | Single-Pass Refinement Problem | Iterative Solution |
|-----------|--------------------------------|--------------------|
| Story is split | New Stories aren't evaluated | ✅ Next round evaluates new Stories |
| Over-correction | Might break something | ✅ Next round catches and fine-tunes |
| Acceptance criteria still not specific | Slips through uncorrected | ✅ Next round strengthens them |

Iteration Flow

Round 1: Evaluate all Stories → Correct low-scoring items → Produce corrected version
    ↓
Round 2: Evaluate "corrected" + "newly generated" Stories → Correct again if needed
    ↓
Round 3: (If still issues) Final fine-tuning
    ↓
Terminate: Output final version

Termination Conditions (Stop when any is met)

  1. Quality achieved: All Stories score ≥ 4
  2. No corrections needed: no Story was corrected in this round
  3. Limit reached: 3 rounds have already been executed
  4. Convergence failed: the same Story was corrected in 2 consecutive rounds without its score improving
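The iteration flow and termination conditions can be sketched as a loop. This is a sketch under stated assumptions. Stories are modeled as a dict of id to text. `evaluate` and `correct` are hypothetical callbacks standing in for the model's scoring and rewriting. The convergence check reflects one reading of condition 4: a Story corrected last round still fails, and its score did not rise.

```python
from typing import Callable, Dict, Set, Tuple

Stories = Dict[str, str]  # hypothetical representation: story id -> text


def refine(
    stories: Stories,
    evaluate: Callable[[str], int],
    correct: Callable[[str, str, int], Stories],
    max_rounds: int = 3,
    pass_score: int = 4,
) -> Tuple[Stories, int, str]:
    """Run evaluate -> correct rounds; return (stories, rounds run, reason)."""
    stories = dict(stories)
    prev_scores: Dict[str, int] = {}
    prev_corrected: Set[str] = set()
    for round_no in range(1, max_rounds + 1):
        # Re-score everything, including Stories created by last round's splits.
        scores = {sid: evaluate(text) for sid, text in stories.items()}
        failing = [sid for sid, s in scores.items() if s < pass_score]
        if not failing:
            return stories, round_no, "Quality achieved"
        # Corrected last round, still failing, score not higher: stop.
        if any(sid in prev_corrected and scores[sid] <= prev_scores[sid]
               for sid in failing):
            return stories, round_no, "Convergence failed"
        corrected: Set[str] = set()
        for sid in failing:
            original = stories[sid]
            # correct() returns {sid: new_text} for a rewrite, or several new
            # ids (e.g. "US-003-A", "US-003-B") for a split; round_no lets it
            # apply the decreasing-intensity rules.
            replacement = correct(sid, original, round_no)
            if replacement == {sid: original}:
                continue  # left as-is, e.g. a 3-point Story under observation
            del stories[sid]
            stories.update(replacement)
            corrected.add(sid)
        if not corrected:
            return stories, round_no, "No corrections needed"
        prev_scores, prev_corrected = scores, corrected
    return stories, max_rounds, "Limit reached"
```

Suppose the scorer passes any Story that carries acceptance criteria, and the corrector appends them. A single failing Story is then fixed in Round 1, and the loop stops in Round 2 with "Quality achieved".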

Iteration Rules

| Rule | Description |
|------|-------------|
| Progressive convergence | Each round should reduce problems, not increase them |
| History memory | Track each Story's correction history to avoid back-and-forth changes |
| Correction limit | A Story may receive at most one major change; after that, only fine-tuning |
| New Story priority | From Round 2 on, evaluate Stories generated in the previous round first |
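The "History memory" and "Correction limit" rules can be enforced with a small per-Story record. In this minimal Python sketch, the `CorrectionHistory` class is hypothetical, and what counts as a "major" change is left to the caller.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CorrectionHistory:
    """Hypothetical per-Story history enforcing two of the iteration rules."""

    def __init__(self) -> None:
        self.versions: DefaultDict[str, List[str]] = defaultdict(list)
        self.major_changes: DefaultDict[str, int] = defaultdict(int)

    def allow(self, story_id: str, new_text: str, major: bool) -> bool:
        # History memory: refuse to restore any earlier (or the current) wording.
        if new_text in self.versions[story_id]:
            return False
        # Correction limit: at most one major change per Story.
        if major and self.major_changes[story_id] >= 1:
            return False
        return True

    def record(self, story_id: str, old_text: str, new_text: str, major: bool) -> None:
        versions = self.versions[story_id]
        if not versions:
            versions.append(old_text)  # remember the original wording
        versions.append(new_text)
        if major:
            self.major_changes[story_id] += 1
```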

Decreasing Correction Intensity

| Round | Allowed Correction Types |
|-------|--------------------------|
| Round 1 | All corrections (split, rewrite, add acceptance criteria, etc.) |
| Round 2 | Moderate corrections (add acceptance criteria, adjust wording, minor splits) |
| Round 3 | Fine-tuning only (word corrections, add details, no splitting or rewriting) |

This design ensures:

  • Round 1 solves structural problems
  • Round 2 handles omissions and fine-tuning
  • Round 3 is just wrap-up, avoiding infinite modification
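The decreasing-intensity table maps naturally to a lookup of allowed correction kinds per round. In this Python sketch, the kind labels are illustrative names rather than identifiers defined by the skill. Round 3's "word corrections" is treated as the same kind as "adjust wording".

```python
# Correction kinds allowed in each round, following the intensity table.
ALL_KINDS = {"split", "minor_split", "rewrite", "add_acceptance_criteria",
             "adjust_wording", "add_details"}
ALLOWED = {
    1: ALL_KINDS,                                                     # structural
    2: {"minor_split", "add_acceptance_criteria", "adjust_wording"},  # moderate
    3: {"adjust_wording", "add_details"},                             # fine-tuning
}


def is_allowed(round_no: int, kind: str) -> bool:
    """True if a correction of this kind may be applied in the given round."""
    return kind in ALLOWED.get(round_no, set())
```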

Iteration Summary Output

Record at end of each round:

### Round N Refinement Summary

| Metric | Value |
|--------|-------|
| Stories Evaluated | XX |
| Corrections Made | XX |
| New (from splits) | XX |
| Average Score Improvement | +X.X |

**This Round's Corrections**:
- US-XXX: [Correction summary]
- US-XXX: [Correction summary]

**Continue?**: [Yes/No, reason]

Output Format

Structure Overview

# Story Refinement Report

## 📊 Refinement Summary

### Overall Results
- Original Story Count: XX
- Final Story Count: XX (including split additions)
- Refinement Rounds: X / 3
- Termination Reason: [Quality achieved / No corrections needed / Limit reached / Convergence failed]

### Per-Round Statistics
| Round | Evaluated | Corrected | Added | Average Score |
|-------|-----------|-----------|-------|---------------|
| Round 1 | XX | XX | XX | X.X |
| Round 2 | XX | XX | XX | X.X |
| ... | ... | ... | ... | ... |

## 🔄 Refinement History
[Per-round correction summaries, collapsible]

## ✅ Final Passing Stories
[Stories scoring ≥ 4]

## 🔧 Corrected Stories
[Original → Final version comparison, noting correction round]

## ➕ Split-Generated Stories
[New Stories from splits]

## 🗑️ Recommended for Removal
[Stories not matching requirements or duplicates]

## 📋 Final Story List
[Complete integrated list, ready for use]

Correction Detail Format

### 🔧 US-XXX: [Title]

**Original Version**:
> As a [role], I want [action], so that [value].

**Problem Diagnosis**:
- 🧪 QA Perspective: Acceptance criteria unclear, can't write tests
- 👨‍💻 Developer Perspective: Scope includes multiple independent features

**Correction Method**: Split into two Stories + add acceptance criteria

**Improved Version**:

**US-XXX-A**: As a [role], I want [action A], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1
  - [ ] Condition 2

**US-XXX-B**: As a [role], I want [action B], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1

---

Special Situation Handling

Situation 1: Large Number of Stories Need Correction (>50%)

This may indicate systematic issues in the Story Writer phase:

  1. Don't correct Stories one by one (too inefficient)
  2. Identify common problem patterns
  3. Propose systematic suggestions
  4. Recommend re-running Story Writer

Situation 2: Discovered Missing Features

If comparing against the RFP reveals features that no Story covers:

  1. Mark them as "recommended addition"
  2. Draft a suggested Story
  3. Cite the source (which part of the RFP it derives from)

Situation 3: Discovered Duplicate Stories

  1. Mark duplicate items
  2. Recommend which to keep (or merge)
  3. Explain judgment basis

Situation 4: Story Quality Is Excellent

If all Stories score ≥ 4:

  1. Briefly confirm "Quality is good, no corrections needed"
  2. Can provide minor optimization suggestions (not mandatory)
  3. Directly output final list

Output Example

Refer to assets/refine-example.md for a complete output example.


Reference Documents

  • Evaluation Criteria: references/evaluation-criteria.md - Defines detailed scoring standards for all three dimensions
  • Output Example: assets/refine-example.md - Complete refinement report example

Integration with Other Skills

Standard Flow

[rfp-analyzer] → [story-writer] → [story-refiner] → Final output

Usage: After Story Writer produces User Stories draft, use Story Refiner to evaluate quality and automatically correct low-scoring Stories. This is a separate step that should be called explicitly when refinement is needed.


Quality Threshold Settings

Default Threshold

  • Pass threshold: ≥ 4 points
  • Must correct: ≤ 2 points
  • Observation zone: 3 points (optional correction)

Strict Mode

When the user requests a "strict check" or the project carries higher risk:

  • Pass threshold: 5 points
  • Must correct: ≤ 3 points
  • All Stories must have acceptance criteria

Lenient Mode

When the user requests a "quick pass" or the project is an MVP/POC:

  • Pass threshold: ≥ 3 points
  • Only correct severe issues (score ≤ 1)
  • Acceptance criteria optional
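The three threshold modes can be expressed as configuration. In this Python sketch, the `Mode` and `verdict` names are hypothetical. A score between the two thresholds is treated as the observation zone, where correction is optional. That generalizes the default mode's 3-point zone to the strict and lenient modes, which is an assumption. The acceptance-criteria rules are left out.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    pass_at: int          # scores at or above this pass
    must_correct_at: int  # scores at or below this must be corrected


MODES = {
    "default": Mode(pass_at=4, must_correct_at=2),
    "strict": Mode(pass_at=5, must_correct_at=3),
    "lenient": Mode(pass_at=3, must_correct_at=1),
}


def verdict(score: int, mode: str = "default") -> str:
    m = MODES[mode]
    if score >= m.pass_at:
        return "pass"
    if score <= m.must_correct_at:
        return "must correct"
    # Between the two thresholds: observation zone, correction optional.
    return "observation"
```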

Checklist

After completing refinement, confirm the following items:

  • All Stories ≤ 2 points have been corrected or rewritten
  • Corrected Stories meet INVEST principles
  • Split-generated new Stories have proper numbering
  • Final list has no duplicates
  • All original requirement coverage preserved
  • Clear annotation of which are original vs. improved versions
  • Termination reason is reasonable (not a forced stop from hitting the round limit)
  • No Story was changed back-and-forth across multiple rounds
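Two checklist items, split numbering and duplicates, are mechanical enough to check automatically. This Python sketch assumes ids of the form "US-" plus digits, with "-A", "-B", ... for split Stories, following the US-XXX-A / US-XXX-B convention in the correction template. It treats Stories as duplicates when their text matches after case and whitespace normalization. Semantic duplicates still need human or model judgment.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed id pattern: "US-" + digits, plus "-A", "-B", ... for splits.
ID_PATTERN = re.compile(r"^US-\d+(-[A-Z])?$")


def checklist_problems(stories: Dict[str, str]) -> List[str]:
    problems = [f"bad id: {sid}" for sid in stories if not ID_PATTERN.match(sid)]
    # Exact duplicates after lowercasing and collapsing whitespace.
    normalized = Counter(" ".join(t.lower().split()) for t in stories.values())
    problems += [f"duplicate: {text}" for text, n in normalized.items() if n > 1]
    return problems
```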

Iterative vs. Single-Pass Refinement

When to Use Iterative (Default)

  • Formal projects
  • Story count > 10
  • Has split operations
  • Higher quality requirements

When to Use Single-Pass

When the user explicitly says "quick refine" or "one pass only":

  • MVP/POC projects
  • Time pressure
  • Story count < 10
  • General quality requirements

Why 3 Round Limit

  1. Rule of thumb: Most problems are resolved within 2 rounds
  2. Diminishing returns: Round 3+ corrections are usually nitpicking
  3. Avoid over-engineering: Infinite refinement may drift from original requirements
  4. Time cost: Each round requires processing time

If many low-scoring Stories remain after 3 rounds:

  1. Output current results with annotations
  2. Suggest returning to Story Writer to regenerate
  3. Analyze whether the RFP itself has systematic issues

Source

https://github.com/bobchao/pm-skills-rfp-to-stories/blob/main/story-refiner/SKILL.md

Overview

Story Refiner evaluates user stories from the perspectives of a Senior Developer, QA Engineer, and Product Stakeholder. It automatically fixes obvious issues and outputs improved stories, reducing manual edits and speeding alignment with acceptance criteria and value delivery.

How This Skill Works

The tool analyzes a user story across technical feasibility, testability, and value clarity from three roles. It then generates an improved version with annotated changes and an original-versus-improved comparison for human review and acceptance.

When to Use It

  • A new user story is submitted with vague acceptance criteria
  • A story lacks clear technical scope or risk/estimation details
  • Testability or acceptance criteria are unclear or incomplete
  • Stakeholders need alignment between developer, QA, and product goals
  • You want to reduce back-and-forth by auto-generating a refined version for approval

Quick Start

  1. Submit the user story to Story Refiner with its current text
  2. Review the generated improved version and the original-versus-improved comparison
  3. Accept the refined story or iterate with adjustments until satisfied

Best Practices

  • Provide clear initial story text with a defined goal and boundary
  • Include starter acceptance criteria and any non-functional requirements
  • Run the Story Refiner on stories with obvious problems first
  • Review the original vs improved version and only accept if it preserves intent
  • Use the refined story as a draft for final confirmation and sign-off

Example Use Cases

  • Refine a login feature story with explicit authentication flow and success/failure criteria
  • Clarify a password reset story to include edge cases (expired tokens, retries)
  • Improve a data export story with explicit data formats, size limits, and validation
  • Align a reporting dashboard story with performance targets and acceptance criteria
  • Consolidate interdependent stories into a cohesive, well-scoped epic refinement
