What does Story Refiner fix?

It automatically corrects obvious issues in development clarity, testability, and value clarity, producing an improved version plus a comparison.

Can it overcorrect?

It follows Conservative Correction: it fixes only obvious problems and shows the original and improved versions for comparison, so a human confirms every change.

What languages does it support?

It responds in the user’s language when specified and matches the language of provided stories, per the tool’s language preference rules.

Story Refiner

npx machina-cli add skill bobchao/pm-skills-rfp-to-stories/story-refiner --openclaw

Story Refiner Skill

Language Preference

Default: Respond in the same language as the user's input or as explicitly requested by the user.

If the user specifies a preferred language (e.g., "請用中文回答", "Reply in Japanese"), use that language for all outputs. Otherwise, match the language of the provided Stories.

Role Definition

You simultaneously play three roles to review User Stories:

Senior Developer: Evaluates technical feasibility and estimation clarity
QA Engineer: Evaluates testability and acceptance criteria clarity
Product Stakeholder: Evaluates requirement coverage and value clarity

Core Principles

Correction Over Reporting

Don't just point out problems; fix them directly
Every flagged issue must have a corresponding improved version
Humans only need final confirmation, not manual correction

Conservative Correction

Only correct Stories with "obvious problems"
Don't correct for the sake of correcting
Stories that already pass don't need changes

Transparent Annotation

Clearly explain why corrections were made
Provide original vs. improved version comparison
Let humans choose to accept or keep original version

Input Format

This Skill accepts the following inputs:

Story Writer output (recommended)
A list of User Stories in any format
Original RFP plus Stories (allows cross-referencing coverage)

Evaluation Criteria Reference

All scoring and evaluation must follow the standards defined in references/evaluation-criteria.md.

This document defines:

Three scoring dimensions (Development Clarity, Testability, Value Clarity)
Detailed scoring criteria for each dimension (1-5 points)
Specific checkpoints and common deduction patterns
Final score calculation method

Important: Both Quick Scan (Phase 1) and Detailed Evaluation (Phase 2) use these same criteria, with different levels of depth.

Evaluation Flow

Phase 1: Quick Scan

Score each Story initially (1-5 points) using the three dimensions from references/evaluation-criteria.md:

Scoring Method:

Quickly assess each dimension (Development Clarity, Testability, Value Clarity) on a 1-5 scale
Calculate final score: round((Development Clarity + Testability + Value Clarity) / 3)
Use the scoring criteria tables in references/evaluation-criteria.md as reference
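
The final score calculation above can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the function name `final_score` is hypothetical, and it assumes each dimension score is an integer from 1 to 5.

```python
def final_score(development_clarity: int, testability: int, value_clarity: int) -> int:
    """Average the three dimension scores and round to the nearest integer."""
    scores = (development_clarity, testability, value_clarity)
    for score in scores:
        if not 1 <= score <= 5:
            raise ValueError(f"dimension scores must be between 1 and 5, got {score}")
    # The mean of three integers never ends in exactly .5,
    # so round() has no tie to break here.
    return round(sum(scores) / 3)


print(final_score(4, 3, 3))  # 10 / 3 is about 3.33, so the result is 3
print(final_score(5, 4, 5))  # 14 / 3 is about 4.67, so the result is 5
```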

Quick Assessment Focus:

Development Clarity: Is the action specific? Is the scope clear? Are dependencies clear?
Testability: Can test cases be written? Are acceptance criteria present? Is the value verifiable?
Value Clarity: Is the value clear? Is the role correct? Does the Story map to requirements?

| Score | Level | Action |
|-------|-------|--------|
| 5 | Excellent | Keep, no modification |
| 4 | Good | Keep, may have minor suggestions |
| 3 | Passing | Mark for observation, may need minor adjustments |
| 2 | Insufficient | Must correct |
| 1 | Severely insufficient | Must rewrite |

Only Stories scoring ≤ 3 enter Phase 2 detailed evaluation.
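
As a rough sketch of the Quick Scan triage, the score table and the Phase 2 gate can be expressed as a lookup. The names `LEVELS` and `triage` are illustrative assumptions; the levels, actions, and the ≤ 3 gate come from the table and rule above.

```python
# Score -> (level, action), copied from the Quick Scan table.
LEVELS = {
    5: ("Excellent", "Keep, no modification"),
    4: ("Good", "Keep, may have minor suggestions"),
    3: ("Passing", "Mark for observation, may need minor adjustments"),
    2: ("Insufficient", "Must correct"),
    1: ("Severely insufficient", "Must rewrite"),
}

PHASE_2_MAX_SCORE = 3  # Only Stories scoring <= 3 enter Phase 2.


def triage(score: int) -> dict:
    """Return the level, action, and Phase 2 decision for a final score."""
    level, action = LEVELS[score]
    return {
        "score": score,
        "level": level,
        "action": action,
        "enters_phase_2": score <= PHASE_2_MAX_SCORE,
    }


print(triage(3))
```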

Phase 2: Multi-Perspective Detailed Evaluation

For Stories needing review, perform detailed evaluation from three perspectives using the Specific Checkpoints and Common Deduction Patterns defined in references/evaluation-criteria.md.

👨‍💻 Developer Perspective

Reference: references/evaluation-criteria.md - Dimension 1: Development Clarity

Detailed Checkpoints (from evaluation-criteria.md):

Is action description specific?
- 5 points: "Upload JPG/PNG format images, limited to 5MB"
- 3 points: "Upload images"
- 1 point: "Handle images"
Does scope have boundaries?
- 5 points: "Edit article title and content"
- 3 points: "Edit article"
- 1 point: "Manage articles"
Are dependencies clear?
- 5 points: Clearly marked "requires US-001 login feature completed first"
- 3 points: Implied dependency but not marked
- 1 point: Confusing or circular dependencies

Common Problems (see evaluation-criteria.md for deduction patterns):

Vague verbs: "manage", "handle", "maintain" (-1 to -2 points)
No scope boundary: "all settings", "various reports" (-1 to -2 points)
Compound features: "create and edit" (-1 point)
Technical details mixed in: "load using AJAX" (-1 point)
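
A simple keyword heuristic can illustrate how some of these patterns might be flagged. This is only a sketch under stated assumptions: the word lists contain just the examples quoted above, the compound-feature pattern is a guess at a reasonable generalization, and `developer_flags` is a hypothetical helper. The real evaluation relies on judgment against references/evaluation-criteria.md, not on string matching.

```python
import re

# Only the examples quoted in this section; a real list would be longer.
VAGUE_VERBS = {"manage", "handle", "maintain"}
BROAD_SCOPE_PHRASES = ("all settings", "various reports")
# Assumed generalization of the "create and edit" example.
COMPOUND_PATTERN = re.compile(r"\b(create|add)\s+and\s+(edit|update|delete)\b")


def developer_flags(action: str) -> list[str]:
    """Flag common Development Clarity problems in a Story's action text."""
    text = action.lower()
    words = set(re.findall(r"[a-z]+", text))  # exact word forms only
    flags = []
    vague = sorted(words & VAGUE_VERBS)
    if vague:
        flags.append("vague verb: " + ", ".join(vague))
    for phrase in BROAD_SCOPE_PHRASES:
        if phrase in text:
            flags.append(f"no scope boundary: {phrase}")
    if COMPOUND_PATTERN.search(text):
        flags.append("compound feature")
    return flags


print(developer_flags("Manage all settings"))
print(developer_flags("Upload JPG/PNG format images, limited to 5MB"))
```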

🧪 QA Perspective

Reference: references/evaluation-criteria.md - Dimension 2: Testability

Detailed Checkpoints (from evaluation-criteria.md):

Are acceptance criteria clear?
- 5 points: Has specific Given-When-Then or checklist
- 3 points: Has general direction but not specific
- 1 point: No acceptance criteria, or vague like "should be user-friendly"
Is value verifiable?
- 5 points: "so that I can find target article within 3 seconds" (measurable)
- 3 points: "so that I can find articles faster" (relative but comparable)
- 1 point: "so that I can have a better experience" (not measurable)
Are error scenarios considered?
- 5 points: Clearly states error handling
- 3 points: Only happy path, but error handling can be inferred
- 1 point: Error scenarios completely unconsidered, and important to feature

Common Problems (see evaluation-criteria.md for deduction patterns):

No acceptance criteria: none at all (-1 to -2 points; important features lose more)
Vague criteria: "should be fast", "should look good" (-1 point)
Untestable value: "so that I can have a better experience" (-2 points)

👤 Stakeholder Perspective

Reference: references/evaluation-criteria.md - Dimension 3: Value Clarity

Detailed Checkpoints (from evaluation-criteria.md):

Does "so that..." state real value?
- 5 points: "so that I can pull up data within 10 seconds when customer calls"
- 3 points: "so that I can quickly view data"
- 1 point: "so that I can use this feature" (circular reasoning)
Is role correct?
- 5 points: Role is clear and is the true beneficiary of this feature
- 3 points: Role too generic (e.g., "user" covers too much)
- 1 point: Wrong role (e.g., giving admin feature to regular user)
Maps to original requirements?
- 5 points: Can directly trace to a specific RFP paragraph
- 3 points: Is reasonably derived implied requirement
- 1 point: Can't see connection to original requirements

Common Problems (see evaluation-criteria.md for deduction patterns):

Circular reasoning: "so that I can use this feature" (-2 points)
Role too generic: everything is "user" (-1 point)
Technical task disguised as a Story: "As a developer" (-3 points)
Deviates from original requirements: features the RFP didn't mention (-1 to -2 points)
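
To make the stakeholder checks concrete, here is a minimal sketch that parses a Story written in the standard "As a [role], I want [action], so that [value]." format and flags three of the patterns above. The regular expression and the helpers `parse_story` and `stakeholder_flags` are assumptions for illustration; they match only the literal examples from this section.

```python
import re
from typing import Optional

# Assumes the standard one-sentence Story format.
STORY_PATTERN = re.compile(
    r"As an? (?P<role>.+?), I want (?P<action>.+?), so that (?P<value>.+?)\.?$",
    re.IGNORECASE,
)


def parse_story(text: str) -> Optional[dict]:
    """Split a Story into role, action, and value; None if the format differs."""
    match = STORY_PATTERN.match(text.strip())
    return match.groupdict() if match else None


def stakeholder_flags(story: dict) -> list[str]:
    """Flag Value Clarity problems using the literal examples from this section."""
    flags = []
    role = story["role"].strip().lower()
    value = story["value"].strip().lower()
    if role == "user":
        flags.append("role too generic")
    if role == "developer":
        flags.append("technical task disguised")
    if value == "i can use this feature":
        flags.append("circular reasoning")
    return flags


story = parse_story("As a user, I want to upload images, so that I can use this feature.")
print(story)
print(stakeholder_flags(story))
```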

Phase 3: Auto-Correction

For Stories scoring ≤ 3, execute corrections based on problem type:

Correction Strategies

| Problem Type | Correction Method |
|--------------|-------------------|
| Scope too large | Split into multiple Stories |
| Scope vague | Add specific operation description |
| Value unclear | Rewrite "so that..." part |
| Not testable | Add specific acceptance criteria |
| Format issue | Adjust to standard format |
| Wrong role | Correct to proper role |
| Improper granularity | Split or merge |
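
The strategy table can be read as a lookup from diagnosed problem to correction method. A small sketch, assuming problem types are given as the exact labels from the table (`plan_corrections` is a hypothetical helper name):

```python
# Problem type -> correction method, copied from the table above.
CORRECTION_METHODS = {
    "scope too large": "Split into multiple Stories",
    "scope vague": "Add specific operation description",
    "value unclear": 'Rewrite "so that..." part',
    "not testable": "Add specific acceptance criteria",
    "format issue": "Adjust to standard format",
    "wrong role": "Correct to proper role",
    "improper granularity": "Split or merge",
}


def plan_corrections(problems: list[str]) -> list[str]:
    """Return one correction method per diagnosed problem, without duplicates."""
    methods = []
    for problem in problems:
        method = CORRECTION_METHODS[problem.lower()]
        if method not in methods:
            methods.append(method)
    return methods


print(plan_corrections(["Scope too large", "Not testable"]))
```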

Correction Principles

Minimum change: If small change works, don't make big changes
Preserve intent: Don't change original requirement intent
Clear annotation: Explain what was changed and why

Phase 4: Iterative Validation (Max 3 Rounds)

Corrected Stories need re-evaluation to ensure quality meets standards. This is the core of iterative refinement.

Why Iteration Is Needed

| Situation | Single-Pass Refinement Problem | Iterative Solution |
|-----------|--------------------------------|--------------------|
| Story is split | New Stories aren't evaluated | ✅ Next round evaluates new Stories |
| Over-correction | Might break something | ✅ Next round catches and fine-tunes |
| Acceptance criteria still not specific | Passes through | ✅ Next round strengthens |

Iteration Flow

Round 1: Evaluate all Stories → Correct low-scoring items → Produce corrected version
    ↓
Round 2: Evaluate "corrected" + "newly generated" Stories → Correct again if needed
    ↓
Round 3: (If still issues) Final fine-tuning
    ↓
Terminate: Output final version

Termination Conditions (Stop when any is met)

Quality achieved: All Stories score ≥ 4
No corrections needed: This round had no Story corrections
Limit reached: Already executed 3 rounds
Convergence failed: Same Story corrected 2 rounds in a row but score didn't improve
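
The four termination conditions can be sketched as a single check run at the end of each round. The function name, its parameters, and the order in which conditions are tested are assumptions; `stalled_stories` is expected to hold the IDs of Stories corrected two rounds in a row without a score improvement, which the caller would compute from its correction history.

```python
from typing import Optional

MAX_ROUNDS = 3
PASS_SCORE = 4


def termination_reason(
    round_number: int,
    scores: dict[str, int],
    corrected_this_round: set[str],
    stalled_stories: set[str],
) -> Optional[str]:
    """Return why refinement should stop, or None to continue to the next round."""
    if all(score >= PASS_SCORE for score in scores.values()):
        return "Quality achieved"
    if not corrected_this_round:
        return "No corrections needed"
    if round_number >= MAX_ROUNDS:
        return "Limit reached"
    if stalled_stories:
        return "Convergence failed"
    return None


print(termination_reason(1, {"US-001": 2, "US-002": 5}, {"US-001"}, set()))
print(termination_reason(2, {"US-001": 4, "US-002": 5}, {"US-001"}, set()))
```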

Iteration Rules

| Rule | Description |
|------|-------------|
| Progressive convergence | Each round should reduce problems, not increase them |
| History memory | Track each Story's correction history, avoid back-and-forth changes |
| Correction limit | Same Story can only be majorly changed once, then only fine-tuned |
| New Story priority | From round 2, prioritize evaluating Stories generated in previous round |

Decreasing Correction Intensity

| Round | Allowed Correction Types |
|-------|--------------------------|
| Round 1 | All corrections (split, rewrite, add acceptance criteria, etc.) |
| Round 2 | Moderate corrections (add acceptance criteria, adjust wording, minor splits) |
| Round 3 | Fine-tuning only (word corrections, add details, no splitting or rewriting) |

This design ensures:

Round 1 solves structural problems
Round 2 handles omissions and fine-tuning
Round 3 is just wrap-up, avoiding infinite modification
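
One way to picture the decreasing intensity, together with the "major change only once" rule, is a per-round allow-list. The correction labels below, the choice of which corrections count as "major", and the assumption that round 2 also permits fine-tuning are all illustrative, not taken from the skill.

```python
FINE_TUNING = {"word_correction", "add_details"}
MODERATE = {"add_acceptance_criteria", "adjust_wording", "minor_split"}
MAJOR_CORRECTIONS = {"split", "rewrite"}  # assumed definition of a major change

ALLOWED_BY_ROUND = {
    1: MAJOR_CORRECTIONS | MODERATE | FINE_TUNING,  # all corrections
    2: MODERATE | FINE_TUNING,                      # moderate corrections
    3: FINE_TUNING,                                 # fine-tuning only
}


def is_allowed(correction: str, round_number: int, already_majorly_changed: bool) -> bool:
    """Check a proposed correction against the round limits and the once-only rule."""
    if correction not in ALLOWED_BY_ROUND[round_number]:
        return False
    if correction in MAJOR_CORRECTIONS and already_majorly_changed:
        return False
    return True


print(is_allowed("split", 1, False))
print(is_allowed("split", 2, False))
```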

Iteration Summary Output

Record at end of each round:

### Round N Refinement Summary

| Metric | Value |
|--------|-------|
| Stories Evaluated | XX |
| Corrections Made | XX |
| New (from splits) | XX |
| Average Score Improvement | +X.X |

**This Round's Corrections**:
- US-XXX: [Correction summary]
- US-XXX: [Correction summary]

**Continue?**: [Yes/No, reason]
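
As an illustration, the summary template above could be filled in from per-round data like this. The function `render_round_summary` and its parameters are hypothetical; the output layout follows the template.

```python
def render_round_summary(
    round_number: int,
    evaluated: int,
    corrections: dict[str, str],
    new_from_splits: int,
    avg_improvement: float,
    continue_refining: bool,
    reason: str,
) -> str:
    """Render one round's summary in the Markdown layout of the template."""
    lines = [
        f"### Round {round_number} Refinement Summary",
        "",
        "| Metric | Value |",
        "|--------|-------|",
        f"| Stories Evaluated | {evaluated} |",
        f"| Corrections Made | {len(corrections)} |",
        f"| New (from splits) | {new_from_splits} |",
        # The + flag prints an explicit sign, matching "+X.X" in the template.
        f"| Average Score Improvement | {avg_improvement:+.1f} |",
        "",
        "**This Round's Corrections**:",
    ]
    lines += [f"- {story_id}: {summary}" for story_id, summary in corrections.items()]
    answer = "Yes" if continue_refining else "No"
    lines += ["", f"**Continue?**: {answer}, {reason}"]
    return "\n".join(lines)


print(render_round_summary(
    1, 12, {"US-003": "Split into two Stories"}, 1, 0.8, True, "new Stories need evaluation"
))
```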

Output Format

Structure Overview

# Story Refinement Report

## 📊 Refinement Summary

### Overall Results
- Original Story Count: XX
- Final Story Count: XX (including split additions)
- Refinement Rounds: X / 3
- Termination Reason: [Quality achieved / No corrections needed / Limit reached / Convergence failed]

### Per-Round Statistics
| Round | Evaluated | Corrected | Added | Average Score |
|-------|-----------|-----------|-------|---------------|
| Round 1 | XX | XX | XX | X.X |
| Round 2 | XX | XX | XX | X.X |
| ... | ... | ... | ... | ... |

## 🔄 Refinement History
[Per-round correction summaries, collapsible]

## ✅ Final Passing Stories
[Stories scoring ≥ 4]

## 🔧 Corrected Stories
[Original → Final version comparison, noting correction round]

## ➕ Split-Generated Stories
[New Stories from splits]

## 🗑️ Recommended for Removal
[Stories not matching requirements or duplicates]

## 📋 Final Story List
[Complete integrated list, ready for use]

Correction Detail Format

### 🔧 US-XXX: [Title]

**Original Version**:
> As a [role], I want [action], so that [value].

**Problem Diagnosis**:
- 🧪 QA Perspective: Acceptance criteria unclear, can't write tests
- 👨‍💻 Developer Perspective: Scope includes multiple independent features

**Correction Method**: Split into two Stories + add acceptance criteria

**Improved Version**:

**US-XXX-A**: As a [role], I want [action A], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1
  - [ ] Condition 2

**US-XXX-B**: As a [role], I want [action B], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1

---

Special Situation Handling

Situation 1: Large Number of Stories Need Correction (>50%)

This may indicate systematic issues in the Story Writer phase:

Don't correct one by one (too inefficient)
Identify common problem patterns
Propose systematic suggestions
Recommend re-running Story Writer
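
A minimal sketch of the >50% check follows. It assumes a Story "needs correction" when its score is at or below 3, in line with Phase 3; the threshold is a parameter because the Quality Threshold Settings can change it. The function name is hypothetical.

```python
def needs_systematic_review(scores: list[int], correction_threshold: int = 3) -> bool:
    """True when more than half of the Stories score at or below the threshold."""
    if not scores:
        return False
    needing_correction = sum(1 for score in scores if score <= correction_threshold)
    return needing_correction / len(scores) > 0.5


print(needs_systematic_review([2, 3, 3, 5]))  # 3 of 4 need correction
print(needs_systematic_review([2, 3, 4, 5]))  # exactly half, not more than 50%
```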

Situation 2: Discovered Missing Features

If comparing to RFP reveals features not covered by Stories:

Mark as "recommended addition"
Produce suggested Story
Mark source (derived from which part of RFP)

Situation 3: Discovered Duplicate Stories

Mark duplicate items
Recommend which to keep (or merge)
Explain judgment basis

Situation 4: Story Quality Is Excellent

If all Stories score ≥ 4:

Briefly confirm "Quality is good, no corrections needed"
Can provide minor optimization suggestions (not mandatory)
Directly output final list

Output Example

Refer to assets/refine-example.md for complete output example.

Reference Documents

Evaluation Criteria: references/evaluation-criteria.md - Defines detailed scoring standards for all three dimensions
Output Example: assets/refine-example.md - Complete refinement report example

Integration with Other Skills

Standard Flow

[rfp-analyzer] → [story-writer] → [story-refiner] → Final output

Usage: After Story Writer produces a draft of User Stories, use Story Refiner to evaluate quality and automatically correct low-scoring Stories. This is a separate step that should be called explicitly when refinement is needed.

Quality Threshold Settings

Default Threshold

Pass threshold: ≥ 4 points
Must correct: ≤ 2 points
Observation zone: 3 points (optional correction)

Strict Mode

When user requests "strict check" or project risk is higher:

Pass threshold: 5 points
Must correct: ≤ 3 points
All Stories must have acceptance criteria

Lenient Mode

When user requests "quick pass" or project is MVP/POC:

Pass threshold: ≥ 3 points
Only correct ≤ 1 point severe issues
Acceptance criteria optional
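
The three modes can be summarized as a small configuration table. This is a sketch under assumptions: the class and field names are hypothetical, the default mode's acceptance-criteria policy is left as `None` because it is not stated above, and scores that fall between "must correct" and "pass" are labeled "observation" in every mode, which the text only defines for the default mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    pass_score: int                       # scores at or above this pass
    must_correct_at_or_below: int         # scores at or below this must be corrected
    acceptance_criteria_required: Optional[bool]  # None: not stated for this mode


MODES = {
    "default": Thresholds(4, 2, None),
    "strict": Thresholds(5, 3, True),
    "lenient": Thresholds(3, 1, False),
}


def classify(score: int, mode: str = "default") -> str:
    """Classify a final score as must correct, observation, or pass."""
    thresholds = MODES[mode]
    if score <= thresholds.must_correct_at_or_below:
        return "must correct"
    if score >= thresholds.pass_score:
        return "pass"
    return "observation"


for mode in MODES:
    print(mode, classify(3, mode))
```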

Checklist

After completing refinement, confirm the following items:

All Stories ≤ 2 points have been corrected or rewritten
Corrected Stories meet INVEST principles
Split-generated new Stories have proper numbering
Final list has no duplicates
All original requirement coverage preserved
Clear annotation of which are original vs. improved versions
Termination reason is reasonable (not a forced stop caused by reaching the round limit)
No Story was changed back-and-forth across multiple rounds

Iterative vs. Single-Pass Refinement

When to Use Iterative (Default)

Formal projects
Story count > 10
Has split operations
Higher quality requirements

When to Use Single-Pass

When user explicitly says "quick refine" or "one pass only":

MVP/POC projects
Time pressure
Story count < 10
General quality requirements

Why a 3-Round Limit

Rule of thumb: Most problems are resolved within 2 rounds
Diminishing returns: Round 3+ corrections are usually nitpicking
Avoid over-engineering: Infinite refinement may drift from original requirements
Time cost: Each round requires processing time

If large numbers of low-scoring Stories remain after 3 rounds:

Output current results with annotations
Suggest returning to Story Writer to regenerate
Analyze whether RFP itself has systematic issues

Source

https://github.com/bobchao/pm-skills-rfp-to-stories/blob/main/story-refiner/SKILL.md

Overview

Story Refiner evaluates user stories from the perspectives of a Senior Developer, QA Engineer, and Product Stakeholder. It automatically fixes obvious issues and outputs improved stories, reducing manual edits and speeding alignment with acceptance criteria and value delivery.

How This Skill Works

The tool analyzes a user story across technical feasibility, testability, and value clarity from three roles. It then generates an improved version with annotated changes and an original-versus-improved comparison for human review and acceptance.

When to Use It

A new user story is submitted with vague acceptance criteria
A story lacks clear technical scope or risk/estimation details
Testability or acceptance criteria are unclear or incomplete
Stakeholders need alignment between developer, QA, and product goals
You want to reduce back-and-forth by auto-generating a refined version for approval

Quick Start

Step 1: Submit the user story to Story Refiner with its current text
Step 2: Review the generated improved version and the original-versus-improved comparison
Step 3: Accept the refined story or iterate with adjustments until satisfied

Best Practices

Provide clear initial story text with a defined goal and boundary
Include starter acceptance criteria and any non-functional requirements
Run the Story Refiner on stories with obvious problems first
Review the original vs improved version and only accept if it preserves intent
Use the refined story as a draft for final confirmation and sign-off

Example Use Cases

Refine a login feature story with explicit authentication flow and success/failure criteria
Clarify a password reset story to include edge cases (expired tokens, retries)
Improve a data export story with explicit data formats, size limits, and validation
Align a reporting dashboard story with performance targets and acceptance criteria
Consolidate interdependent stories into a cohesive, well-scoped epic refinement
