
review-skill-improver

npx machina-cli add skill existential-birds/beagle/review-skill-improver --openclaw
Files (1): SKILL.md (5.1 KB)

Review Skill Improver

Purpose

Analyzes structured feedback logs to:

  1. Identify rules that produce false positives (high REJECT rate)
  2. Identify missing rules (issues that should have been caught)
  3. Suggest specific skill modifications

Input

Feedback log in enhanced schema format (see review-feedback-schema skill).
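
For illustration, a single entry might look like the following Python dict; the field names mirror the example analysis later on this page, and the authoritative format is defined by review-feedback-schema.

```python
# Illustrative entry only; review-feedback-schema defines the real format.
entry = {
    "rule_source": "python-code-review:line-length",  # skill:rule that flagged the issue
    "verdict": "REJECT",                              # ACCEPT or REJECT
    "rationale": "ruff check passes",                 # reviewer's reason for the verdict
}
```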

Analysis Process

Step 1: Aggregate by Rule Source

For each unique rule_source:
  - Count total issues flagged
  - Count ACCEPT vs REJECT
  - Calculate rejection rate
  - Extract rejection rationales
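
A minimal sketch of this aggregation, assuming each entry is a dict with the fields shown under Input:

```python
from collections import defaultdict

def aggregate_by_rule(entries):
    """Count total and rejected issues per rule_source and collect rationales."""
    stats = defaultdict(lambda: {"total": 0, "rejected": 0, "rationales": []})
    for entry in entries:
        rule = stats[entry["rule_source"]]
        rule["total"] += 1
        if entry["verdict"] == "REJECT":
            rule["rejected"] += 1
            rule["rationales"].append(entry["rationale"])
    for rule in stats.values():
        rule["rejection_rate"] = rule["rejected"] / rule["total"]
    return dict(stats)
```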

Step 2: Identify High-Rejection Rules

Rules with >30% rejection rate warrant investigation:

  • Read the rejection rationales
  • Identify common themes
  • Determine if rule needs refinement or exception
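
Continuing the sketch above, selecting rules over the 30% threshold, worst first:

```python
def high_rejection_rules(stats, threshold=0.30):
    """Return (rule, stats) pairs whose rejection rate exceeds the threshold."""
    flagged = [(rule, s) for rule, s in stats.items() if s["rejection_rate"] > threshold]
    return sorted(flagged, key=lambda pair: pair[1]["rejection_rate"], reverse=True)
```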

Step 3: Pattern Analysis

Group rejections by rationale theme:

  • "Linter already handles this" -> Add linter verification step
  • "Framework supports this pattern" -> Add exception to skill
  • "Intentional design decision" -> Add codebase context check
  • "Wrong code path assumed" -> Add code tracing step

Step 4: Generate Improvement Recommendations

For each identified issue, produce:

```markdown
## Recommendation: [SHORT_TITLE]

**Affected Skill:** `skill-name/SKILL.md` or `skill-name/references/file.md`

**Problem:** [What's causing false positives]

**Evidence:**
- [X] rejections with rationale "[common theme]"
- Example: [file:line] - [issue] - [rationale]

**Proposed Fix:**
[Exact text to add/modify in the skill]

**Expected Impact:** Reduce false positive rate for [rule] from X% to Y%
```


Output Format

```markdown
# Review Skill Improvement Report

## Summary
- Feedback entries analyzed: [N]
- Unique rules triggered: [N]
- High-rejection rules identified: [N]
- Recommendations generated: [N]

## High-Rejection Rules

| Rule Source | Total | Rejected | Rate | Theme |
|-------------|-------|----------|------|-------|
| ... | ... | ... | ... | ... |

## Recommendations

[Numbered list of recommendations in format above]

## Rules Performing Well

[Rules with <10% rejection rate - preserve these]
```

Usage

```
# Analyze feedback and generate improvement report
/review-skill-improver --output improvement-report.md
```

Example Analysis

Given this feedback data:

```csv
rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
```

Analysis output:

```markdown
# Review Skill Improvement Report

## Summary
- Feedback entries analyzed: 7
- Unique rules triggered: 3
- High-rejection rules identified: 2
- Recommendations generated: 2

## High-Rejection Rules

| Rule Source | Total | Rejected | Rate | Theme |
|-------------|-------|----------|------|-------|
| python-code-review:line-length | 4 | 3 | 75% | linter handles this |
| pydantic-ai-common-pitfalls:tool-decorator | 1 | 1 | 100% | framework supports pattern |

## Recommendations

### 1. Add Linter Verification for Line Length

**Affected Skill:** `commands/review-python.md`

**Problem:** Flagging line length issues that linters confirm don't exist

**Evidence:**
- 3 rejections with rationale "linter passes/handles this"
- Example: amelia/drivers/api/openai.py:102 - Line too long - ruff check passes

**Proposed Fix:**
Add step to run `ruff check` before manual review. If linter passes for line length, do not flag manually.

**Expected Impact:** Reduce false positive rate for line-length from 75% to <10%

### 2. Add Raw Function Tool Registration Exception

**Affected Skill:** `skills/pydantic-ai-common-pitfalls/SKILL.md`

**Problem:** Flagging valid pydantic-ai pattern as error

**Evidence:**
- 1 rejection with rationale "docs support raw functions"

**Proposed Fix:**
Add "Valid Patterns" section documenting that passing functions with RunContext to Agent(tools=[...]) is valid.

**Expected Impact:** Eliminate false positives for this pattern

## Rules Performing Well

| Rule Source | Total | Accepted | Rate |
|-------------|-------|----------|------|
| python-code-review:type-safety | 2 | 2 | 100% |
```

Future: Automated Skill Updates

Once confidence is high, this skill can:

  1. Generate PRs to beagle with skill improvements
  2. Track improvement impact over time
  3. A/B test rule variations

Feedback Loop

```
Review Code -> Log Outcomes -> Analyze Patterns -> Improve Skills -> Better Reviews
     ^                                                                    |
     +--------------------------------------------------------------------+
```

This creates a continuous improvement cycle where review quality improves based on empirical data rather than guesswork.

Source

https://github.com/existential-birds/beagle/blob/main/plugins/beagle-core/skills/review-skill-improver/SKILL.md

Overview

Review Skill Improver analyzes structured feedback logs to identify high-rejection rules, missing checks, and concrete skill changes. It helps teams improve review accuracy by surfacing patterns and turning them into actionable recommendations.

How This Skill Works

It aggregates feedback by rule_source, computes rejection rates, and flags rules with rejection rates exceeding 30%. It then clusters rejections by rationale themes and generates targeted improvement recommendations with exact edits to apply.
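
Tying the step sketches above together, a hypothetical driver over a CSV log shaped like the example analysis (the file name feedback.csv is an assumption):

```python
import csv

def load_feedback(path):
    """Read a CSV log with rule_source, verdict, and rationale columns."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

entries = load_feedback("feedback.csv")        # hypothetical log file
stats = aggregate_by_rule(entries)             # Step 1 sketch
for rule, s in high_rejection_rules(stats):    # Step 2 sketch
    themes = {classify_rationale(r) for r in s["rationales"]}  # Step 3 sketch
    print(f"{rule}: {s['rejected']}/{s['total']} rejected "
          f"({s['rejection_rate']:.0%}), themes: {', '.join(sorted(themes))}")
```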

When to Use It

  • You have accumulated feedback data showing false positives that need reduction
  • You need to refine rules that over-reject or generate unnecessary flags
  • You seek to surface missing rules or patterns that should be caught
  • You want structured, evidence-backed guidance for modifying skills before redeploying them
  • You are preparing an improvement report and need concrete, testable edits

Quick Start

  1. Run /review-skill-improver to analyze your feedback logs and generate an improvement report
  2. Inspect high-rejection rules and their themes to identify patterns
  3. Apply the Proposed Fixes to the relevant skill and validate with a new feedback batch

Best Practices

  • Use enhanced feedback schema logs as input
  • Prioritize rules with >30% rejection rate
  • Review rejection rationales for consistent themes
  • Test proposed edits in a controlled feedback batch before rollout
  • Document changes in the skill file and maintain changelogs

Example Use Cases

  • python-code-review:line-length showing high rejection; add linter verification step
  • pydantic-ai-common-pitfalls:tool-decorator rejected due to docs support for raw functions
  • python-code-review:type-safety accepted, used to inform pattern analysis
  • framework-patterns:resource-usage flagged; add exception to skill when framework supports pattern
  • improvement report generated with example analysis (line-length, tool-decorator, type-safety)
