
evaluating

npx machina-cli add skill chris-hendrix/claudehub/evaluating --openclaw
Files (1): SKILL.md (1.6 KB)

Evaluating Artifacts

Assess output quality across contextually relevant dimensions, with concrete paths to excellence.

Principles

  • Context-driven dimensions - Select 5-7 dimensions based on the artifact type and purpose. No fixed rubric.
  • Target excellence - 10/10 is the standard. Anything below 9 requires specific, actionable upgrades.
  • Parallel evaluation - Assess each dimension independently to avoid bias bleeding between scores.
  • Surgical revision - When upgrading, preserve voice and structure. Only fix what's weak.

Dimension Selection

Propose dimensions based on what matters for this specific artifact. Consider:

  • Artifact type (code, documentation, design, communication)
  • Intended audience
  • Success criteria
  • Common failure modes

Get user confirmation before evaluating. They may have context on what matters most.
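As an illustration, proposing dimensions could start from a lookup over artifact types before the user confirms or adjusts the list. This is only a sketch; the artifact-type names and candidate dimensions below are hypothetical, since the skill derives dimensions from context rather than a fixed rubric.

```python
# Hypothetical starting dimensions per artifact type. The skill itself
# selects dimensions from context, so treat these as conversation starters.
CANDIDATE_DIMENSIONS = {
    "code": ["readability", "correctness", "error handling",
             "efficiency", "maintainability", "test coverage"],
    "documentation": ["clarity", "completeness", "accuracy",
                      "discoverability", "structure"],
}

def propose_dimensions(artifact_type: str, limit: int = 7) -> list[str]:
    """Propose at most `limit` dimensions for user confirmation."""
    dims = CANDIDATE_DIMENSIONS.get(artifact_type, [])
    return dims[:limit]

print(propose_dimensions("code"))
```

A real session would then present this list to the user and revise it based on their context before any scoring happens.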

Evaluation Format

For each dimension:

[Dimension Name] — X/10

Why this dimension: 1-2 sentences on relevance
What's working: Specific strengths observed
What's missing: Specific gaps identified
Upgrade to 10/10: Concrete actions to close gaps
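One way to model an entry in this format is a small record type. This is a sketch; the class and field names are my own, not part of the skill.

```python
from dataclasses import dataclass

@dataclass
class DimensionScore:
    name: str      # dimension name, e.g. "Clarity"
    score: int     # 0-10
    why: str       # why this dimension matters for this artifact
    working: str   # specific strengths observed
    missing: str   # specific gaps identified
    upgrade: str   # concrete actions to reach 10/10

def render(d: DimensionScore) -> str:
    """Render one dimension in the evaluation format above."""
    return (f"{d.name} — {d.score}/10\n"
            f"Why this dimension: {d.why}\n"
            f"What's working: {d.working}\n"
            f"What's missing: {d.missing}\n"
            f"Upgrade to 10/10: {d.upgrade}")
```

Rendering each `DimensionScore` separately mirrors the parallel-evaluation principle: each dimension is assessed and reported on its own, without reference to the others.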

Revision Guidelines

When creating a 10/10 version:

  • Preserve original structure, voice, and core content
  • Only upgrade what's necessary for weak dimensions
  • Default to surgical edits, not rewrites
  • Show changes clearly so the user can learn

Source

git clone https://github.com/chris-hendrix/claudehub

The skill file lives at plugins/rpi/skills/evaluating/SKILL.md in that repository.

Overview

Evaluating artifacts uses multi-dimensional scoring to guide quality improvements. It provides a framework for dimension selection, parallel evaluation, and concrete upgrades to reach a 10/10 result while preserving voice and structure.

How This Skill Works

  • Dimension Selection: propose 5-7 context-driven dimensions based on artifact type, audience, success criteria, and common failure modes; obtain user confirmation before evaluating.
  • Parallel Evaluation: score each dimension independently on a 0-10 scale, documenting what is working and what is missing.
  • Upgrade Path and Revision: draft concrete upgrades that close each gap to 10/10, performing surgical revisions that preserve voice and structure.
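The upgrade-path step can be sketched as filtering for dimensions that fall below the skill's threshold, since anything below 9 requires specific, actionable upgrades. A minimal illustration, not part of the skill itself:

```python
def needs_upgrade(scores: dict[str, int], threshold: int = 9) -> list[str]:
    """Return the dimensions scoring below the threshold; per the skill,
    these are the ones that require specific, actionable upgrades."""
    return [dim for dim, s in scores.items() if s < threshold]

scores = {"clarity": 9, "completeness": 7, "accuracy": 10}
print(needs_upgrade(scores))  # ['completeness']
```

Restricting revision to the dimensions this returns is what keeps edits surgical: dimensions already at 9 or 10 are left untouched.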

When to Use It

  • Assess artifacts like code, documentation, design, or communication to identify quality gaps across dimensions
  • During pre-release or review cycles where a 10/10 standard is required
  • Quality improvement tasks where bias must be avoided by evaluating dimensions in parallel
  • Artifacts with known failure modes that need targeted upgrades
  • Cross-functional artifacts where you want a transparent, learnable revision process

Quick Start

  1. Collect the artifact and confirm context with the user to tailor dimensions
  2. Propose 5-7 dimensions and score them in parallel (X/10) with notes for each
  3. Write upgrade actions to reach 10/10 and perform a surgical revision that preserves voice

Best Practices

  • Limit dimensions to 5-7 and tailor to artifact type
  • Score each dimension independently to avoid bias bleed
  • Document what is working and what is missing for each dimension
  • Preserve voice and structure during upgrades (surgical revision)
  • Show changes clearly and include a revision log to teach how to reach 10/10

Example Use Cases

  • Evaluating a code module for readability, efficiency, error handling, and maintainability
  • Auditing API documentation for clarity, completeness, and discoverability
  • Reviewing a design mockup for usability, accessibility, responsiveness, and branding
  • Assessing a research report for methodological rigor, transparency, and reproducibility
  • Polishing a marketing landing page for clarity, tone, conversion, and accessibility

