Evaluating Artifacts

Install with: npx machina-cli add skill chris-hendrix/claudehub/evaluating --openclaw
Assess output quality across contextually relevant dimensions, with concrete paths to excellence.
Principles
- Context-driven dimensions - Select 5-7 dimensions based on the artifact type and purpose. No fixed rubric.
- Target excellence - 10/10 is the standard. Anything below 9 requires specific, actionable upgrades.
- Parallel evaluation - Assess each dimension independently to avoid bias bleeding between scores.
- Surgical revision - When upgrading, preserve voice and structure. Only fix what's weak.
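The "target excellence" principle amounts to a simple gate: any dimension scoring below 9 blocks sign-off until its upgrade actions are applied. A minimal Python sketch of that gate (names and threshold constant are illustrative, not part of the skill):

```python
# Illustrative gate for the "target excellence" principle: any score
# below 9 means the artifact still needs specific, actionable upgrades.
EXCELLENCE_THRESHOLD = 9

def needs_upgrade(scores: dict[str, int]) -> list[str]:
    """Return the dimensions that block a 10/10 sign-off."""
    return [name for name, score in scores.items()
            if score < EXCELLENCE_THRESHOLD]

scores = {"Clarity": 10, "Completeness": 7, "Accuracy": 9}
print(needs_upgrade(scores))  # ['Completeness']
```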
Dimension Selection
Propose dimensions based on what matters for this specific artifact. Consider:
- Artifact type (code, documentation, design, communication)
- Intended audience
- Success criteria
- Common failure modes
Get user confirmation before evaluating. They may have context on what matters most.
Evaluation Format
For each dimension:
[Dimension Name] — X/10
Why this dimension: 1-2 sentences on relevance
What's working: Specific strengths observed
What's missing: Specific gaps identified
Upgrade to 10/10: Concrete actions to close gaps
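The per-dimension format above maps naturally onto a small record type. A hypothetical Python sketch (field names mirror the template; nothing here is prescribed by the skill itself):

```python
from dataclasses import dataclass

@dataclass
class DimensionScore:
    """One entry in the evaluation format: [Dimension Name] — X/10."""
    name: str
    score: int   # 0-10
    why: str     # why this dimension matters for this artifact
    working: str # specific strengths observed
    missing: str # specific gaps identified
    upgrade: str # concrete actions to reach 10/10

    def render(self) -> str:
        return (
            f"{self.name} — {self.score}/10\n"
            f"Why this dimension: {self.why}\n"
            f"What's working: {self.working}\n"
            f"What's missing: {self.missing}\n"
            f"Upgrade to 10/10: {self.upgrade}"
        )

entry = DimensionScore(
    name="Clarity", score=8,
    why="Docs are read by new contributors first.",
    working="Short sentences; consistent terminology.",
    missing="No worked example in the setup section.",
    upgrade="Add one end-to-end example after the install steps.",
)
print(entry.render())
```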
Revision Guidelines
When creating a 10/10 version:
- Preserve original structure, voice, and core content
- Only upgrade what's necessary for weak dimensions
- Default to surgical edits, not rewrites
- Show changes clearly so the user can learn
Source
git clone https://github.com/chris-hendrix/claudehub (skill file: plugins/rpi/skills/evaluating/SKILL.md)
Overview
Evaluating artifacts uses multi-dimensional scoring to guide quality improvements. It provides a framework for dimension selection, parallel evaluation, and concrete upgrades to reach a 10/10 result while preserving voice and structure.
How This Skill Works
- Dimension Selection: propose 5-7 context-driven dimensions based on artifact type, audience, success criteria, and common failure modes; obtain user confirmation before evaluating.
- Parallel Evaluation: score each dimension independently on a 0-10 scale, documenting what is working and what is missing.
- Upgrade Path and Revision: draft concrete upgrades that close each gap and reach 10/10, performing surgical revisions that preserve voice and structure.
When to Use It
- Assess artifacts like code, documentation, design, or communication to identify quality gaps across dimensions
- During pre-release or review cycles where a 10/10 standard is required
- Quality improvement tasks where bias must be avoided by evaluating dimensions in parallel
- Artifacts with known failure modes that need targeted upgrades
- Cross-functional artifacts where you want a transparent, learnable revision process
Quick Start
- Step 1: Collect the artifact and confirm context with the user to tailor dimensions
- Step 2: Propose 5-7 dimensions and score them in parallel (X/10) with notes for each
- Step 3: Write upgrade actions to reach 10/10 and perform a surgical revision preserving voice
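The three steps above can be sketched as one pass of an evaluate-then-revise loop. All helper names below (propose_dimensions, score_dimension, upgrade_targets) are illustrative assumptions, not an API the skill defines:

```python
# Illustrative one-pass evaluate-then-revise loop for the Quick Start steps.
# Helper behavior is stubbed; a real run involves the user and the model.

def propose_dimensions(artifact_type: str) -> list[str]:
    # Steps 1-2: context-driven dimensions, to be confirmed with the user.
    defaults = {
        "code": ["Readability", "Efficiency", "Error handling",
                 "Maintainability", "Test coverage"],
    }
    return defaults.get(artifact_type, ["Clarity", "Completeness",
                                        "Accuracy", "Structure",
                                        "Audience fit"])

def evaluate(artifact_type: str, score_dimension) -> dict[str, int]:
    # Step 2: score each dimension independently to avoid bias bleed.
    return {d: score_dimension(d) for d in propose_dimensions(artifact_type)}

def upgrade_targets(scores: dict[str, int]) -> list[str]:
    # Step 3: only dimensions below 9 get surgical upgrades.
    return [d for d, s in scores.items() if s < 9]

scores = evaluate("code",
                  score_dimension=lambda d: 8 if d == "Efficiency" else 9)
print(upgrade_targets(scores))  # ['Efficiency']
```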
Best Practices
- Limit dimensions to 5-7 and tailor to artifact type
- Score each dimension independently to avoid bias bleed
- Document what is working and what is missing for each dimension
- Preserve voice and structure during upgrades (surgical revision)
- Show changes clearly and include a revision log to teach how to reach 10/10
Example Use Cases
- Evaluating a code module for readability, efficiency, error handling, and maintainability
- Auditing API documentation for clarity, completeness, and discoverability
- Reviewing a design mockup for usability, accessibility, responsiveness, and branding
- Assessing a research report for methodological rigor, transparency, and reproducibility
- Polishing a marketing landing page for clarity, tone, conversion, and accessibility