
evaluating

npx machina-cli add skill chris-hendrix/claudehub/evaluating --openclaw
Files (1): SKILL.md (1.6 KB)

Evaluating Artifacts

Assess output quality across contextually relevant dimensions, with concrete paths to excellence.

Principles

  • Context-driven dimensions - Select 5-7 dimensions based on the artifact type and purpose. No fixed rubric.
  • Target excellence - 10/10 is the standard. Anything below 9 requires specific, actionable upgrades.
  • Parallel evaluation - Assess each dimension independently to avoid bias bleeding between scores.
  • Surgical revision - When upgrading, preserve voice and structure. Only fix what's weak.

Dimension Selection

Propose dimensions based on what matters for this specific artifact. Consider:

  • Artifact type (code, documentation, design, communication)
  • Intended audience
  • Success criteria
  • Common failure modes

Get user confirmation before evaluating. They may have context on what matters most.
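As an illustration, proposing dimensions could start from a lookup over artifact types before the user confirms or adjusts the list. This is only a sketch; the artifact-type names and candidate dimensions below are hypothetical, since the skill derives dimensions from context rather than a fixed rubric.

```python
# Hypothetical starting dimensions per artifact type. The skill itself
# selects dimensions from context, so treat these as conversation starters.
CANDIDATE_DIMENSIONS = {
    "code": ["readability", "correctness", "error handling",
             "efficiency", "maintainability", "test coverage"],
    "documentation": ["clarity", "completeness", "accuracy",
                      "discoverability", "structure"],
}

def propose_dimensions(artifact_type: str, limit: int = 7) -> list[str]:
    """Propose at most `limit` dimensions for user confirmation."""
    dims = CANDIDATE_DIMENSIONS.get(artifact_type, [])
    return dims[:limit]

print(propose_dimensions("code"))
```

A real session would then present this list to the user and revise it based on their context before any scoring happens.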

Evaluation Format

For each dimension:

[Dimension Name] — X/10

Why this dimension: 1-2 sentences on relevance
What's working: Specific strengths observed
What's missing: Specific gaps identified
Upgrade to 10/10: Concrete actions to close gaps
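One way to model an entry in this format is a small record type. This is a sketch; the class and field names are my own, not part of the skill.

```python
from dataclasses import dataclass

@dataclass
class DimensionScore:
    name: str      # dimension name, e.g. "Clarity"
    score: int     # 0-10
    why: str       # why this dimension matters for this artifact
    working: str   # specific strengths observed
    missing: str   # specific gaps identified
    upgrade: str   # concrete actions to reach 10/10

def render(d: DimensionScore) -> str:
    """Render one dimension in the evaluation format above."""
    return (f"{d.name} — {d.score}/10\n"
            f"Why this dimension: {d.why}\n"
            f"What's working: {d.working}\n"
            f"What's missing: {d.missing}\n"
            f"Upgrade to 10/10: {d.upgrade}")
```

Rendering each `DimensionScore` separately mirrors the parallel-evaluation principle: each dimension is assessed and reported on its own, without reference to the others.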

Revision Guidelines

When creating a 10/10 version:

  • Preserve original structure, voice, and core content
  • Only upgrade what's necessary for weak dimensions
  • Default to surgical edits, not rewrites
  • Show changes clearly so the user can learn

Source

git clone https://github.com/chris-hendrix/claudehub

The skill file lives at plugins/rpi/skills/evaluating/SKILL.md in that repository.

Overview

Evaluating artifacts uses multi-dimensional scoring to guide quality improvements. It provides a framework for dimension selection, parallel evaluation, and concrete upgrades to reach a 10/10 result while preserving voice and structure.

How This Skill Works

  • Dimension Selection: propose 5-7 context-driven dimensions based on artifact type, audience, success criteria, and common failure modes; obtain user confirmation before evaluating.
  • Parallel Evaluation: score each dimension independently on a 0-10 scale, documenting what is working and what is missing.
  • Upgrade Path and Revision: draft concrete upgrades that close each gap to 10/10, performing surgical revisions that preserve voice and structure.
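The upgrade-path step can be sketched as filtering for dimensions that fall below the skill's threshold, since anything below 9 requires specific, actionable upgrades. A minimal illustration, not part of the skill itself:

```python
def needs_upgrade(scores: dict[str, int], threshold: int = 9) -> list[str]:
    """Return the dimensions scoring below the threshold; per the skill,
    these are the ones that require specific, actionable upgrades."""
    return [dim for dim, s in scores.items() if s < threshold]

scores = {"clarity": 9, "completeness": 7, "accuracy": 10}
print(needs_upgrade(scores))  # ['completeness']
```

Restricting revision to the dimensions this returns is what keeps edits surgical: dimensions already at 9 or 10 are left untouched.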

When to Use It

  • Assess artifacts like code, documentation, design, or communication to identify quality gaps across dimensions
  • During pre-release or review cycles where a 10/10 standard is required
  • Quality improvement tasks where bias must be avoided by evaluating dimensions in parallel
  • Artifacts with known failure modes that need targeted upgrades
  • Cross-functional artifacts where you want a transparent, learnable revision process

Quick Start

  1. Collect the artifact and confirm context with the user to tailor dimensions
  2. Propose 5-7 dimensions and score them in parallel (X/10) with notes for each
  3. Write upgrade actions to reach 10/10 and perform a surgical revision that preserves voice

Best Practices

  • Limit dimensions to 5-7 and tailor to artifact type
  • Score each dimension independently to avoid bias bleed
  • Document what is working and what is missing for each dimension
  • Preserve voice and structure during upgrades (surgical revision)
  • Show changes clearly and include a revision log to teach how to reach 10/10

Example Use Cases

  • Evaluating a code module for readability, efficiency, error handling, and maintainability
  • Auditing API documentation for clarity, completeness, and discoverability
  • Reviewing a design mockup for usability, accessibility, responsiveness, and branding
  • Assessing a research report for methodological rigor, transparency, and reproducibility
  • Polishing a marketing landing page for clarity, tone, conversion, and accessibility

