
skill-quality-auditor

npx machina-cli add skill pantheon-org/tekhne/skill-quality-auditor --openclaw

Skill Quality Auditor

Navigation hub for evaluating, maintaining, and improving skill quality with 9-dimension framework scoring.

Quick Start

# Evaluate single skill
sh skills/agentic-harness/skill-quality-auditor/scripts/evaluate.sh <skill-name> --json

# Batch audit multiple skills
sh skills/agentic-harness/skill-quality-auditor/scripts/batch-audit.sh <skill1> <skill2> [skill3...]

# Audit all skills
sh skills/agentic-harness/skill-quality-auditor/scripts/audit-skills.sh

# Emergency triage (PR context)
sh skills/agentic-harness/skill-quality-auditor/scripts/audit-skills.sh --pr-changes-only

Results stored in .context/audits/<skill-name>/latest/.

When to Use

  • Running periodic quality audits with 9-dimension framework scoring
  • Evaluating specific skills before merge using deterministic criteria
  • Validating runtime effectiveness via tessl eval scenarios (D9)
  • Creating remediation plans with measurable success criteria
  • Detecting duplication (>20% similarity threshold) and planning aggregations
  • Enforcing artifact conventions across skill collections
  • Implementing CI quality gates with score thresholds (see Score Thresholds)
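
The CI quality gate in the last bullet can be sketched as a small POSIX shell check. The JSON shape ({"grade":...,"total":...}) is taken from the Examples section below, and the 112/140 cut-off from the grade table under Self-Audit; the inline report string is a stand-in for a real evaluate.sh run.

```shell
# Hypothetical CI gate: fail the build when a skill's audit score drops
# below the B threshold (112/140). The JSON shape matches the evaluate.sh
# output shown in the Examples section; the report here is a stand-in.
THRESHOLD=112
report='{"grade":"B+","total":122,"dimensions":{}}'

# Extract the "total" field (sed keeps this dependency-free; jq also works).
total=$(printf '%s' "$report" | sed -n 's/.*"total":\([0-9]*\).*/\1/p')

if [ "$total" -lt "$THRESHOLD" ]; then
  echo "FAIL: score $total is below threshold $THRESHOLD"
  exit 1
fi
echo "PASS: score $total meets threshold $THRESHOLD"
```

In a real pipeline the report would come from `evaluate.sh <skill> --json` and the gate would run once per changed skill.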

Workflow

  1. Inventory target skills with proper directory scanning
  2. Evaluate using 9-dimension framework (Knowledge Delta + Eval Validation priority)
  3. Validate artifacts, consistency, and eval coverage using deterministic script checks
  4. Generate reports with JSON output and baseline comparison data
  5. Plan remediation with measurable success criteria (including eval creation)
  6. Re-evaluate and track score deltas with audit trails
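
Steps 4 and 6 hinge on baseline comparison. A minimal sketch of the delta calculation, with hard-coded scores standing in for values that would be read from the stored reports under .context/audits/:

```shell
# Hypothetical score-delta check (workflow steps 4 and 6). The numbers
# stand in for totals parsed from a baseline report and the latest report.
baseline=118   # e.g. from a stored .context/audits/ baseline report
latest=126     # e.g. from the latest audit report
delta=$((latest - baseline))

if [ "$delta" -ge 0 ]; then
  echo "score improved by $delta ($baseline -> $latest)"
else
  echo "score regressed by $((-delta)) ($baseline -> $latest)"
fi
```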

Mindset

  • Treat scores as directional signals, not absolute truth.
  • Prioritize deterministic, reproducible checks over manual review.
  • Apply strict rules where safety/consistency matters; stay flexible elsewhere.
  • Use threshold-based evaluation rather than relative comparisons.

Anti-Patterns (Summary)

  • NEVER skip baseline comparison in recurring audits
  • NEVER ignore Knowledge Delta scoring below 15/20
  • NEVER apply subjective scoring without deterministic checks
  • NEVER use harness-specific paths in skill content
  • NEVER mention specific agent names in skill instructions
  • NEVER create kitchen-sink skills that cover multiple unrelated tasks
  • NEVER bypass skill-quality-auditor in favor of tessl review alone

See Detailed Anti-Patterns for complete WHY/BAD/GOOD failure mode documentation.

Examples

./scripts/evaluate.sh infrastructure/terraform-generator --json --store
# Output: {"grade":"B+","total":122,"dimensions":{...}}

./scripts/batch-audit.sh infrastructure/terraform-generator ci-cd/github-actions-generator
# Audits multiple skills, stores results in .context/audits/

./scripts/evaluate.sh documentation/markdown-authoring --json --store
# Score: 98/140 (C+) -> review remediation-plan.md -> fix -> re-audit -> 128/140 (A)

See Audit Workflow Examples for detailed input/output pairs, baseline comparison, remediation workflow, and CI quality gate configuration.

Self-Audit

./scripts/evaluate.sh agentic-harness/skill-quality-auditor --json
# Expected: grade "A" or higher, total >= 126/140
# Grade thresholds: A >= 126, B+ >= 119, B >= 112, C+ < 112
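
The grade thresholds quoted above can be expressed as a small shell function; the cut-offs are taken from this page, while the function name grade_for is ours:

```shell
# Map a total score (out of 140) to a letter grade, using the thresholds
# from the Self-Audit section: A >= 126, B+ >= 119, B >= 112, else C+.
grade_for() {
  t=$1
  if [ "$t" -ge 126 ]; then echo "A"
  elif [ "$t" -ge 119 ]; then echo "B+"
  elif [ "$t" -ge 112 ]; then echo "B"
  else echo "C+"
  fi
}

grade_for 122   # -> B+
grade_for 98    # -> C+
```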

Reference Map

Critical: Quality Thresholds | Anti-Patterns

Framework: Dimensions | Scoring Rubric | Quality Standards | Pattern Recognition

Operations: Remediation Planning | Duplication Detection | Scripts Workflow | Tessl Compliance

Related Skills: creating-eval-scenarios (D9 eval generation)

Source

git clone https://github.com/pantheon-org/tekhne.git
# SKILL.md lives at skills/agentic-harness/skill-quality-auditor/SKILL.md

View on GitHub: https://github.com/pantheon-org/tekhne/blob/main/skills/agentic-harness/skill-quality-auditor/SKILL.md

Overview

Skill Quality Auditor evaluates agent skill collections using a nine-dimension framework (Knowledge Delta, Mindset, Anti-Patterns, Specification Compliance, Progressive Disclosure, Freedom Calibration, Pattern Recognition, Practical Usability, Eval Validation). It detects duplicates, generates remediation plans with size-based tasks, enforces CI quality gates, and tracks score trends to ensure tessl registry compliance.

How This Skill Works

The tool inventories target skills, evaluates them across nine dimensions, and runs deterministic checks on artifacts. It produces JSON reports, performs baseline comparisons, and generates remediation plans with measurable criteria; it then re-evaluates to track score deltas and audit trails.
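
As a rough illustration of the reporting step described above, a loop that prints one summary line per audited skill. The skill names and scores are illustrative only, not real audit output:

```shell
# Hypothetical summary table for a batch audit. In practice each score
# would be parsed from an evaluate.sh JSON report, not hard-coded.
for entry in "terraform-generator:122" "markdown-authoring:98"; do
  name=${entry%%:*}     # part before the colon
  score=${entry##*:}    # part after the colon
  printf '%-24s %s/140\n' "$name" "$score"
done
```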

When to Use It

  • Evaluating skill quality and publication readiness
  • Auditing SKILL.md files for consistency
  • Detecting duplicate skills and planning aggregations
  • Enforcing artifact conventions and CI quality gates
  • Creating quality dashboards with tessl compliance checks

Quick Start

  1. Run evaluate.sh for a single skill and get JSON output
  2. Run batch-audit.sh for multiple skills
  3. Run audit-skills.sh to scan all skills and review the results

Best Practices

  • Inventory skills with proper directory scanning
  • Evaluate using the nine-dimension framework with Knowledge Delta priority
  • Apply deterministic, script-based checks for artifacts
  • Plan remediation with measurable success criteria (including eval creation)
  • Track score deltas with audit trails and baseline comparisons

Example Use Cases

  • ./scripts/evaluate.sh infrastructure/terraform-generator --json --store (evaluate one skill and store the result)
  • ./scripts/batch-audit.sh infrastructure/terraform-generator ci-cd/github-actions-generator (audit multiple skills in one run)
  • ./scripts/audit-skills.sh (audit all skills)
  • Results are stored under .context/audits/<skill-name>/latest/
