loop-execution-evaluator
Install: npx machina-cli add skill Ibrahim-3d/conductor-orchestrator-superpowers/loop-execution-evaluator --openclaw

Loop Execution Evaluator — Step 4: Dispatcher
This agent does NOT evaluate directly. It determines the track type and dispatches the correct specialized evaluator.
Why Specialized Evaluators?
Different track types need fundamentally different checks:
- A UI track needs design system adherence, visual consistency, responsive checks
- A feature track needs build integrity, type safety, code patterns
- An integration track needs API contracts, auth flows, error recovery
- A business logic track needs product rules, edge cases, state transitions
A generic checklist misses critical issues specific to each type.
Dispatch Logic
Read the track's metadata.json and spec.md to determine the track type, then dispatch:
| Track Type | Keywords in spec/metadata | Evaluator |
|---|---|---|
| UI / Design | "screen", "component", "design system", "layout", "visual", "UI shell" | eval-ui-ux |
| Feature / Code | "implement", "feature", "refactor", "infrastructure", "hook", "store" | eval-code-quality |
| Integration | "Supabase", "Stripe", "Gemini", "API", "auth", "database", "webhook" | eval-integration |
| Business Logic | "generation", "lock", "dependency", "pricing", "tier", "pipeline", "download" | eval-business-logic |
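The table above can be sketched as a simple keyword scan. This is an illustrative sketch only: the function name, return shape, and exact keyword lists are assumptions mirroring the table, not part of the skill itself.

```python
# Sketch of keyword-based dispatch. Keyword lists mirror the dispatch table;
# a real implementation would read spec.md + metadata.json and may weigh
# matches rather than taking any hit.
DISPATCH_TABLE = {
    "eval-ui-ux": ["screen", "component", "design system", "layout", "visual", "ui shell"],
    "eval-code-quality": ["implement", "feature", "refactor", "infrastructure", "hook", "store"],
    "eval-integration": ["supabase", "stripe", "gemini", "api", "auth", "database", "webhook"],
    "eval-business-logic": ["generation", "lock", "dependency", "pricing", "tier", "pipeline", "download"],
}

def select_evaluators(spec_text: str) -> list[str]:
    """Return every evaluator whose keywords appear in the spec/metadata text."""
    text = spec_text.lower()
    return [evaluator
            for evaluator, keywords in DISPATCH_TABLE.items()
            if any(kw in text for kw in keywords)]
```

Note that one spec can match several rows, which is exactly the multi-type case described below the table.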
Multi-Type Tracks
Some tracks need multiple evaluators. For example:
- A generator logic track → eval-business-logic + eval-code-quality
- An auth/DB integration track → eval-integration + eval-code-quality
- A UI shell track → eval-ui-ux only
When multiple evaluators apply, run them all. The track passes only if ALL evaluators pass.
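The all-must-pass rule reduces to one line. A minimal sketch, assuming verdicts are reported as the "PASS"/"FAIL" strings used in this skill's report template:

```python
def aggregate_verdict(verdicts: dict[str, str]) -> str:
    """PASS only if at least one evaluator ran and every one of them passed."""
    return "PASS" if verdicts and all(v == "PASS" for v in verdicts.values()) else "FAIL"
```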
Dispatch Workflow
1. Read track metadata.json + spec.md
2. Determine track type(s)
3. Dispatch evaluator(s):
→ eval-ui-ux (if UI track)
→ eval-code-quality (if code/feature track)
→ eval-integration (if integration track)
→ eval-business-logic (if logic track)
4. Collect results from all dispatched evaluators
5. Aggregate into final verdict
Structural Checks (Always Run)
Regardless of track type, always verify these baseline checks:
| Check | Method |
|---|---|
| plan.md updated | All completed tasks marked [x] with commit SHA and summary |
| Scope alignment | No unplanned work added without documentation |
| No skipped tasks | All [ ] tasks either completed or documented as intentionally deferred |
| Build passes | npm run build exits 0 |
| Business docs in sync | If track made pricing/model/business decisions, verify docs are flagged for Step 5.5 sync |
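The plan.md check from the table can be sketched as a scan for unchecked task lines. The function name and regex are illustrative assumptions; the real check also verifies commit SHA + summary on completed tasks and that npm run build exits 0.

```python
import re

def find_unchecked_tasks(plan_md: str) -> list[str]:
    """Return plan.md task lines still marked [ ].

    Any line returned here must either be completed or be documented as
    intentionally deferred (that judgment is left to the evaluator).
    """
    return re.findall(r"^\s*[-*] \[ \] .+$", plan_md, flags=re.MULTILINE)
```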
Business Doc Sync Check
If the track made any business-impacting changes, verify:
- The executor's summary includes Business Doc Sync Required: Yes
- Affected documents are listed
- This flags the Conductor to run Step 5.5 (Business Doc Sync) before marking complete
What counts as business-impacting:
- Pricing tier, price point, or feature list changes
- AI model, SDK, or cost structure changes
- New package or product tier additions
- Asset pipeline changes (add/remove/modify assets)
- Persona, GTM, or revenue assumption changes
See .claude/skills/business-docs-sync/SKILL.md for the full registry.
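Detecting the flag in the executor's summary is a plain string check. The marker string follows this skill's convention; the function name is an illustrative assumption:

```python
def business_sync_required(executor_summary: str) -> bool:
    """True if the executor flagged business-impacting changes in its summary."""
    return "Business Doc Sync Required: Yes" in executor_summary
```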
Aggregated Verdict
## Execution Evaluation Report
**Track**: [track-id]
**Evaluator**: loop-execution-evaluator (dispatcher)
**Date**: [YYYY-MM-DD]
### Evaluators Dispatched
| Evaluator | Reason | Verdict |
|-----------|--------|---------|
| eval-ui-ux | Track builds P0 screens | PASS ✅ / FAIL ❌ |
| eval-code-quality | Track implements features | PASS ✅ / FAIL ❌ |
### Structural Checks
- plan.md updated: YES / NO
- Scope alignment: YES / NO
- Build passes: YES / NO
- Business doc sync needed: YES / NO (if YES, list affected docs)
### Final Verdict: PASS ✅ / FAIL ❌
All evaluators must PASS for the track to pass.
[If FAIL, aggregate all fix actions from all evaluators]
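Aggregating fix actions on FAIL might look like the sketch below. The result shape is assumed from the evaluators_run entries in the metadata examples in this skill ({"evaluator": ..., "verdict": ..., "issues": [...]}):

```python
def collect_fix_actions(results: list[dict]) -> list[str]:
    """Combine the issues of every FAILING evaluator into one fix list."""
    return [issue
            for r in results
            if r["verdict"] == "FAIL"
            for issue in r["issues"]]
```

The combined list is what gets handed to loop-fixer when any evaluator fails.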
Metadata Checkpoint Updates
The execution evaluator MUST update the track's metadata.json at key points:
On Start
{
"loop_state": {
"current_step": "EVALUATE_EXECUTION",
"step_status": "IN_PROGRESS",
"step_started_at": "[ISO timestamp]",
"checkpoints": {
"EVALUATE_EXECUTION": {
"status": "IN_PROGRESS",
"started_at": "[ISO timestamp]",
"agent": "loop-execution-evaluator"
}
}
}
}
On PASS
{
"loop_state": {
"current_step": "BUSINESS_SYNC",
"step_status": "NOT_STARTED",
"checkpoints": {
"EVALUATE_EXECUTION": {
"status": "PASSED",
"completed_at": "[ISO timestamp]",
"verdict": "PASS",
"evaluators_run": [
{ "evaluator": "eval-code-quality", "verdict": "PASS", "issues": [] },
{ "evaluator": "eval-business-logic", "verdict": "PASS", "issues": [] }
],
"business_sync_required": true
},
"BUSINESS_SYNC": {
"status": "NOT_STARTED",
"required": true
}
}
}
}
On FAIL
{
"loop_state": {
"current_step": "FIX",
"step_status": "NOT_STARTED",
"checkpoints": {
"EVALUATE_EXECUTION": {
"status": "FAILED",
"completed_at": "[ISO timestamp]",
"verdict": "FAIL",
"evaluators_run": [
{ "evaluator": "eval-code-quality", "verdict": "PASS", "issues": [] },
{ "evaluator": "eval-business-logic", "verdict": "FAIL", "issues": ["Business rule violation found"] }
],
"failure_items": [
"Fix business rule enforcement in resolver",
"Add test coverage for edge case"
]
},
"FIX": {
"status": "NOT_STARTED",
"cycle": 1
}
}
}
}
Update Protocol
- Read current metadata.json
- Update loop_state.checkpoints.EVALUATE_EXECUTION with results
- If PASS + business sync needed: Set current_step to BUSINESS_SYNC
- If PASS + no sync needed: Set current_step to COMPLETE
- If FAIL: Set current_step to FIX, increment fix_cycle_count in loop_state
- Write back to metadata.json
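The read-modify-write protocol above can be sketched as follows. Field names mirror the JSON examples in this skill; the function signature and the simplified checkpoint payload are illustrative assumptions:

```python
import json
from pathlib import Path

def update_checkpoint(metadata_path: str, verdict: str,
                      evaluators_run: list, business_sync: bool) -> dict:
    """Apply the update protocol: read metadata.json, record the
    EVALUATE_EXECUTION checkpoint, route current_step, write back."""
    path = Path(metadata_path)
    meta = json.loads(path.read_text())
    loop = meta.setdefault("loop_state", {})
    checkpoints = loop.setdefault("checkpoints", {})
    checkpoints["EVALUATE_EXECUTION"] = {
        "status": "PASSED" if verdict == "PASS" else "FAILED",
        "verdict": verdict,
        "evaluators_run": evaluators_run,
    }
    if verdict == "PASS":
        loop["current_step"] = "BUSINESS_SYNC" if business_sync else "COMPLETE"
    else:
        loop["current_step"] = "FIX"
        loop["fix_cycle_count"] = loop.get("fix_cycle_count", 0) + 1
    loop["step_status"] = "NOT_STARTED"
    path.write_text(json.dumps(meta, indent=2))
    return meta
```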
Handoff
- ALL PASS + No Business Doc Sync → Conductor marks track complete (Step 5)
- ALL PASS + Business Doc Sync Needed → Conductor runs Step 5.5 (Business Doc Sync) before marking complete
- ANY FAIL → Conductor dispatches loop-fixer with combined fix list
Source
https://github.com/Ibrahim-3d/conductor-orchestrator-superpowers/blob/master/skills/loop-execution-evaluator/SKILL.md

Overview
Loop Execution Evaluator acts as the dispatcher for evaluation steps. It reads the track metadata and spec to determine the track type and invokes the appropriate specialized evaluator (UI/UX, code-quality, integration, or business logic). It does not run a generic checklist, ensuring issues are evaluated in the right domain.
How This Skill Works
It reads metadata.json and spec.md to identify track type(s), maps to the corresponding evaluators, runs all applicable evaluators (if multi-type), and aggregates their verdicts into a final result.
When to Use It
- When a track is ready for evaluation after loop-executor
- During a /phase-review or build-check to route checks by track type
- When metadata/spec indicate a UI, integration, feature/infra, or business-logic track
- When a multi-type track requires multiple evaluators
- When you need a consolidated verdict from all applicable evaluators
Quick Start
- Step 1: Read track metadata.json and spec.md to identify track type(s).
- Step 2: Map track type(s) to evaluators (UI/UX, code-quality, integration, business-logic).
- Step 3: Dispatch all applicable evaluators and aggregate their verdicts into the final report.
Best Practices
- Read metadata.json and spec.md to identify track type(s)
- Dispatch every applicable evaluator; don’t skip multi-type checks
- Run and then aggregate verdicts; require all to pass for multi-type tracks
- Maintain structural baseline checks regardless of track type
- Ensure traceability by recording track type, evaluators dispatched, and final verdict
Example Use Cases
- UI shell track triggers eval-ui-ux only
- Generator logic track triggers eval-business-logic + eval-code-quality
- Auth/DB integration track triggers eval-integration + eval-code-quality
- Feature/infrastructure change triggers eval-code-quality
- Phase-review scenario triggers dispatch after loop-executor during /phase-review