What triggers this skill?

Triggered by 'evaluate plan', 'review plan', or 'check plan before executing' commands.

The skill returns a PASS/FAIL verdict with detailed evaluation findings and, when needed, governance actions.

loop-plan-evaluator

npx machina-cli add skill Ibrahim-3d/conductor-orchestrator-superpowers/loop-plan-evaluator --openclaw

Loop Plan Evaluator Agent — Step 2: EVALUATE PLAN

Pre-execution quality gate. Verifies the plan is correct and scoped before any implementation begins. This prevents the exact problem that caused the PLAN-005 design system rebuild — an agent executing work that was already done.

For major tracks (architecture, features with 5+ tasks, integrations, infrastructure), this step also invokes the Board of Directors for multi-perspective expert review.

Inputs Required

Track's plan.md — the plan to evaluate (including DAG)
Track's spec.md — requirements to check against
conductor/tracks.md — completed tracks (overlap check)
Track's metadata.json — track type and priority
Codebase state — what files/components already exist

Evaluation Passes

Pass 1: Scope Alignment

Check every task against spec.md:

| For Each Task | Check |
|---------------|-------|
| Is it in spec? | Task must trace to a specific spec requirement |
| Is it needed? | Would removing this task leave a spec requirement unmet? |
| Is it scoped? | Does the task do only what spec asks, not more? |

Output:

### Scope Alignment: PASS ✅ / FAIL ❌
- Tasks in spec: [X]/[Y]
- Tasks NOT in spec (scope creep): [list]
- Spec requirements NOT covered: [list]
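
The scope check above can be sketched as a set comparison. This is a minimal illustration, not the skill's implementation: it assumes each task carries a hypothetical `spec_refs` list of requirement IDs, and that requirement IDs have already been extracted from spec.md. Neither the field name nor the ID format comes from the skill itself.

```python
def check_scope_alignment(tasks, spec_requirements):
    """Compare plan tasks against spec requirement IDs.

    tasks: list of dicts with 'id' and a hypothetical 'spec_refs' list.
    spec_requirements: iterable of requirement IDs taken from spec.md.
    """
    required = set(spec_requirements)
    covered = set()
    scope_creep = []

    for task in tasks:
        # Only references to requirements that actually exist count
        refs = set(task.get("spec_refs", [])) & required
        if refs:
            covered |= refs
        else:
            # Task traces to no known requirement: candidate scope creep
            scope_creep.append(task["id"])

    uncovered = sorted(required - covered)
    return {
        "passed": not scope_creep and not uncovered,
        "tasks_in_spec": len(tasks) - len(scope_creep),
        "scope_creep": scope_creep,
        "uncovered_requirements": uncovered,
    }
```

The "Is it scoped?" question (a task doing more than the spec asks) still needs a reviewer's judgment; this sketch only covers traceability in both directions.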

Pass 2: Overlap Detection

Cross-reference with tracks.md and the codebase:

| Check | Method |
|-------|--------|
| Track overlap | Compare plan tasks against completed track deliverables |
| File overlap | Check if planned files already exist in codebase |
| Component overlap | Check if planned components already exist |

Output:

### Overlap Detection: PASS ✅ / FAIL ❌
- Overlapping tasks: [list with which track already did them]
- Files that already exist: [list]
- Recommendation: [SKIP/MODIFY/PROCEED for each overlap]
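
The file overlap check can be approximated with a filesystem lookup. The sketch below assumes the planned file paths are relative to the repository root; the function name and parameters are illustrative, not part of the skill.

```python
from pathlib import Path


def find_existing_files(planned_files, repo_root="."):
    """Return the planned file paths that already exist under repo_root."""
    root = Path(repo_root)
    return [path for path in planned_files if (root / path).exists()]
```

An existing file is not automatically a failure. It is the input for the SKIP/MODIFY/PROCEED recommendation, since a task may legitimately modify a file that is already there.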

Pass 3: Dependency Check

Verify task ordering and prerequisites:

| Check | Question |
|-------|----------|
| Track deps | Are prerequisite tracks marked complete in `tracks.md`? |
| Task ordering | Do later tasks depend on earlier tasks being done first? |
| External deps | Are required packages/APIs available? |

Output:

### Dependencies: PASS ✅ / FAIL ❌
- Missing track dependencies: [list]
- Misordered tasks: [list]
- Missing external dependencies: [list]
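
The task ordering check can be sketched as a position comparison: a task is misordered if it is listed before something it depends on. This assumes tasks are available as dicts with `id` and `depends_on` in plan order, which mirrors the DAG node fields but is still an assumption about how the plan is parsed.

```python
def find_misordered_tasks(tasks):
    """Return (task_id, dependency_id) pairs where the dependency is listed later.

    tasks: plan tasks in listed order, each a dict with 'id' and
    an optional 'depends_on' list of task IDs.
    """
    position = {task["id"]: index for index, task in enumerate(tasks)}
    misordered = []
    for task in tasks:
        for dep in task.get("depends_on", []):
            # Unknown IDs are skipped here; broken references are a separate check
            if dep in position and position[dep] > position[task["id"]]:
                misordered.append((task["id"], dep))
    return misordered
```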

Pass 4: Task Quality

Evaluate each task for clarity and completeness:

| Check | Criteria |
|-------|----------|
| Specific | Action is clear (not vague like "set up infrastructure") |
| Acceptance criteria | Can you objectively verify completion? |
| File targets | Expected file paths are listed? |
| Session-sized | Can be completed in one sitting? |

Output:

### Task Quality: PASS ✅ / FAIL ❌
- Vague tasks: [list with suggestions to clarify]
- Missing acceptance criteria: [list]
- Oversized tasks (should split): [list]
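
Two of the criteria above, acceptance criteria and file targets, are mechanical enough to check in code. The sketch below assumes hypothetical `acceptance_criteria` and `files` fields on each task; the skill does not define these names.

```python
def check_task_quality(tasks):
    """Flag tasks that lack acceptance criteria or file targets.

    The 'acceptance_criteria' and 'files' field names are hypothetical.
    """
    missing_criteria = [t["id"] for t in tasks if not t.get("acceptance_criteria")]
    missing_files = [t["id"] for t in tasks if not t.get("files")]
    return {
        "passed": not missing_criteria and not missing_files,
        "missing_acceptance_criteria": missing_criteria,
        "missing_file_targets": missing_files,
    }
```

Whether a task is vague or too large for one sitting is a judgment call and is left to the evaluator.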

Pass 5: DAG Validation

Verify the dependency graph is valid for parallel execution:

| Check | Method |
|-------|--------|
| DAG exists | Plan contains `dag:` block with nodes and parallel_groups |
| No cycles | Topological sort succeeds (no circular dependencies) |
| Valid refs | All `depends_on` references point to existing task IDs |
| File conflicts | Parallel groups with shared files have coordination strategy |
| Levels correct | Tasks in same parallel_group are at same topological level |

Cycle Detection Algorithm:

def detect_cycles(dag):
    """Returns True if a cycle exists, False otherwise."""
    # Index nodes by ID once instead of scanning the list on every visit
    nodes = {n['id']: n for n in dag['nodes']}
    visited = set()
    rec_stack = set()  # nodes on the current DFS path

    def dfs(node_id):
        visited.add(node_id)
        rec_stack.add(node_id)

        for dep in nodes[node_id].get('depends_on', []):
            if dep not in nodes:
                continue  # broken reference; reported by the "Valid refs" check
            if dep not in visited:
                if dfs(dep):
                    return True
            elif dep in rec_stack:
                return True  # Cycle detected

        rec_stack.remove(node_id)
        return False

    for node_id in nodes:
        if node_id not in visited and dfs(node_id):
            return True
    return False
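
The "Valid refs" and "Levels correct" checks can be sketched alongside cycle detection. The code below is one possible approach, assuming the same `dag['nodes']` shape with `id` and `depends_on`; the level computation is a simple round-based topological layering, not necessarily what the skill uses.

```python
def find_invalid_refs(dag):
    """Return (node_id, missing_dep) pairs for depends_on entries with no matching node."""
    known = {node["id"] for node in dag["nodes"]}
    return [
        (node["id"], dep)
        for node in dag["nodes"]
        for dep in node.get("depends_on", [])
        if dep not in known
    ]


def topological_levels(dag):
    """Map each node ID to its level; level 0 nodes have no dependencies.

    Returns None when no full ordering exists (a cycle or a broken reference).
    """
    pending = {n["id"]: set(n.get("depends_on", [])) for n in dag["nodes"]}
    levels = {}
    level = 0
    while pending:
        # A node is ready once every dependency was placed in an earlier round
        ready = [nid for nid, deps in pending.items() if all(d in levels for d in deps)]
        if not ready:
            return None
        for nid in ready:
            levels[nid] = level
            del pending[nid]
        level += 1
    return levels
```

With the levels available, a parallel group passes the "Levels correct" check when all of its tasks map to the same level.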

Output:

### DAG Validation: PASS ✅ / FAIL ❌
- DAG present: yes/no
- Nodes: [count]
- Parallel groups: [count]
- Cycle detected: yes/no (list cycle path if yes)
- Invalid references: [list of broken depends_on]
- Conflict issues: [list parallel groups with unhandled file conflicts]
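
The file conflict check can be illustrated by grouping file targets per parallel group. This sketch assumes each parallel group is a dict with `id` and `tasks`, and each node lists its targets in a `files` field; those shapes are hypothetical, since the skill does not spell out the `dag:` schema.

```python
from collections import defaultdict


def find_file_conflicts(dag):
    """Return {group_id: {file: [task_ids]}} for files shared by 2+ tasks in a group.

    Assumes hypothetical shapes: nodes carry a 'files' list and each
    parallel group is {'id': ..., 'tasks': [task_ids]}.
    """
    files_by_task = {n["id"]: n.get("files", []) for n in dag["nodes"]}
    conflicts = {}
    for group in dag.get("parallel_groups", []):
        owners = defaultdict(list)
        for task_id in group["tasks"]:
            for path in files_by_task.get(task_id, []):
                owners[path].append(task_id)
        shared = {path: ids for path, ids in owners.items() if len(ids) > 1}
        if shared:
            conflicts[group["id"]] = shared
    return conflicts
```

A reported conflict only counts against the plan if the group has no coordination strategy for the shared file.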

Pass 6: Board of Directors Review (Major Tracks Only)

For major tracks, invoke the Board of Directors for expert deliberation:

When to invoke Board:

Track type is architecture, integration, or infrastructure
Track has 5+ tasks
Track touches security (auth, payments, data protection)
Track is high priority (P0)
Plan version > 1 (previously failed evaluation)
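
The criteria above amount to a single boolean test. The sketch below is one way to write it in Python, assuming metadata.json exposes `type`, `priority`, and `plan_version` fields and that the task count and security flag are determined separately; all of these names are assumptions, and the parameter list is an illustration rather than the skill's actual interface.

```python
MAJOR_TYPES = {"architecture", "integration", "infrastructure"}


def is_major_track(metadata, task_count, touches_security=False):
    """Return True if any board-review trigger applies.

    The 'type', 'priority', and 'plan_version' field names are assumed.
    """
    return (
        metadata.get("type") in MAJOR_TYPES
        or task_count >= 5
        or touches_security
        or metadata.get("priority") == "P0"
        or metadata.get("plan_version", 1) > 1
    )
```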

Board Invocation:

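// Run the board review for tracks that qualify (see "When to invoke Board").
// isMajorTrack and invokeBoardMeeting are provided by the orchestrator;
// invokeBoardMeeting is assumed here to take a single options object.
async function runBoardReview(planContent, spec, metadata, dag) {
  if (!isMajorTrack(metadata)) {
    return { status: "SKIPPED" };
  }

  // Initialize board session via message bus
  const boardResult = await invokeBoardMeeting({
    proposal: planContent, // contents of plan.md
    context: { spec, metadata, dag },
  });

  // Store board session in metadata
  metadata.loop_state.board_sessions.push({
    session_id: boardResult.session_id,
    checkpoint: "EVALUATE_PLAN",
    verdict: boardResult.verdict,
    vote_summary: boardResult.votes,
    conditions: boardResult.conditions,
    timestamp: new Date().toISOString(),
  });

  // Board verdict affects overall evaluation
  if (boardResult.verdict === "REJECTED") {
    return { status: "FAIL", conditions: boardResult.conditions };
  }
  return { status: "PASS", conditions: boardResult.conditions };
}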

Output:

### Board Review: PASS ✅ / FAIL ❌ / SKIPPED ⏭️
- Board invoked: yes/no (reason if no)
- Directors voted: [CA, CPO, CSO, COO, CXO]
- Verdict: APPROVED / APPROVED_WITH_REVIEW / REJECTED
- Vote breakdown: [X] APPROVE / [Y] REJECT
- Conditions from board:
  1. [Condition 1] (from [Director])
  2. [Condition 2] (from [Director])

Verdict

## Plan Evaluation Report

**Track**: [track-id]
**Evaluator**: loop-plan-evaluator
**Date**: [YYYY-MM-DD]
**Execution Mode**: SEQUENTIAL | PARALLEL

### Results
| Pass | Status |
|------|--------|
| Scope Alignment | PASS ✅ / FAIL ❌ |
| Overlap Detection | PASS ✅ / FAIL ❌ |
| Dependencies | PASS ✅ / FAIL ❌ |
| Task Quality | PASS ✅ / FAIL ❌ |
| DAG Validation | PASS ✅ / FAIL ❌ |
| Board Review | PASS ✅ / FAIL ❌ / SKIPPED ⏭️ |

### Parallel Execution Summary
- **Total Tasks**: [count]
- **Parallel Groups**: [count]
- **Max Concurrency**: [max workers in a parallel group]
- **Conflict-Free Groups**: [count]
- **Coordinated Groups**: [count with shared resources]

### Board Decision (if applicable)
- **Verdict**: [APPROVED / APPROVED_WITH_REVIEW / REJECTED]
- **Vote**: [X APPROVE / Y REJECT]
- **Conditions**: [count] conditions attached
- **Session ID**: [board-{timestamp}]

### Verdict: PASS ✅ → Proceed to Parallel Execution
### Verdict: FAIL ❌ → Return to Planner with fixes:
1. [Fix 1]
2. [Fix 2]

### Board Conditions (carry forward):
1. [Condition from board that must be verified in EVALUATE_EXECUTION]
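
The overall verdict can be derived from the per-pass results. The sketch below assumes that any FAIL fails the plan and that a SKIPPED board review does not; the report template implies this rule but does not state it outright.

```python
def overall_verdict(pass_results):
    """Combine per-pass statuses ('PASS', 'FAIL', 'SKIPPED') into one verdict.

    Assumption: a single FAIL fails the plan; SKIPPED does not.
    Returns (verdict, names_of_failed_passes).
    """
    failed = [name for name, status in pass_results.items() if status == "FAIL"]
    return ("FAIL", failed) if failed else ("PASS", [])
```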

Metadata Checkpoint Updates

The plan evaluator MUST update the track's metadata.json at key points:

On Start

{
  "loop_state": {
    "current_step": "EVALUATE_PLAN",
    "step_status": "IN_PROGRESS",
    "step_started_at": "[ISO timestamp]",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "IN_PROGRESS",
        "started_at": "[ISO timestamp]",
        "agent": "loop-plan-evaluator"
      }
    }
  }
}

On PASS

{
  "loop_state": {
    "current_step": "PARALLEL_EXECUTE",
    "step_status": "NOT_STARTED",
    "execution_mode": "PARALLEL",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "PASSED",
        "completed_at": "[ISO timestamp]",
        "verdict": "PASS",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": true,
          "dependencies": true,
          "task_quality": true,
          "dag_validation": true,
          "board_review": true
        },
        "cto_review": {
          "status": "PASSED",
          "reviewed_at": "[timestamp if run]"
        },
        "dag_summary": {
          "total_tasks": 8,
          "parallel_groups": 3,
          "max_concurrency": 4,
          "conflict_free_groups": 2,
          "coordinated_groups": 1
        }
      },
      "PARALLEL_EXECUTE": {
        "status": "NOT_STARTED"
      }
    },
    "board_sessions": [
      {
        "session_id": "board-20260201-123456",
        "checkpoint": "EVALUATE_PLAN",
        "verdict": "APPROVED",
        "vote_summary": {
          "CA": "APPROVE",
          "CPO": "APPROVE",
          "CSO": "APPROVE",
          "COO": "APPROVE",
          "CXO": "APPROVE"
        },
        "conditions": [
          "Add caching layer (CA)",
          "Security audit before launch (CSO)"
        ],
        "timestamp": "[ISO timestamp]"
      }
    ]
  }
}

On FAIL

{
  "loop_state": {
    "current_step": "PLAN",
    "step_status": "NOT_STARTED",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "FAILED",
        "completed_at": "[ISO timestamp]",
        "verdict": "FAIL",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": false,
          "dependencies": true,
          "task_quality": false
        },
        "failure_reasons": [
          "Overlap with existing track: component already built",
          "Task 3 is too vague"
        ]
      },
      "PLAN": {
        "status": "NOT_STARTED",
        "plan_version": 2
      }
    }
  }
}

Update Protocol

Read current metadata.json
Update loop_state.checkpoints.EVALUATE_PLAN with verdict and checks
If PASS: Advance current_step to PARALLEL_EXECUTE
If FAIL: Reset current_step to PLAN, increment plan_version
Write back to metadata.json
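
The protocol above is a read-modify-write cycle on metadata.json. The following is a minimal sketch using the field names from the On PASS and On FAIL examples; the function name is hypothetical, and treating a missing plan_version as 1 is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_verdict(metadata_path, checks, failure_reasons=None):
    """Apply the update protocol to a track's metadata.json and return the new state."""
    path = Path(metadata_path)
    metadata = json.loads(path.read_text(encoding="utf-8"))
    loop_state = metadata.setdefault("loop_state", {})
    checkpoints = loop_state.setdefault("checkpoints", {})

    # The plan passes only if every recorded check is true
    passed = all(checks.values())
    checkpoint = checkpoints.setdefault("EVALUATE_PLAN", {})
    checkpoint.update({
        "status": "PASSED" if passed else "FAILED",
        "completed_at": datetime.now(timezone.utc).isoformat(),
        "verdict": "PASS" if passed else "FAIL",
        "checks": checks,
    })

    if passed:
        loop_state["current_step"] = "PARALLEL_EXECUTE"
        checkpoints.setdefault("PARALLEL_EXECUTE", {})["status"] = "NOT_STARTED"
    else:
        checkpoint["failure_reasons"] = failure_reasons or []
        plan = checkpoints.setdefault("PLAN", {})
        plan["status"] = "NOT_STARTED"
        # Assumes the first plan is version 1 when no version is recorded
        plan["plan_version"] = plan.get("plan_version", 1) + 1
        loop_state["current_step"] = "PLAN"
    loop_state["step_status"] = "NOT_STARTED"

    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```

This sketch does no file locking. If several agents can write the same metadata.json at once, some form of coordination would be needed on top of it.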

Handoff

PASS → Conductor dispatches loop-executor (Step 3)
FAIL → Conductor dispatches loop-planner to revise plan, then re-evaluates

Source

git clone https://github.com/Ibrahim-3d/conductor-orchestrator-superpowers.git

The skill file is at skills/loop-plan-evaluator/SKILL.md on the master branch.

Overview

The loop-plan-evaluator validates a plan before any code is written. It ensures scope alignment, checks for overlap with completed work, verifies DAG validity and dependencies, and assesses task clarity. For major tracks, it invokes the Board of Directors for governance and outputs a PASS/FAIL verdict.

How This Skill Works

It ingests plan.md, spec.md, conductor/tracks.md, metadata.json, and the current codebase state, then runs five core evaluation passes: Scope Alignment, Overlap Detection, Dependency Check, Task Quality, and DAG Validation. For major tracks it adds a sixth pass, the Board of Directors Review. It then emits a PASS/FAIL verdict with detailed findings.

When to Use It

When you want to validate a new plan before writing code
Before starting a multi-track feature or integration
To detect overlap with previously completed tracks
To ensure correct task ordering and dependencies
Before escalating to governance for major tracks

Quick Start

Step 1: Collect plan.md, spec.md, tracks, and current codebase state
Step 2: Run the evaluator to perform Pass 1 through Pass 5 (plus Pass 6 for major tracks) and await PASS/FAIL
Step 3: Review findings and proceed to loop-executor if PASS, or iterate if FAIL

Best Practices

Trace every task back to a specific spec requirement
Cross-check planned work against completed tracks to avoid duplication
Verify DAG structure and that depends_on references exist
Make task definitions specific with clear acceptance criteria
Engage the Board of Directors for major tracks when flagged

Example Use Cases

Evaluating a 6-task feature plan for a new API integration and receiving a PASS with no overlap and valid DAG
Reviewing a bugfix plan to ensure no duplicate work from the last release and that dependencies are satisfied
Assessing a parallelizable deployment plan for infrastructure changes with no cycle in the DAG
Validating a cross-team feature rollout before coding and triggering governance for a 5+ task track
Escalating a plan to the Board of Directors after a plan shows scope creep and missing prerequisites
