
mcts-simulate

npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts-simulate --openclaw
Files (1)
SKILL.md
2.3 KB

MCTS Simulation Phase

You are executing the SIMULATION (rollout) phase of Monte Carlo Tree Search.

LLM as Heuristic Policy

Use your knowledge to:

  1. Guide the rollout toward realistic outcomes
  2. Evaluate terminal states with meaningful scores
  3. Detect dead ends early to save computation

Simulation Algorithm

  1. Start from the expanded node
  2. Rollout to terminal state:
    • Select actions using the LLM policy (not at random)
    • Simulate state transitions
    • Continue until terminal or max depth
  3. Evaluate the outcome:
    • Success: positive reward (e.g., 1.0)
    • Partial success: proportional reward (e.g., 0.5)
    • Failure: zero or negative reward
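The loop above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `policy_action`, `step`, `is_terminal`, and `evaluate` are hypothetical callables standing in for the LLM policy, the state-transition model, and the terminal evaluator.

```python
# Sketch of the simulation phase: roll out from the expanded node
# until a terminal state or max depth, then evaluate the outcome.
def rollout(state, policy_action, step, is_terminal, evaluate, max_depth=10):
    path = []
    depth = 0
    while not is_terminal(state) and depth < max_depth:
        action = policy_action(state)   # LLM-guided choice, not random
        state = step(state, action)     # simulate the state transition
        path.append(action)
        depth += 1
    reward = evaluate(state)            # e.g. 1.0 success, 0.5 partial, 0.0 failure
    return state, reward, path
```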

Using MCP Tools

Call mcts_simulate with:

  • node_id: The node to simulate from
  • max_depth: Maximum rollout depth (default: 10)
  • evaluation_criteria: What constitutes success

The tool returns:

  • terminal_state: The final state reached
  • reward: Numerical evaluation in [0, 1]
  • rollout_path: Sequence of actions taken
  • reasoning: Explanation of the evaluation
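For illustration, a call and its result might be shaped like the dictionaries below. The field names follow the parameter and return lists above; the values are invented, and the exact wire format depends on the MCP server implementation.

```python
# Hypothetical mcts_simulate request payload (values are examples only).
request = {
    "node_id": "node-42",                     # node to simulate from
    "max_depth": 10,                          # maximum rollout depth
    "evaluation_criteria": "all unit tests pass",
}

# Hypothetical response with the four documented return fields.
response = {
    "terminal_state": "tests passing on feature branch",
    "reward": 0.85,                           # numerical evaluation in [0, 1]
    "rollout_path": ["write_test", "implement", "run_tests"],
    "reasoning": "Goal reached; minor style issues reduce efficiency.",
}
```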

Simulation Strategy

For the current context: $ARGUMENTS

Rollout Policy

Instead of a random rollout, use an informed policy:

  1. At each step, consider 2-3 likely actions
  2. Choose based on domain knowledge
  3. Prefer actions that lead to decisive outcomes
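One way to picture this policy: generate a small candidate set and rank it with a domain heuristic instead of sampling uniformly. `propose_candidates` and `score_fn` are assumed stand-ins for the LLM's action proposals and its domain-knowledge scoring.

```python
# Minimal sketch of an informed rollout policy: consider only a few
# likely actions and pick the highest-scoring one.
def informed_policy(state, propose_candidates, score_fn, k=3):
    candidates = propose_candidates(state)[:k]   # consider 2-3 likely actions
    return max(candidates, key=lambda a: score_fn(state, a))
```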

Evaluation Criteria

For Research:

  • Does the path lead to valid conclusions?
  • Is evidence sufficient and reliable?
  • Are there logical gaps?

For Planning:

  • Does the plan achieve the goal?
  • Are resources within budget?
  • Are there critical risks?

For Coding:

  • Does the solution work correctly?
  • Is the code clean and maintainable?
  • Are edge cases handled?

Reward Assignment

reward = completeness * correctness * efficiency

Where each factor is in [0, 1]:

  • completeness: How much of the goal is achieved
  • correctness: How valid is the solution
  • efficiency: How elegant/optimal is it
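The formula above can be written as a small helper; in practice the three component scores would come from the LLM's evaluation of the terminal state.

```python
# Multiplicative reward from the three factors described above,
# each constrained to [0, 1].
def compute_reward(completeness, correctness, efficiency):
    for factor in (completeness, correctness, efficiency):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("each factor must be in [0, 1]")
    return completeness * correctness * efficiency
```

Because the factors multiply, any one score near zero pulls the whole reward toward zero: an incorrect or incomplete rollout cannot score well on elegance alone.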

Output

After simulation, report:

  1. Terminal state reached
  2. Reward value with breakdown
  3. Key insights from the rollout
  4. Any observations to record

Proceed to BACKPROPAGATION with the reward.

Source

git clone https://github.com/NewJerseyStyle/plugin-mcts

View on GitHub: https://github.com/NewJerseyStyle/plugin-mcts/blob/main/skills/mcts-simulate/SKILL.md

Overview

This skill executes the Simulation (rollout) phase of Monte Carlo Tree Search using an LLM as the heuristic policy. It guides rollouts, evaluates terminal states with meaningful scores, and detects dead ends to save computation. Results feed back into backpropagation to improve search guidance.

How This Skill Works

Start from an expanded node and, at each step, select 2-3 likely actions using an LLM policy instead of random moves. Simulate state transitions until reaching a terminal state or hitting max depth, then compute a reward based on completeness, correctness, and efficiency. The rollout data (terminal state, reward, rollout_path, reasoning) is prepared for backpropagation.

When to Use It

  • Large search spaces where random rollouts are ineffective
  • Tasks requiring domain-informed rollout trajectories
  • Situations needing meaningful terminal-state rewards (0-1 scale)
  • Limited rollout depth where early dead-end detection saves compute
  • Backpropagation that benefits from rollout path and reasoning data

Quick Start

  1. Step 1: Call mcts_simulate with node_id and max_depth (default 10)
  2. Step 2: At each step, select 2-3 likely actions using LLM policy and simulate
  3. Step 3: On terminal or max depth, compute reward and collect rollout_path and reasoning; prepare for backpropagation

Best Practices

  • Use 2-3 likely actions per step instead of random choices
  • Ground the LLM policy in domain knowledge to guide outcomes
  • Record terminal_state, reward, rollout_path, and reasoning for backpropagation
  • Normalize and report reward as completeness * correctness * efficiency (0-1 each)
  • Review rollout insights and observations before updating the tree

Example Use Cases

  • AI planning tool estimating a project plan under budget
  • Code synthesis workflow evaluating correctness and maintainability
  • Research assistant validating a hypothesis with evidence chain
  • Strategy game AI testing sequences to reach decisive outcomes
  • Robotics task planner checking feasibility under time/resource limits
