mcts-simulate
npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts-simulate --openclaw
MCTS Simulation Phase
You are executing the SIMULATION (rollout) phase of Monte Carlo Tree Search.
LLM as Heuristic Policy
Use your knowledge to:
- Guide the rollout toward realistic outcomes
- Evaluate terminal states with meaningful scores
- Detect dead ends early to save computation
Simulation Algorithm
1. Start from the expanded node
2. Roll out to a terminal state:
   - Select actions using the LLM policy (not random!)
   - Simulate state transitions
   - Continue until terminal or max depth is reached
3. Evaluate the outcome:
   - Success: full reward (e.g., 1.0)
   - Partial success: proportional reward (e.g., 0.5)
   - Failure: zero or near-zero reward
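The rollout loop above can be sketched as follows. This is a minimal sketch, not the plugin's implementation: `llm_policy`, `transition`, `is_terminal`, and `evaluate` are hypothetical stand-ins for the LLM-driven steps the skill performs.

```python
def simulate(state, llm_policy, transition, is_terminal, evaluate, max_depth=10):
    """Roll out from `state` until terminal or max_depth, then score the result.

    All four callables are hypothetical placeholders for LLM-backed logic:
    - llm_policy(state) -> action   (informed choice, not random)
    - transition(state, action) -> next state
    - is_terminal(state) -> bool
    - evaluate(state) -> reward in [0, 1]
    """
    path = []
    for _ in range(max_depth):
        if is_terminal(state):
            break
        action = llm_policy(state)        # informed action selection
        path.append(action)
        state = transition(state, action)
    reward = evaluate(state)              # terminal-state evaluation
    return state, reward, path
```

With toy callables (a counter that terminates at 3), the loop returns the terminal state, its reward, and the action sequence for backpropagation.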
Using MCP Tools
Call mcts_simulate with:
- node_id: The node to simulate from
- max_depth: Maximum rollout depth (default: 10)
- evaluation_criteria: What constitutes success
The tool returns:
- terminal_state: The final state reached
- reward: Numerical evaluation in [0, 1]
- rollout_path: Sequence of actions taken
- reasoning: Explanation of the evaluation
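For illustration, the request and response payloads might look like the sketch below. The field names come from the section above; all field *values* are made up, and the actual MCP call mechanism depends on your client.

```python
# Hypothetical example payloads for the mcts_simulate tool.
# Field names match the skill doc; values are illustrative only.

request = {
    "node_id": "node-42",                  # node to simulate from
    "max_depth": 10,                       # default rollout depth
    "evaluation_criteria": "plan meets the goal within budget",
}

response = {
    "terminal_state": "plan complete, 85% of budget used",
    "reward": 0.85,                        # numerical evaluation in [0, 1]
    "rollout_path": ["allocate budget", "schedule tasks", "review risks"],
    "reasoning": "Goal achieved with margin; minor scheduling risk remains.",
}
```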
Simulation Strategy
For the current context: $ARGUMENTS
Rollout Policy
Instead of random rollout, use informed policy:
- At each step, consider 2-3 likely actions
- Choose based on domain knowledge
- Prefer actions that lead to decisive outcomes
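The shortlist-and-score idea above can be sketched as a small policy function. This is an assumption-laden sketch: `candidate_actions` and `score_action` are hypothetical stand-ins for LLM calls that propose and judge actions.

```python
def informed_policy(state, candidate_actions, score_action, k=3):
    """Pick the best of the top-k candidate actions instead of sampling at random.

    candidate_actions(state) -> ordered list of plausible actions (hypothetical LLM call)
    score_action(state, action) -> float preference score (hypothetical LLM call)
    """
    shortlist = candidate_actions(state)[:k]  # consider only 2-3 likely actions
    return max(shortlist, key=lambda a: score_action(state, a))
```

Capping the shortlist at 2-3 actions keeps each rollout step cheap while still letting domain knowledge steer toward decisive outcomes.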
Evaluation Criteria
For Research:
- Does the path lead to valid conclusions?
- Is evidence sufficient and reliable?
- Are there logical gaps?
For Planning:
- Does the plan achieve the goal?
- Are resources within budget?
- Are there critical risks?
For Coding:
- Does the solution work correctly?
- Is the code clean and maintainable?
- Are edge cases handled?
Reward Assignment
reward = completeness * correctness * efficiency
Where each factor is in [0, 1]:
- completeness: how much of the goal is achieved
- correctness: how valid the solution is
- efficiency: how elegant/optimal the solution is
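The multiplicative formula above can be written directly; because the factors multiply, any single weak factor pulls the overall reward down sharply.

```python
def compute_reward(completeness, correctness, efficiency):
    """reward = completeness * correctness * efficiency, each factor in [0, 1]."""
    for factor in (completeness, correctness, efficiency):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("each factor must be in [0, 1]")
    return completeness * correctness * efficiency
```

For example, a fully complete (1.0), mostly correct (0.8), but inefficient (0.5) rollout scores 0.4, reflecting the penalty the product applies to its weakest aspects.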
Output
After simulation, report:
- Terminal state reached
- Reward value with breakdown
- Key insights from the rollout
- Any observations to record
Proceed to BACKPROPAGATION with the reward.
Source
git clone https://github.com/NewJerseyStyle/plugin-mcts
Skill definition: skills/mcts-simulate/SKILL.md
Overview
This skill executes the Simulation (rollout) phase of Monte Carlo Tree Search using an LLM as the heuristic policy. It guides rollouts, evaluates terminal states with meaningful scores, and detects dead ends to save computation. Results feed back into backpropagation to improve search guidance.
How This Skill Works
Start from an expanded node and, at each step, select 2-3 likely actions using an LLM policy instead of random moves. Simulate state transitions until reaching a terminal state or hitting max depth, then compute a reward based on completeness, correctness, and efficiency. The rollout data (terminal state, reward, rollout_path, reasoning) is prepared for backpropagation.
When to Use It
- Large search spaces where random rollouts are ineffective
- Tasks requiring domain-informed rollout trajectories
- Situations needing meaningful terminal-state rewards (0-1 scale)
- Limited rollout depth where early dead-end detection saves compute
- Backpropagation that benefits from rollout path and reasoning data
Quick Start
- Step 1: Call mcts_simulate with node_id and max_depth (default 10)
- Step 2: At each step, select 2-3 likely actions using LLM policy and simulate
- Step 3: On terminal or max depth, compute reward and collect rollout_path and reasoning; prepare for backpropagation
Best Practices
- Use 2-3 likely actions per step instead of random choices
- Ground the LLM policy in domain knowledge to guide outcomes
- Record terminal_state, reward, rollout_path, and reasoning for backpropagation
- Normalize and report reward as completeness * correctness * efficiency (0-1 each)
- Review rollout insights and observations before updating the tree
Example Use Cases
- AI planning tool estimating a project plan under budget
- Code synthesis workflow evaluating correctness and maintainability
- Research assistant validating a hypothesis with evidence chain
- Strategy game AI testing sequences to reach decisive outcomes
- Robotics task planner checking feasibility under time/resource limits