mcts-backpropagate
npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts-backpropagate --openclaw
MCTS Backpropagation Phase
You are executing the BACKPROPAGATION phase of Monte Carlo Tree Search.
Backpropagation Algorithm
- Start from the simulated node
- Traverse up to the root; for each node on the path:
  - Increment visit count: N = N + 1
  - Add reward to value: Q = Q + reward
  - Record the update for analysis
Using MCP Tools
Call mcts_backpropagate with:
- node_id: The leaf node where simulation ended
- reward: The reward from simulation
- path: (optional) Explicit path to update
The tool returns:
- nodes_updated: List of updated node IDs
- new_statistics: Updated Q and N for each node
- tree_depth: Current maximum depth
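The request and response shapes can be sketched as plain dictionaries. The node IDs and statistics below are illustrative, and the actual MCP client call syntax depends on your runtime:

```python
# Hypothetical request/response shapes for mcts_backpropagate;
# only the field names come from the skill, the values are made up.
request = {
    "node_id": "node-42",          # leaf node where the simulation ended
    "reward": 0.8,                 # reward from the simulation phase
    "path": ["root", "node-7", "node-42"],  # optional explicit path
}
response = {
    "nodes_updated": ["node-42", "node-7", "root"],
    "new_statistics": {
        "root": {"Q": 3.2, "N": 5},
        "node-7": {"Q": 2.1, "N": 3},
        "node-42": {"Q": 0.8, "N": 1},
    },
    "tree_depth": 3,
}

# Derive the average reward per node from the returned statistics
avg = {nid: s["Q"] / s["N"] for nid, s in response["new_statistics"].items()}
```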
Statistics Update
For each node in the path from leaf to root:
node.N += 1
node.Q += reward
node.avg_reward = node.Q / node.N
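The update above can be sketched as a short Python loop. The Node class here is a hypothetical stand-in for whatever tree representation your MCTS implementation uses:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    N: int = 0                      # visit count
    Q: float = 0.0                  # accumulated value
    avg_reward: float = 0.0         # Q / N, updated on each visit
    parent: Optional["Node"] = None

def backpropagate(leaf, reward):
    """Walk from the leaf to the root, updating N and Q at each node."""
    updated = []
    node = leaf
    while node is not None:
        node.N += 1
        node.Q += reward
        node.avg_reward = node.Q / node.N
        updated.append(node)
        node = node.parent
    return updated

# Two rollouts through the same leaf: rewards 1.0 then 0.0
root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, reward=1.0)
backpropagate(leaf, reward=0.0)
print(root.N, root.Q, root.avg_reward)  # 2 1.0 0.5
```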
Backpropagation Strategy
For the current context: $ARGUMENTS
Standard Update
- Each node gets the same reward
- Simple and effective for most problems
Discounted Update (optional)
- Apply discount factor γ as you go up
- Nodes closer to outcome get more credit
node.Q += reward * (γ ^ depth_from_leaf)
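A minimal sketch of the discounted variant, again assuming a hypothetical Node with a parent pointer; the reward is attenuated by γ for each step away from the leaf, so nodes nearer the outcome receive more credit:

```python
class Node:
    def __init__(self, parent=None):
        self.N, self.Q, self.avg_reward, self.parent = 0, 0.0, 0.0, parent

def backpropagate_discounted(leaf, reward, gamma=0.95):
    """Backpropagate with exponential discounting by depth from the leaf."""
    node, depth = leaf, 0
    while node is not None:
        node.N += 1
        node.Q += reward * (gamma ** depth)
        node.avg_reward = node.Q / node.N
        node, depth = node.parent, depth + 1

root = Node()
leaf = Node(parent=root)
backpropagate_discounted(leaf, reward=1.0, gamma=0.5)
print(leaf.Q, root.Q)  # leaf gets the full reward, root gets 0.5
```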
Observation Recording
After backpropagation:
- Record any new insights as observations
- Update beliefs if the result was surprising
- Note if any branch is now clearly best/worst
Convergence Check
After updating, check:
- Best path stability: Has the best path changed?
- Value convergence: Are top nodes' values stabilizing?
- Sufficient exploration: Have all branches been tried?
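The three checks above can be combined into a simple heuristic. The window size and tolerance below are illustrative assumptions, not values defined by the skill:

```python
def is_converged(best_path_history, top_values, visited, total,
                 window=5, eps=1e-3):
    """Heuristic convergence check after a backpropagation pass.

    best_path_history: best path recorded at each iteration
    top_values: (previous, current) avg_reward pairs for the top nodes
    visited/total: explored vs. total branch counts
    """
    # Best path stability: unchanged over the last `window` iterations
    stable = (len(best_path_history) >= window and
              all(p == best_path_history[-1]
                  for p in best_path_history[-window:]))
    # Value convergence: top node values moved less than eps
    values_settled = all(abs(curr - prev) < eps
                         for prev, curr in top_values)
    # Sufficient exploration: every branch visited at least once
    explored = visited == total
    return stable and values_settled and explored

history = ["a>b>c"] * 5
vals = [(0.50, 0.5001), (0.30, 0.3002)]
print(is_converged(history, vals, visited=10, total=10))  # True
print(is_converged(history, vals, visited=8, total=10))   # False
```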
Output
After backpropagation, report:
- Nodes updated with new statistics
- Current best path and its average reward
- Exploration coverage (% of nodes visited)
- Whether to continue or extract solution
If continuing, return to SELECTION phase. If converged or budget exhausted, extract the solution.
Source
https://github.com/NewJerseyStyle/plugin-mcts/blob/main/skills/mcts-backpropagate/SKILL.md
Overview
This skill executes the BACKPROPAGATION phase of Monte Carlo Tree Search. It updates N and Q along the path from the leaf to the root using the simulation reward, optionally recording new statistics for analysis.
How This Skill Works
Starting at the simulated (leaf) node, traverse up to the root. For each node on the path, increment N by 1, add the reward to Q, and compute avg_reward as Q divided by N. If a path is provided, limit updates to those nodes. Use mcts_backpropagate(node_id, reward, path) to get nodes_updated, new_statistics, and tree_depth; you can apply either a standard update or a discounted update using γ.
When to Use It
- After finishing a Monte Carlo rollout, update statistics from leaf to root
- When evaluating which branch to prefer, refresh N and Q across the path
- To credit nearer-to-outcome nodes more, apply a discounted update with γ
- When you have an explicit path, use the optional path argument to limit updates
- To inspect convergence and exploration, analyze the reported tree_depth and updated stats
Quick Start
- Step 1: Run a simulation to obtain a leaf node and reward
- Step 2: Call mcts_backpropagate with node_id, reward, and optional path
- Step 3: Inspect nodes_updated and updated statistics to guide the next SELECTION phase
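The three quick-start steps can be tied together in a self-contained sketch with a stubbed random simulator; none of this is the plugin's actual API, just the shape of the loop:

```python
import random

class Node:
    def __init__(self, parent=None):
        self.N, self.Q, self.parent = 0, 0.0, parent

def simulate(node):
    # Step 1 (stub): a random rollout reward in [0, 1]
    return random.random()

def backpropagate(leaf, reward):
    # Step 2: update statistics from the leaf to the root
    node = leaf
    while node is not None:
        node.N += 1
        node.Q += reward
        node = node.parent

root = Node()
leaf = Node(parent=root)
for _ in range(10):
    backpropagate(leaf, simulate(leaf))

# Step 3: inspect updated statistics to guide the next SELECTION phase
print(root.N, round(root.Q / root.N, 3))
```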
Best Practices
- Always start from the leaf and walk up to the root
- Keep both N and Q consistent across updates to avoid bias
- Compute avg_reward = Q / N after each increment
- Prefer standard updates for most problems; reserve discounted updates for nuanced credit
- Log observations and updated statistics for later analysis
Example Use Cases
- MCTS in a Go engine: backpropagating rollout rewards to refine move scores
- MCTS-based planning for robotics where path statistics guide decisions
- Chess engine exploration where root evaluation is derived from leaf simulations
- Game AI tuning where backpropagation reveals stable best paths
- Reinforcement-learning-inspired MCTS variants that blend exploration and exploitation