mcts-backpropagate
npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts-backpropagate --openclaw
MCTS Backpropagation Phase
You are executing the BACKPROPAGATION phase of Monte Carlo Tree Search.
Backpropagation Algorithm
- Start from the simulated node
- Traverse up to the root; for each node on the path:
  - Increment visit count: N = N + 1
  - Add reward to value: Q = Q + reward
  - Record the update for analysis
Using MCP Tools
Call mcts_backpropagate with:
- node_id: The leaf node where simulation ended
- reward: The reward from simulation
- path: (optional) Explicit path to update
The tool returns:
- nodes_updated: List of updated node IDs
- new_statistics: Updated Q and N for each node
- tree_depth: Current maximum depth
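The request and response shapes can be sketched as plain dictionaries. The node IDs and statistics below are illustrative, and the actual MCP client call syntax depends on your runtime:

```python
# Hypothetical request/response shapes for mcts_backpropagate;
# only the field names come from the skill, the values are made up.
request = {
    "node_id": "node-42",          # leaf node where the simulation ended
    "reward": 0.8,                 # reward from the simulation phase
    "path": ["root", "node-7", "node-42"],  # optional explicit path
}
response = {
    "nodes_updated": ["node-42", "node-7", "root"],
    "new_statistics": {
        "root": {"Q": 3.2, "N": 5},
        "node-7": {"Q": 2.1, "N": 3},
        "node-42": {"Q": 0.8, "N": 1},
    },
    "tree_depth": 3,
}

# Derive the average reward per node from the returned statistics
avg = {nid: s["Q"] / s["N"] for nid, s in response["new_statistics"].items()}
```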
Statistics Update
For each node in the path from leaf to root:
node.N += 1
node.Q += reward
node.avg_reward = node.Q / node.N
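The update above can be sketched as a short Python loop. The Node class here is a hypothetical stand-in for whatever tree representation your MCTS implementation uses:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    N: int = 0                      # visit count
    Q: float = 0.0                  # accumulated value
    avg_reward: float = 0.0         # Q / N, updated on each visit
    parent: Optional["Node"] = None

def backpropagate(leaf, reward):
    """Walk from the leaf to the root, updating N and Q at each node."""
    updated = []
    node = leaf
    while node is not None:
        node.N += 1
        node.Q += reward
        node.avg_reward = node.Q / node.N
        updated.append(node)
        node = node.parent
    return updated

# Two rollouts through the same leaf: rewards 1.0 then 0.0
root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, reward=1.0)
backpropagate(leaf, reward=0.0)
print(root.N, root.Q, root.avg_reward)  # 2 1.0 0.5
```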
Backpropagation Strategy
For the current context: $ARGUMENTS
Standard Update
- Each node gets the same reward
- Simple and effective for most problems
Discounted Update (optional)
- Apply discount factor γ as you go up
- Nodes closer to outcome get more credit
node.Q += reward * (γ ^ depth_from_leaf)
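A minimal sketch of the discounted variant, again assuming a hypothetical Node with a parent pointer; the reward is attenuated by γ for each step away from the leaf, so nodes nearer the outcome receive more credit:

```python
class Node:
    def __init__(self, parent=None):
        self.N, self.Q, self.avg_reward, self.parent = 0, 0.0, 0.0, parent

def backpropagate_discounted(leaf, reward, gamma=0.95):
    """Backpropagate with exponential discounting by depth from the leaf."""
    node, depth = leaf, 0
    while node is not None:
        node.N += 1
        node.Q += reward * (gamma ** depth)
        node.avg_reward = node.Q / node.N
        node, depth = node.parent, depth + 1

root = Node()
leaf = Node(parent=root)
backpropagate_discounted(leaf, reward=1.0, gamma=0.5)
print(leaf.Q, root.Q)  # leaf gets the full reward, root gets 0.5
```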
Observation Recording
After backpropagation:
- Record any new insights as observations
- Update beliefs if the result was surprising
- Note if any branch is now clearly best/worst
Convergence Check
After updating, check:
- Best path stability: Has the best path changed?
- Value convergence: Are top nodes' values stabilizing?
- Sufficient exploration: Have all branches been tried?
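The three checks above can be combined into a simple heuristic. The window size and tolerance below are illustrative assumptions, not values defined by the skill:

```python
def is_converged(best_path_history, top_values, visited, total,
                 window=5, eps=1e-3):
    """Heuristic convergence check after a backpropagation pass.

    best_path_history: best path recorded at each iteration
    top_values: (previous, current) avg_reward pairs for the top nodes
    visited/total: explored vs. total branch counts
    """
    # Best path stability: unchanged over the last `window` iterations
    stable = (len(best_path_history) >= window and
              all(p == best_path_history[-1]
                  for p in best_path_history[-window:]))
    # Value convergence: top node values moved less than eps
    values_settled = all(abs(curr - prev) < eps
                         for prev, curr in top_values)
    # Sufficient exploration: every branch visited at least once
    explored = visited == total
    return stable and values_settled and explored

history = ["a>b>c"] * 5
vals = [(0.50, 0.5001), (0.30, 0.3002)]
print(is_converged(history, vals, visited=10, total=10))  # True
print(is_converged(history, vals, visited=8, total=10))   # False
```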
Output
After backpropagation, report:
- Nodes updated with new statistics
- Current best path and its average reward
- Exploration coverage (% of nodes visited)
- Whether to continue or extract solution
If continuing, return to SELECTION phase. If converged or budget exhausted, extract the solution.
Source
https://github.com/NewJerseyStyle/plugin-mcts/blob/main/skills/mcts-backpropagate/SKILL.md
Overview
This skill executes the BACKPROPAGATION phase of Monte Carlo Tree Search. It updates N and Q along the path from the leaf to the root using the simulation reward, optionally recording new statistics for analysis.
How This Skill Works
Starting at the simulated (leaf) node, traverse up to the root. For each node on the path, increment N by 1, add the reward to Q, and compute avg_reward as Q divided by N. If a path is provided, limit updates to those nodes. Use mcts_backpropagate(node_id, reward, path) to get nodes_updated, new_statistics, and tree_depth; you can apply either a standard update or a discounted update using γ.
When to Use It
- After finishing a Monte Carlo rollout, update statistics from leaf to root
- When evaluating which branch to prefer, refresh N and Q across the path
- To credit nearer-to-outcome nodes more, apply a discounted update with γ
- When you have an explicit path, use the optional path argument to limit updates
- To inspect convergence and exploration, analyze the reported tree_depth and updated stats
Quick Start
- Step 1: Run a simulation to obtain a leaf node and reward
- Step 2: Call mcts_backpropagate with node_id, reward, and optional path
- Step 3: Inspect nodes_updated and updated statistics to guide the next SELECTION phase
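The three quick-start steps can be tied together in a self-contained sketch with a stubbed random simulator; none of this is the plugin's actual API, just the shape of the loop:

```python
import random

class Node:
    def __init__(self, parent=None):
        self.N, self.Q, self.parent = 0, 0.0, parent

def simulate(node):
    # Step 1 (stub): a random rollout reward in [0, 1]
    return random.random()

def backpropagate(leaf, reward):
    # Step 2: update statistics from the leaf to the root
    node = leaf
    while node is not None:
        node.N += 1
        node.Q += reward
        node = node.parent

root = Node()
leaf = Node(parent=root)
for _ in range(10):
    backpropagate(leaf, simulate(leaf))

# Step 3: inspect updated statistics to guide the next SELECTION phase
print(root.N, round(root.Q / root.N, 3))
```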
Best Practices
- Always start from the leaf and walk up to the root
- Keep both N and Q consistent across updates to avoid bias
- Compute avg_reward = Q / N after each increment
- Prefer standard updates for most problems; reserve discounted updates for nuanced credit
- Log observations and updated statistics for later analysis
Example Use Cases
- MCTS in a Go engine: backpropagating rollout rewards to refine move scores
- MCTS-based planning for robotics where path statistics guide decisions
- Chess engine exploration where root evaluation is derived from leaf simulations
- Game AI tuning where backpropagation reveals stable best paths
- Reinforcement-learning-inspired MCTS variants that blend exploration and exploitation