What is the UCB1 formula used by mcts-select?

UCB = Q/N + c * sqrt(ln(parent_N) / N), where Q is the total reward accumulated at the node, N is the node's visit count, parent_N is the parent's visit count, and c is the exploration constant (typically sqrt(2) ≈ 1.414).

What happens if there are unexplored nodes (N=0)?

Unexplored nodes get priority during selection, ensuring the algorithm expands new areas before over-exploiting known ones.

What outputs does mcts_select return after selection?

It returns selected_node_id, path (from the root to the selected node), node_state, the is_terminal flag, and ucb_scores for the nodes along the path. The skill's report then adds whether expansion is needed.

mcts-select

npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts-select --openclaw

MCTS Selection Phase

You are executing the SELECTION phase of Monte Carlo Tree Search.

UCB1 Formula

For each node, calculate:

UCB = Q/N + c * sqrt(ln(parent_N) / N)

Where:

Q: Total value/reward accumulated at this node
N: Number of visits to this node
parent_N: Number of visits to parent node
c: Exploration constant (typically sqrt(2) ≈ 1.414)
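
The formula above can be sketched as a small Python function. This is a minimal illustration, not the skill's actual implementation: the function name `ucb1` and the choice to return infinity for an unvisited node (N=0) are assumptions made here, the latter because the formula divides by N.

```python
import math


def ucb1(q: float, n: int, parent_n: int, c: float = math.sqrt(2)) -> float:
    """Compute UCB = Q/N + c * sqrt(ln(parent_N) / N) for one node."""
    if n == 0:
        # The formula divides by N, so an unvisited node has no finite score.
        # Returning +inf is an assumed convention that makes such a node win
        # any comparison against visited siblings.
        return math.inf
    exploitation = q / n
    exploration = c * math.sqrt(math.log(parent_n) / n)
    return exploitation + exploration


# A node with total reward 6 over 10 visits, under a parent visited 100 times.
score = ucb1(q=6.0, n=10, parent_n=100)
print(round(score, 3))
```

The first term is the node's average reward; the second grows as the node is visited less often relative to its parent.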

Selection Algorithm

Start at root node
While current node is fully expanded and not terminal:
- Calculate UCB for all children
- Select child with highest UCB value
- Move to selected child
Return the selected leaf node
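
The loop above might look like the following sketch. The `Node` class, its field names, and the `untried_actions` counter are hypothetical stand-ins for whatever tree representation is actually in use. The code assumes visit counts are consistent, meaning a visited child always has a parent with at least one visit.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    q: float = 0.0            # total reward accumulated at this node
    n: int = 0                # number of visits to this node
    terminal: bool = False
    untried_actions: int = 0  # actions that have no child node yet
    children: list["Node"] = field(default_factory=list)

    def fully_expanded(self) -> bool:
        return self.untried_actions == 0 and bool(self.children)


def ucb1(child: Node, parent_n: int, c: float) -> float:
    if child.n == 0:
        return math.inf  # assumed convention: unvisited children go first
    return child.q / child.n + c * math.sqrt(math.log(parent_n) / child.n)


def select_leaf(root: Node, c: float = math.sqrt(2)) -> tuple[Node, list[str]]:
    """Descend from the root, taking the highest-UCB child at each step."""
    node, path = root, [root.node_id]
    while node.fully_expanded() and not node.terminal:
        parent_n = node.n
        node = max(node.children, key=lambda ch: ucb1(ch, parent_n, c))
        path.append(node.node_id)
    return node, path


# Small hand-built tree: "a" is fully expanded, "a2" still has an untried action.
a1 = Node("a1", q=1.0, n=2, untried_actions=1)
a2 = Node("a2", q=3.0, n=3, untried_actions=1)
a = Node("a", q=6.0, n=6, children=[a1, a2])
b = Node("b", q=1.0, n=4, untried_actions=2)
root = Node("root", q=7.0, n=10, children=[a, b])

leaf, path = select_leaf(root)
print(leaf.node_id, path)
```

The descent stops at the first node that is terminal or still has untried actions, which is the node handed to the expansion phase.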

Using MCP Tools

Call mcts_select with optional parameters:

exploration_constant: Value for c (default: 1.414)
tree_id: If managing multiple trees

The tool returns:

selected_node_id: The ID of the selected node
path: The path from root to selected node
node_state: The state at the selected node
is_terminal: Whether this is a terminal state
ucb_scores: UCB scores for nodes along the path
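
As a rough illustration of consuming these fields, the sketch below routes a result to the next phase. Only the field names come from the list above. The payload values, their types, and the helper `next_phase` are hypothetical, and sending a terminal node straight to backpropagation is the usual MCTS convention rather than something this skill specifies.

```python
from typing import Any


def next_phase(result: dict[str, Any]) -> str:
    """Choose the follow-up MCTS phase from a selection result."""
    if result["is_terminal"]:
        # A terminal state has no moves left to expand (assumed convention).
        return "backpropagation"
    return "expansion"


# Hypothetical payload: the value types shown here are assumptions.
example_result = {
    "selected_node_id": "node-7",
    "path": ["root", "node-3", "node-7"],
    "node_state": {"depth": 2},
    "is_terminal": False,
    "ucb_scores": [1.88, 2.09],
}

print(next_phase(example_result))
```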

Selection Strategy

For the current problem context: $ARGUMENTS

Check if any nodes are unexplored (N=0) - these get priority
Among explored nodes, balance:
- Exploitation: Nodes with high average reward (Q/N)
- Exploration: Nodes visited less frequently
Consider domain-specific heuristics from observations
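
To make the exploitation/exploration balance concrete, the sketch below splits the UCB score into its two terms for a pair of made-up children. The statistics and the names `well_known` and `rarely_seen` are illustrative only.

```python
import math


def ucb_terms(q: float, n: int, parent_n: int, c: float = math.sqrt(2)):
    """Return (exploitation, exploration) for a visited node (n > 0)."""
    exploitation = q / n
    exploration = c * math.sqrt(math.log(parent_n) / n)
    return exploitation, exploration


# name -> (Q, N); both children share a parent with 52 visits.
children = {"well_known": (40.0, 50), "rarely_seen": (1.0, 2)}
parent_n = 52

totals = {}
for name, (q, n) in children.items():
    exploit, explore = ucb_terms(q, n, parent_n)
    totals[name] = exploit + explore
    print(f"{name}: exploit={exploit:.3f} explore={explore:.3f} "
          f"total={totals[name]:.3f}")
```

With these numbers the rarely visited child has the lower average reward but the larger exploration term, so it ends up with the higher total score.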

Output

After selection, report:

Selected node ID and state
Path taken from root
UCB reasoning for the selection
Whether expansion is needed (if node has unexplored children)

Proceed to EXPANSION phase with the selected node.

Source

https://github.com/NewJerseyStyle/plugin-mcts/blob/main/skills/mcts-select/SKILL.md

Overview

Implements the Selection phase of Monte Carlo Tree Search using the UCB1 score to traverse from the root to a promising leaf. It prioritizes unexplored nodes (N=0), balances exploitation and exploration, and reports the path, node state, and UCB scores for debugging. The process prepares the next phase (Expansion) by returning the chosen leaf and its context.

How This Skill Works

Start at the root and, while the current node is fully expanded and non-terminal, evaluate the UCB1 score Q/N + c * sqrt(ln(parent_N)/N) for each child and move to the child with the highest score. Children with N=0 are prioritized: the formula divides by N, so it is undefined for them, and they are taken before any scored sibling. Return the selected leaf along with the path, node state, terminal status, and the UCB scores used for the decision.

When to Use It

During a single MCTS iteration to identify the leaf node for expansion
When balancing high-reward nodes against rarely visited ones
When you want to inspect or debug the path and UCB reasoning with path scores
When using multi-tree setups via tree_id to manage several MCTS trees in parallel
When a domain heuristic should influence the selection amidst UCB-based decisions

Quick Start

Step 1: Call mcts_select with the current root and optional parameters (exploration_constant, tree_id).
Step 2: While the current node is fully expanded and non-terminal, compute UCB for all children and pick the highest; move to that child.
Step 3: Return and inspect selected_node_id, path, node_state, is_terminal, and ucb_scores to decide on Expansion or rollout.

Best Practices

Prioritize unexplored nodes (N=0) before descending into already-visited children
Use the standard UCB1 formula: UCB = Q/N + c * sqrt(ln(parent_N) / N)
Set an appropriate exploration constant c (default ~1.414) and keep it consistent
Record and review the path and ucb_scores to diagnose poor expansions
Proceed to Expansion only after a leaf is selected or a stopping condition is met
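
One reason to keep c consistent is that changing it can change which child wins. The sketch below uses invented statistics and a hypothetical helper `best_child` to compare the winner under several values of c.

```python
import math


def best_child(stats: dict[str, tuple[float, int]], parent_n: int, c: float) -> str:
    """Return the name of the child with the highest UCB score (all n > 0)."""
    def score(item):
        _, (q, n) = item
        return q / n + c * math.sqrt(math.log(parent_n) / n)
    return max(stats.items(), key=score)[0]


# name -> (Q, N): "A" is heavily visited with a high average, "B" is barely tried.
stats = {"A": (40.0, 50), "B": (1.0, 2)}

for c in (0.1, 0.5, math.sqrt(2)):
    print(f"c={c:.3f} -> {best_child(stats, parent_n=52, c=c)}")
```

A small c favors the child with the best average reward, while larger values shift the choice toward the less-visited child.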

Example Use Cases

AI game agent selecting the next move in a board game by traversing from the root to a leaf using UCB1
Robotic path planning where the agent chooses the next waypoint based on visitation and reward
Resource allocation in a strategy game where balance between known good moves and exploration is needed
Puzzle-solving or planning tasks where the search tree is large and parallel simulations help
Multi-tree MCTS setups where different trees are managed with a shared selection strategy (tree_id)
