
mcts-select

npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts-select --openclaw

MCTS Selection Phase

You are executing the SELECTION phase of Monte Carlo Tree Search.

UCB1 Formula

For each child of the current node, calculate:

UCB = Q/N + c * sqrt(ln(parent_N) / N)

Where:

  • Q: Total value/reward accumulated at this node
  • N: Number of visits to this node
  • parent_N: Number of visits to parent node
  • c: Exploration constant (typically sqrt(2) ≈ 1.414)
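The formula above can be sketched as a small Python function. This is an illustrative sketch, not the plugin's code: the name ucb1 is hypothetical, and returning infinity for N=0 is an assumption that matches the rule (stated under Selection Strategy) that unexplored nodes get priority, since Q/N and ln(parent_N)/N are undefined when N=0.

```python
import math


def ucb1(q: float, n: int, parent_n: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: average reward Q/N plus exploration bonus c*sqrt(ln(parent_N)/N)."""
    if n == 0:
        # Unvisited node: the formula would divide by zero, so treat the score
        # as +infinity so that unvisited children are always tried first.
        return math.inf
    exploitation = q / n                                 # average reward so far
    exploration = c * math.sqrt(math.log(parent_n) / n)  # shrinks as N grows
    return exploitation + exploration


# A child with Q=6 over N=10 visits, under a parent visited 20 times.
print(round(ucb1(6.0, 10, 20), 3))  # prints 1.374
```

Setting c=0 reduces the score to the plain average Q/N (pure exploitation); larger c weights the visit-count term more heavily.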

Selection Algorithm

  1. Start at root node
  2. While current node is fully expanded and not terminal:
    • Calculate UCB for all children
    • Select child with highest UCB value
    • Move to selected child
  3. Return the node where the descent stops (a node with unexpanded children, or a terminal node)
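The descent loop can be sketched in Python as follows. The Node layout (q, n, untried_actions, terminal, children) is a hypothetical in-memory structure chosen for illustration; it is not the plugin's data model, and the real mcts_select tool performs this traversal itself.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node layout for illustration only.
    node_id: str
    q: float = 0.0            # total accumulated reward (Q)
    n: int = 0                # visit count (N)
    untried_actions: int = 0  # actions not yet expanded into children
    terminal: bool = False
    children: List["Node"] = field(default_factory=list)

    def fully_expanded(self) -> bool:
        return self.untried_actions == 0


def ucb1(child: Node, parent_n: int, c: float) -> float:
    if child.n == 0:
        return math.inf  # unvisited children are selected first
    return child.q / child.n + c * math.sqrt(math.log(parent_n) / child.n)


def select(root: Node, c: float = math.sqrt(2)) -> Node:
    """Descend from the root while the node is fully expanded and non-terminal."""
    node = root
    # The `node.children` guard avoids calling max() on an empty list if a
    # non-terminal node happens to have no actions at all.
    while node.fully_expanded() and not node.terminal and node.children:
        parent = node
        node = max(parent.children, key=lambda ch: ucb1(ch, parent.n, c))
    return node


a = Node("a", q=3.0, n=5, terminal=True)
b = Node("b", q=1.0, n=1, terminal=True)
root = Node("root", n=6, children=[a, b])
print(select(root).node_id)  # prints b
```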

Using MCP Tools

Call mcts_select with optional parameters:

  • exploration_constant: Value for c (default: 1.414)
  • tree_id: If managing multiple trees

The tool returns:

  • selected_node_id: The ID of the selected node
  • path: The path from root to selected node
  • node_state: The state at the selected node
  • is_terminal: Whether this is a terminal state
  • ucb_scores: UCB scores for nodes along the path
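To make the returned fields concrete, here is a local Python stand-in that runs the same selection over a dictionary-based tree and returns a dict with the five keys listed above. Everything about the tree format is hypothetical, and so is the shape of ucb_scores: this sketch records the winning child's score at each step, which may differ from what the actual tool reports.

```python
import math
from typing import Any, Dict, List

# Hypothetical in-memory tree: node_id -> node record (illustrative fields only).
Tree = Dict[str, Dict[str, Any]]


def ucb1(q: float, n: int, parent_n: int, c: float) -> float:
    if n == 0:
        return math.inf
    return q / n + c * math.sqrt(math.log(parent_n) / n)


def select_with_trace(tree: Tree, root_id: str, c: float = 1.414) -> Dict[str, Any]:
    """Run UCB1 selection and return fields shaped like the mcts_select output."""
    node_id = root_id
    path: List[str] = [root_id]
    ucb_scores: List[float] = []  # score of the chosen child at each step
    while True:
        node = tree[node_id]
        # Stop at a node with unexpanded actions, a terminal node, or a dead end.
        if node["untried"] or node["terminal"] or not node["children"]:
            break
        scores = {cid: ucb1(tree[cid]["q"], tree[cid]["n"], node["n"], c)
                  for cid in node["children"]}
        node_id = max(scores, key=scores.get)
        path.append(node_id)
        ucb_scores.append(scores[node_id])
    leaf = tree[node_id]
    return {
        "selected_node_id": node_id,
        "path": path,
        "node_state": leaf["state"],
        "is_terminal": leaf["terminal"],
        "ucb_scores": ucb_scores,
    }


tree: Tree = {
    "root": {"q": 4.0, "n": 6, "untried": 0, "terminal": False,
             "children": ["a", "b"], "state": "start"},
    "a": {"q": 3.0, "n": 5, "untried": 1, "terminal": False,
          "children": [], "state": "s_a"},
    "b": {"q": 1.0, "n": 1, "untried": 2, "terminal": False,
          "children": [], "state": "s_b"},
}
print(select_with_trace(tree, "root"))
```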

Selection Strategy

For the current problem context: $ARGUMENTS

  1. Check whether any child is unexplored (N=0); such children get priority, since their UCB score is undefined (effectively infinite)
  2. Among explored nodes, balance:
    • Exploitation: Nodes with high average reward (Q/N)
    • Exploration: Nodes visited less frequently
  3. Consider domain-specific heuristics from observations

Output

After selection, report:

  1. Selected node ID and state
  2. Path taken from root
  3. UCB reasoning for the selection
  4. Whether expansion is needed (if node has unexplored children)
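The four report items can be assembled from a selection result along these lines. This sketch assumes the result is a dict with the field names listed for mcts_select and that ucb_scores holds one score per step after the root; both the alignment and the untried_actions argument are assumptions for illustration, not documented behavior.

```python
from typing import Any, Dict, List


def selection_report(result: Dict[str, Any], untried_actions: int) -> str:
    """Format the four report items from a selection result.

    `untried_actions` is a hypothetical count of the selected node's
    unexpanded children, supplied by the caller.
    """
    path: List[str] = result["path"]
    scores: List[float] = result["ucb_scores"]
    # Pair each step after the root with the UCB score that won it.
    steps = ", ".join(f"{nid} (UCB={s:.3f})" for nid, s in zip(path[1:], scores))
    # Terminal nodes are never expanded; otherwise expand if actions remain.
    needs_expansion = (not result["is_terminal"]) and untried_actions > 0
    return "\n".join([
        f"1. Selected node: {result['selected_node_id']} (state: {result['node_state']})",
        f"2. Path: {' -> '.join(path)}",
        f"3. UCB reasoning: {steps or 'root selected; no children scored'}",
        f"4. Expansion needed: {'yes' if needs_expansion else 'no'}",
    ])


example = {"selected_node_id": "b", "path": ["root", "b"], "node_state": "s_b",
           "is_terminal": False, "ucb_scores": [2.893]}
print(selection_report(example, untried_actions=2))
```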

Proceed to EXPANSION phase with the selected node.

Source

git clone https://github.com/NewJerseyStyle/plugin-mcts
Skill file: https://github.com/NewJerseyStyle/plugin-mcts/blob/main/skills/mcts-select/SKILL.md

Overview

Implements the Selection phase of Monte Carlo Tree Search using the UCB1 score to traverse from the root to a promising leaf. It prioritizes unexplored nodes (N=0), balances exploitation and exploration, and reports the path, node state, and UCB scores for debugging. The process prepares the next phase (Expansion) by returning the chosen leaf and its context.

How This Skill Works

Start at the root and evaluate the UCB1 score Q/N + c * sqrt(ln(parent_N)/N) for each child. While the current node is fully expanded and non-terminal, move to the child with the highest score. Children with N=0 are selected before any visited sibling. Return the node where the descent stops, along with the path, node state, terminal status, and the UCB scores used for each decision.

When to Use It

  • During a single MCTS iteration to identify the leaf node for expansion
  • When balancing high-reward nodes against rarely visited ones
  • When you want to inspect or debug the selection path and the UCB scores behind each step
  • When managing several MCTS trees and selecting within one of them via tree_id
  • When a domain heuristic should influence the selection alongside UCB-based decisions

Quick Start

  1. Call mcts_select, optionally passing exploration_constant and tree_id.
  2. Starting from the root, while the current node is fully expanded and non-terminal, compute UCB for all children, pick the highest, and move to that child.
  3. Inspect the returned selected_node_id, path, node_state, is_terminal, and ucb_scores to decide between Expansion and rollout.

Best Practices

  • Select unexplored children (N=0) before comparing UCB scores of visited ones
  • Use the standard UCB1 formula: UCB = Q/N + c * sqrt(ln(parent_N) / N)
  • Set an appropriate exploration constant c (default ~1.414) and keep it consistent across iterations
  • Record and review the path and ucb_scores to diagnose poor expansions
  • Proceed to Expansion only after a leaf is selected or a stopping condition is met

Example Use Cases

  • AI game agent selecting the next move in a board game by traversing from the root to a leaf using UCB1
  • Robotic path planning where the agent chooses the next waypoint based on visitation and reward
  • Resource allocation in a strategy game where balance between known good moves and exploration is needed
  • Puzzle-solving or planning tasks where the search tree is large and parallel simulations help
  • Multi-tree MCTS setups where different trees are managed with a shared selection strategy (tree_id)
