mcts
npx machina-cli add skill NewJerseyStyle/plugin-mcts/mcts --openclaw
MCTS-LLM Problem Solver
You are executing the MCTS-LLM algorithm to solve the user's request through systematic exploration.
Overview
MCTS-LLM combines Monte Carlo Tree Search with LLM capabilities:
- World Model: Use LLM to predict outcomes and simulate actions
- Heuristic Policy: Use LLM to guide promising search directions
- Tree Search: Systematically explore and evaluate solution paths
Algorithm Steps
For the request: $ARGUMENTS
Phase 1: Initialize
- Parse the problem - Understand the user's request
- Define the state space - What are the possible states/solutions?
- Define actions - What moves/decisions can be made?
- Initialize root node - Create the starting state
Use the MCP tool mcts_init_tree to initialize the search tree.
Phase 2: MCTS Loop (repeat until solution found or budget exhausted)
Execute the four MCTS phases in order:
2.1 Selection (UCB1)
Use /mcts:mcts-select skill or mcts_select MCP tool to:
- Traverse from root using UCB1: UCB = Q/N + c * sqrt(ln(parent_N) / N)
- Balance exploitation (high Q/N) vs exploration (low N)
- Select the most promising leaf node
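The UCB1 rule above can be sketched in plain Python. This is a minimal illustration of the formula, not the `mcts_select` tool's implementation; the node fields `q` and `n` and the default constant `c = 1.41` are assumed names:

```python
import math

def ucb1(q: float, n: int, parent_n: int, c: float = 1.41) -> float:
    """UCB1 score: exploitation (Q/N) plus an exploration bonus."""
    if n == 0:
        return float("inf")  # unvisited children are always tried first
    return q / n + c * math.sqrt(math.log(parent_n) / n)

def select_child(children, parent_n, c=1.41):
    """Pick the child with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch["q"], ch["n"], parent_n, c))
```

With `n = 0` the score is infinite, which is what forces every child to be visited at least once before exploitation takes over.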
2.2 Expansion
Use /mcts:mcts-expand skill or mcts_expand MCP tool to:
- Generate possible actions from selected node
- Use LLM as world model to predict plausible next states
- Add new child nodes to the tree
2.3 Simulation (Rollout)
Use /mcts:mcts-simulate skill or mcts_simulate MCP tool to:
- From expanded node, simulate to terminal state
- Use LLM as policy to guide simulation
- Evaluate the outcome (success/failure/partial)
2.4 Backpropagation
Use /mcts:mcts-backpropagate skill or mcts_backpropagate MCP tool to:
- Update statistics from simulated node to root
- Increment visit counts (N)
- Update value estimates (Q)
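The update walk described above can be sketched as follows, assuming each node stores `n`, `q`, and a `parent` link (illustrative names, not the `mcts_backpropagate` tool's internals):

```python
def backpropagate(node, reward: float) -> None:
    """Walk from the simulated node up to the root, updating statistics."""
    while node is not None:
        node["n"] += 1        # increment visit count N
        node["q"] += reward   # accumulate value estimate Q
        node = node.get("parent")
```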
Phase 3: Solution Extraction
After sufficient iterations:
- Use mcts_get_best_path to extract the best solution path
- Present the solution with confidence scores
- Explain the reasoning chain
Beliefs and Observations
Throughout the search:
- Use mcts_add_observation to record what you learn
- Use mcts_update_belief to update probability estimates
- Use mcts_get_beliefs to check current understanding
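Updating a probability estimate after an observation is typically a Bayesian update. The actual rule used by mcts_update_belief is not specified here, so the following is an assumption-labeled sketch of the standard form:

```python
def update_belief(prior: float, likelihood_true: float, likelihood_false: float) -> float:
    """Posterior P(H | obs) via Bayes' rule.

    likelihood_true  = P(obs | H), likelihood_false = P(obs | not H).
    """
    numerator = likelihood_true * prior
    return numerator / (numerator + likelihood_false * (1 - prior))

# An observation 3x as likely under the hypothesis raises a 0.5 prior to 0.75.
```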
Prompt Dataset
Access reusable prompts with:
- mcts_dataset_list - View available prompts
- mcts_dataset_get - Retrieve a specific prompt
- Use /mcts:mcts-dataset for full CRUD operations
Execution Strategy
- Start with a reasonable iteration budget (e.g., 10-50 iterations)
- Monitor convergence - if the best path stabilizes, you can stop early
- Use observations to refine the search space
- Present intermediate progress for complex problems
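A convergence check can be as simple as watching whether the extracted best path has stopped changing. This is a sketch; the window size of 5 is an assumption, not part of the skill:

```python
def has_converged(best_path_history, window: int = 5) -> bool:
    """True if the best path is unchanged over the last `window` iterations."""
    if len(best_path_history) < window:
        return False
    recent = best_path_history[-window:]
    return all(path == recent[0] for path in recent)
```

Record the best path after each iteration and stop the loop once this returns True, saving the remaining budget.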
Now execute MCTS for the given problem, using the appropriate MCP tools and skills.
Source
git clone https://github.com/NewJerseyStyle/plugin-mcts (skill definition: skills/mcts/SKILL.md)
Overview
MCTS-LLM blends Monte Carlo Tree Search with LLM capabilities to explore solution spaces through world modeling, heuristic guidance, and tree search. It's designed for research questions, planning tasks, and coding challenges that benefit from iterative exploration and learning.
How This Skill Works
Initialize the problem by parsing the request, defining the state space, and identifying actions, then build the root node with mcts_init_tree. Run the MCTS loop: select a promising leaf with UCB1 via mcts_select, expand using mcts_expand guided by the LLM world model, simulate with mcts_simulate guided by the policy, and backpropagate results with mcts_backpropagate. After enough iterations, extract the best path with mcts_get_best_path and present it with confidence scores and reasoning.
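The four phases can be wired together in a self-contained toy loop. This is a pure-Python illustration of the algorithm's mechanics on a trivial problem (pick bits to maximize the number of ones), not the MCP tools' implementation; all names and the depth/iteration choices are assumptions:

```python
import math
import random

def mcts(actions, evaluate, iterations=50, c=1.41, depth=4):
    """Toy MCTS loop: select (UCB1), expand, simulate, backpropagate."""
    root = {"state": [], "children": [], "parent": None, "n": 0, "q": 0.0}

    def ucb1(node):
        if node["n"] == 0:
            return float("inf")
        return (node["q"] / node["n"]
                + c * math.sqrt(math.log(node["parent"]["n"]) / node["n"]))

    for _ in range(iterations):
        # 1. Selection: descend via UCB1 until reaching a leaf.
        node = root
        while node["children"]:
            node = max(node["children"], key=ucb1)
        # 2. Expansion: add one child per action, unless terminal.
        if len(node["state"]) < depth:
            for a in actions:
                node["children"].append({"state": node["state"] + [a],
                                         "children": [], "parent": node,
                                         "n": 0, "q": 0.0})
            node = random.choice(node["children"])
        # 3. Simulation: random rollout to a terminal state.
        state = node["state"] + [random.choice(actions)
                                 for _ in range(depth - len(node["state"]))]
        reward = evaluate(state)
        # 4. Backpropagation: update N and Q from node to root.
        while node is not None:
            node["n"] += 1
            node["q"] += reward
            node = node["parent"]

    # Extract the best path greedily by visit count.
    path, node = [], root
    while node["children"]:
        node = max(node["children"], key=lambda ch: ch["n"])
        path.append(node["state"][-1])
    return path

random.seed(0)
best = mcts(actions=[0, 1], evaluate=sum, iterations=200)
```

In the real skill, each numbered step is replaced by the corresponding MCP tool call (mcts_select, mcts_expand, mcts_simulate, mcts_backpropagate), with the LLM supplying the action proposals and rollout policy instead of `random.choice`.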
When to Use It
- Solving a research question that requires exploring a large hypothesis space.
- Planning tasks with complex dependencies and uncertain outcomes.
- Coding challenges that benefit from iterative design exploration.
- Strategy or optimization problems with a large branching factor.
- Designing experiments or prompts where beliefs must be updated as you learn.
Quick Start
- Step 1: Parse the problem, define state space and actions, and initialize the root with mcts_init_tree.
- Step 2: Run the MCTS loop for a chosen budget (e.g., 10-50 iterations) using mcts_select, mcts_expand, mcts_simulate, and mcts_backpropagate.
- Step 3: Extract the best path with mcts_get_best_path, present the solution with confidence scores, and explain the reasoning chain.
Best Practices
- Define a precise state space and clear actions before starting.
- Tune the UCB1 exploration constant (c) to balance exploration and exploitation.
- Leverage the LLM as world model for expansion and rollout steps.
- Set a realistic iteration budget and monitor convergence to stop early if stable.
- Record observations and update beliefs to refine the search space.
Example Use Cases
- Architecting a software design by systematically exploring alternatives.
- Answering a research question through iterative hypothesis testing and refinement.
- Optimizing a complex algorithmic strategy via simulated rollouts.
- Automatically generating robust test cases by exploring various input scenarios.
- Generating and evaluating multi-step prompts for a challenging task.