
MCPToolBenchPP

MCPToolBench++: a Model Context Protocol (MCP) tool-use benchmark for evaluating the tool-use ability of AI agents and models.

Installation
Run this command in your terminal to add the MCP server to Claude Code:

claude mcp add --transport stdio mcp-tool-bench-mcptoolbenchpp python -m mcp_tool_bench_pp

How to use

MCPToolBench++ is a large-scale AI agent tool-use benchmark that aggregates thousands of MCP servers across multiple domains to evaluate the tool-use capabilities of AI agents. It provides a structured collection of MCP servers and evaluation workflows, enabling researchers to test how well agents select, call, and chain tools to complete tasks.

To use it, set up the MCPToolBench++ server bundle in your environment, then run the included evaluation scripts on benchmark tasks across Browser, File System, Search, Map, Pay, and other domains. The benchmark focuses on real-world tool use, covering both single-step and multi-step tool-invocation scenarios, and can be extended with additional domain datasets.

Start the server(s) and run the provided dataset evaluation pipeline to measure metrics such as success rate and tool-use efficiency across categories.
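The per-category metric aggregation described above can be sketched in plain Python. The task record structure (keys `domain`, `success`, `tool_calls`) and the metric names here are illustrative assumptions, not the benchmark's actual schema:

```python
from collections import defaultdict

def aggregate_results(results):
    """Compute per-domain success rate and average tool-call count.

    `results` is a list of dicts with hypothetical keys:
    domain (str), success (bool), tool_calls (int).
    """
    by_domain = defaultdict(list)
    for r in results:
        by_domain[r["domain"]].append(r)
    summary = {}
    for domain, runs in by_domain.items():
        successes = sum(1 for r in runs if r["success"])
        total_calls = sum(r["tool_calls"] for r in runs)
        summary[domain] = {
            "success_rate": successes / len(runs),
            "avg_tool_calls": total_calls / len(runs),
        }
    return summary

# Example: two Browser runs, one Search run.
results = [
    {"domain": "Browser", "success": True, "tool_calls": 3},
    {"domain": "Browser", "success": False, "tool_calls": 5},
    {"domain": "Search", "success": True, "tool_calls": 1},
]
print(aggregate_results(results))
```

The real pipeline would feed actual run logs into such an aggregation step; this only shows the shape of the computation.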

How to install

Prerequisites:

  • Python 3.8+ (recommended 3.9+)
  • Git
  • A suitable environment (virtualenv or conda)
  1. Clone the repository or obtain the MCPToolBenchPP package:

     git clone https://github.com/mcp-tool-bench/MCPToolBenchPP.git
     cd MCPToolBenchPP

  2. Create and activate a virtual environment:

     python -m venv venv
     source venv/bin/activate     # Linux/macOS
     venv\Scripts\activate.bat    # Windows

  3. Install dependencies (adjust if a requirements file exists):

     pip install -r requirements.txt

  4. Verify the installation by running a quick help command:

     python -m mcp_tool_bench_pp --help

  5. Run the benchmark server (example):

     python -m mcp_tool_bench_pp

Note: If the project provides its own setup or packaging scripts, prefer those commands as documented in the repository. The MCPToolBenchPP dataset is a benchmark collection rather than a single application, so you may also need to configure data paths and environment variables as described in the repository documentation.
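Before starting the install steps above, a small script can sanity-check the listed prerequisites (Python version and Git availability). This helper is not part of MCPToolBenchPP itself, just a convenience sketch:

```python
import shutil
import sys

def check_prerequisites(min_version=(3, 9)):
    """Return a dict mapping prerequisite name to whether it is satisfied."""
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        "git_ok": shutil.which("git") is not None,
    }

checks = check_prerequisites()
for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Running it before step 1 avoids discovering a missing prerequisite halfway through the install.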

Additional notes

Tips and caveats:

  • The project is still a work in progress, with ongoing updates and new domain datasets, so expect changes and additions over time.
  • If you encounter network or resource limitations, ensure your environment has sufficient CPU/GPU, memory, and disk space for large-scale tool-use benchmarks.
  • Environment variables or config files may be used to tailor the evaluation (e.g., specifying dataset paths, model names, or evaluation settings). Check the docs for any required vars like DATA_PATH, MODEL_NAME, or EVAL_CONFIG.
  • If you run into compatibility issues, consider using a clean virtual environment and pinning dependency versions.
  • For reproducibility, document the exact data version and server configuration used for each run, as benchmark results can vary with dataset version.
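The environment variables mentioned above (DATA_PATH, MODEL_NAME, and EVAL_CONFIG are example names, not confirmed ones) could be read with fallback defaults like this; the default values here are placeholders:

```python
import os

def load_eval_config():
    """Read hypothetical benchmark settings from the environment,
    falling back to placeholder defaults when a variable is unset."""
    return {
        "data_path": os.environ.get("DATA_PATH", "./data"),
        "model_name": os.environ.get("MODEL_NAME", "default-model"),
        "eval_config": os.environ.get("EVAL_CONFIG", "configs/default.json"),
    }

config = load_eval_config()
print(config)
```

Centralizing the environment lookups in one function also makes it easy to dump the effective configuration into a run log, which helps with the reproducibility point above.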
