MCPToolBenchPP
MCPToolBench++: a Model Context Protocol (MCP) tool-use benchmark for evaluating the tool-use abilities of AI agents and models
claude mcp add --transport stdio mcp-tool-bench-mcptoolbenchpp python -m mcp_tool_bench_pp
How to use
MCPToolBenchPP is a large-scale AI agent tool-use benchmark that aggregates thousands of MCP servers across multiple domains to evaluate the tool-use capabilities of AI agents. It provides a structured collection of MCP servers and evaluation workflows, letting researchers test how well agents select, call, and chain tools to complete tasks. To use it, set up the MCPToolBenchPP server bundle in your environment, then use the included evaluation scripts to run benchmark tasks across Browser, File System, Search, Map, Pay, and other domains. The benchmark focuses on real-world tool use, covering both single-step and multi-step tool-invocation scenarios, and can be extended with additional domain datasets. Start the server(s) and run the provided dataset evaluation pipeline to measure metrics such as success rate and tool-use efficiency across categories.
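To illustrate the kind of per-category metric the evaluation pipeline reports, the sketch below aggregates success rates from a list of result records. The record fields (`category`, `success`) are hypothetical stand-ins and will differ from the repository's actual output format.

```python
from collections import defaultdict

def success_rates(results):
    """Aggregate per-category success rates from benchmark result records.

    Each record is assumed (hypothetically) to look like:
        {"category": "Browser", "success": True}
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for record in results:
        totals[record["category"]] += 1
        passes[record["category"]] += int(record["success"])
    return {cat: passes[cat] / totals[cat] for cat in totals}

# Example with made-up records:
demo = [
    {"category": "Browser", "success": True},
    {"category": "Browser", "success": False},
    {"category": "Map", "success": True},
]
rates = success_rates(demo)
```

The real pipeline may also track efficiency metrics (e.g. number of tool calls per task); the same aggregation pattern applies.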
How to install
Prerequisites:
- Python 3.8+ (3.9+ recommended)
- Git
- A suitable environment (virtualenv or conda)

1. Clone the repository or obtain the MCPToolBenchPP package:

   git clone https://github.com/mcp-tool-bench/MCPToolBenchPP.git
   cd MCPToolBenchPP

2. Create and activate a virtual environment:

   python -m venv venv
   source venv/bin/activate     # Linux/macOS
   venv\Scripts\activate.bat    # Windows

3. Install dependencies (adjust if the repository uses a different requirements file):

   pip install -r requirements.txt

4. Verify the installation by printing the module help:

   python -m mcp_tool_bench_pp --help

5. Run the benchmark server (example):

   python -m mcp_tool_bench_pp
Note: If the project provides its own setup or packaging scripts, prefer the commands documented in the repository. MCPToolBenchPP is a benchmark collection rather than a single application, so you may also need to configure dataset paths and environment variables as described in the repository documentation.
Additional notes
Tips and caveats:
- The project is described as a work in progress, with ongoing updates and new domain datasets, so expect changes and additions over time.
- If you encounter network or resource limitations, ensure your environment has sufficient CPU/GPU, memory, and disk space for large-scale tool-use benchmarks.
- Environment variables or config files may be used to tailor the evaluation (e.g., specifying dataset paths, model names, or evaluation settings). Check the docs for any required vars like DATA_PATH, MODEL_NAME, or EVAL_CONFIG.
- If you run into compatibility issues, consider using a clean virtual environment and pinning dependency versions.
- For reproducibility, document the exact data version and server configuration used for each run, as benchmark results can vary with dataset version.
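If you script your own evaluation runs, a small loader that reads settings from environment variables with sensible fallbacks keeps configuration in one place. The variable names below (DATA_PATH, MODEL_NAME, EVAL_CONFIG) are taken from the tips above but are only examples, not confirmed names from the MCPToolBenchPP repository:

```python
import os

def load_eval_config():
    """Read example evaluation settings from environment variables.

    The variable names are illustrative (echoing the tips above),
    not confirmed names from the MCPToolBenchPP repository.
    """
    return {
        "data_path": os.environ.get("DATA_PATH", "./data"),
        "model_name": os.environ.get("MODEL_NAME", "default-model"),
        "eval_config": os.environ.get("EVAL_CONFIG", "configs/default.json"),
    }

config = load_eval_config()
```

Recording the resolved config dict alongside each run's results also helps with the reproducibility point above.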
Related MCP Servers
python-utcp
Official python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware.
skillport
Bring Agent Skills to Any AI Agent and Coding Agent — via CLI or MCP. Manage once, serve anywhere.
MCPSecBench
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
AI-SOC-Agent
Blackhat 2025 presentation and codebase: AI SOC agent & MCP server for automated security investigation, alert triage, and incident response. Integrates with ELK, IRIS, and other platforms.
jmcomic-ai
AI-native JMComic assistant: inject JMComic into your AI agent via MCP and Skills. / AI-powered JMComic assistant for seamless integration with AI Agents via MCP & Skills.
alris
Alris is an AI automation tool that transforms natural language commands into task execution.