
MCPBench

An evaluation benchmark for MCP servers

Installation
Run this command in your terminal to add the MCP server to Claude Code:

claude mcp add --transport stdio modelscope-mcpbench \
  --env FIRECRAWL_API_KEY="your-firecrawl-api-key" \
  -- npx -y firecrawl-mcp

How to use

MCPBench is an evaluation framework for MCP Servers. It enables you to benchmark different MCP servers (for example, web search, database query, and GAIA tasks) under the same LLM and agent configurations. The framework supports both remote MCP servers accessible via SSE (such as ModelScope or Smithery) and local MCP server instances started on your machine. To use MCPBench, configure a set of MCP servers you want to evaluate, then run the provided evaluation scripts to measure metrics like task completion accuracy, latency, and token consumption. The tool automatically detects server capabilities and parameters from the server configuration, so you can mix remote endpoints with locally launched servers in a single evaluation run.
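As a sketch, a server configuration mixing a remote SSE endpoint with a locally launched server might look like the following. The file name, the remote URL, and the exact key names are assumptions for illustration, not MCPBench's documented schema:

```json
{
  "mcpServers": {
    "remote-search": {
      "type": "sse",
      "url": "https://example-host/mcp/sse"
    },
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "your-firecrawl-api-key" }
    }
  }
}
```

Remote entries are contacted over SSE, while entries with a command are started locally; the framework reads capabilities and parameters from whichever form each entry takes.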

How to install

Prerequisites:

  • Python 3.11 or newer
  • Node.js and jq installed on your system
  • Conda (optional but recommended for Python environments)

Installation steps:

  1. Create and activate a Python environment (recommended):
     conda create -n mcpbench python=3.11 -y
     conda activate mcpbench
  2. Install Python dependencies:
     pip install -r requirements.txt
  3. Ensure Node.js is installed (for any Node-based MCP servers you may run locally) and jq is available on your system.
  4. (Optional) If you plan to run a local MCP server via npx, ensure you have npm/yarn configured and internet access to fetch the MCP server package.
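The prerequisite checks above can be sketched programmatically; this snippet is illustrative and not part of MCPBench itself:

```python
import shutil
import sys

def check_prereqs():
    """Return a dict mapping each MCPBench prerequisite to whether it is present."""
    results = {"python>=3.11": sys.version_info >= (3, 11)}
    # conda is optional but recommended; node and jq are required
    for tool in ("node", "jq", "conda"):
        results[tool] = shutil.which(tool) is not None
    return results

for name, ok in check_prereqs().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it before the installation steps to spot a missing tool early.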

Additional notes

Tips and notes:

  • The framework supports evaluating remote MCP servers via SSE as well as locally launched servers. When running locally, you can specify commands like npx -y <mcp-package> in the config.
  • If you encounter connectivity or authentication issues with a remote MCP server, verify network access and ensure any required API keys are set in environment variables or config files.
  • For local runs, you may need to provide API keys or tokens (e.g., FIRECRAWL_API_KEY) as part of the server's environment variables in the mcp_config.
  • The evaluation scripts (evaluation_websearch.sh, evaluation_db.sh, evaluation_gaia.sh, run with sh) expect a config file path; ensure the path is correct and that the config contains valid server entries.
  • This MCPBench example uses a local FireCrawl MCP server as a sample. Replace or extend the mcpServers section with other servers you want to benchmark.
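Putting the notes above together, a local evaluation run might look like this; the config file path is hypothetical and should point at your own checkout:

```shell
# Hypothetical paths; adjust to your repository layout.
export FIRECRAWL_API_KEY="your-firecrawl-api-key"
sh evaluation_websearch.sh configs/firecrawl_config.json
```

The API key is exported so the locally launched FireCrawl server inherits it, matching the env entry in the mcp_config.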
