mcp-evals
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
claude mcp add --transport stdio mclenhard-mcp-evals node path/to/server.js \
  --env OPENAI_API_KEY="Your OpenAI API key" \
  --env ANTHROPIC_API_KEY="Your Anthropic API key (if using Anthropic models)"
How to use
MCP Evals provides evaluation tooling for MCP tool implementations, with built-in observability support for monitoring and metrics. You define evaluation configurations in TypeScript or YAML, run the suite against your MCP server, and review results that score tool outputs on accuracy, completeness, relevance, and clarity.
Evaluations can run locally via the CLI or in a GitHub Action workflow that tests PR changes automatically and posts the results back to the PR. The observability features let you instrument your MCP server to trace tool calls, measure latency, and capture errors; these can be visualized in a monitoring stack when paired with the provided metrics.
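To make the four scoring dimensions concrete, the following is a minimal sketch of how per-eval scores might be summarized once a run finishes. The `EvalScores` shape, the `overall` and `failing` helpers, the sample values, and the threshold are all assumptions made for illustration; they are not the package's actual result type or API.

```typescript
// Hypothetical shape for one eval's LLM-assigned scores. The field names mirror
// the four dimensions named above; this is NOT the mcp-evals result type.
interface EvalScores {
  name: string;
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
}

// Average the four dimensions of a single eval.
function overall(s: EvalScores): number {
  return (s.accuracy + s.completeness + s.relevance + s.clarity) / 4;
}

// Return the names of evals whose average falls below a chosen threshold.
function failing(results: EvalScores[], threshold: number): string[] {
  return results.filter((r) => overall(r) < threshold).map((r) => r.name);
}

// Made-up sample data, used only to exercise the helpers.
const results: EvalScores[] = [
  { name: "weather lookup", accuracy: 5, completeness: 4, relevance: 5, clarity: 4 },
  { name: "forecast summary", accuracy: 2, completeness: 3, relevance: 3, clarity: 2 },
];

console.log(failing(results, 3.5)); // prints [ 'forecast summary' ]
```

A summary like this could serve as a simple pass/fail gate in CI, though the scale and threshold would need to match whatever the evaluation run actually reports.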
How to install
Prerequisites:
- Node.js (16.x or newer) and npm installed
- Access to an MCP server (local or remote) to evaluate
Install the MCP Evals package:
npm install mcp-evals
If you plan to use the GitHub Action, you can reference the action in your workflow (see the README for the exact workflow snippet).
Additional notes
Tips and common considerations:
- Environment variables: ensure OPENAI_API_KEY (and ANTHROPIC_API_KEY if using Anthropic models) are available in the environment where evaluations run.
- Evaluation configurations: you can author evals in TypeScript (exporting EvalConfig and EvalFunction) or in YAML for simpler setups.
- If you enable monitoring, you need a running observability stack (Prometheus, Grafana, Jaeger) to view the dashboards; see the Monitoring section of the project README for setup.
- The metrics feature is in alpha; expect API changes and possible breakages as the project iterates.
- For GitHub Actions, ensure your workflow has access to secrets and that Node.js is set up in the runner before invoking the evaluation action.
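The environment-variable tip above can be checked before an evaluation run starts. The sketch below is one way to do that; `missingApiKeys` and the `Env` type are hypothetical helpers and not part of mcp-evals. Only the two variable names come from this page.

```typescript
// Hypothetical helper (not part of mcp-evals): report which API keys are
// missing from an environment map.
type Env = Record<string, string | undefined>;

function missingApiKeys(env: Env, usesAnthropic: boolean): string[] {
  const required = ["OPENAI_API_KEY"];
  if (usesAnthropic) required.push("ANTHROPIC_API_KEY");
  // Treat unset and empty-string values as missing.
  return required.filter((key) => !env[key]);
}

// In a real script you would pass process.env; a literal with a placeholder
// value is used here so the sketch runs anywhere.
const missing = missingApiKeys({ OPENAI_API_KEY: "sk-placeholder" }, true);
console.log(missing); // prints [ 'ANTHROPIC_API_KEY' ]
```

Failing fast on a non-empty result gives a clearer error than a rejected API call partway through a run, both locally and in a GitHub Actions job where the keys come from repository secrets.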
Related MCP Servers
CanvasMCPClient
Canvas MCP Client is an open-source, self-hostable dashboard application built around an infinite, zoomable, and pannable canvas. It provides a unified interface for interacting with multiple MCP (Model Context Protocol) servers through a flexible, widget-based system.
docmole
Dig through any documentation with AI - MCP server for Claude, Cursor, and other AI assistants
obsidian
MCP server for Obsidian vault management - enables Claude and other AI assistants to read, write, search, and organize your notes
GameMaker
GameMaker MCP server for Cursor - Build GM projects with AI
xgmem
Global Memory MCP server that manages data across all projects.
mcp-turso
MCP server for interacting with Turso-hosted LibSQL databases