
mcp-evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio mclenhard-mcp-evals node path/to/server.js \
  --env OPENAI_API_KEY="Your OpenAI API key" \
  --env ANTHROPIC_API_KEY="Your Anthropic API key (if using Anthropic models)"

How to use

MCP Evals is a Node.js package that provides evaluation tooling for MCP tool implementations, with built-in observability support for monitoring and metrics. You define evaluation configurations in TypeScript or YAML, run a suite of evaluations against your MCP server, and get results that score tool outputs on accuracy, completeness, relevance, and clarity. Evaluations can run locally via the CLI or inside a GitHub Action workflow that automatically tests PR changes and posts results back to the PR. The observability features let you instrument your MCP server to trace tool calls, measure latency, and capture errors, which can be visualized in a monitoring stack when paired with the provided metrics.
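To illustrate the TypeScript side, here is a minimal sketch of an eval definition. The `EvalConfig` and `EvalFunction` names come from the package itself, but the result shape, the scoring dimensions as fields, and the stubbed `run` function are illustrative assumptions; consult the package's README for the real types and grading helper.

```typescript
// Sketch of a TypeScript eval definition for mcp-evals.
// EvalConfig / EvalFunction are names the package exports; the shapes
// below are stand-ins so this sketch is self-contained and runnable.

// Illustrative result shape: LLM grader scores, e.g. on a 1-5 scale.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
}

// Illustrative stand-in for the package's EvalFunction type.
type EvalFunction = {
  name: string;
  description: string;
  run: () => Promise<EvalResult>;
};

// Illustrative stand-in for the package's EvalConfig type.
type EvalConfig = {
  model: string; // assumption: the grading model is configurable here
  evals: EvalFunction[];
};

// A hypothetical eval that would ask the server's weather tool a question
// and have the LLM grade the answer. The real package provides a grading
// helper for this; here the call is stubbed with fixed scores.
const weatherEval: EvalFunction = {
  name: "weather_tool",
  description: "Checks that the weather tool answers a simple query",
  run: async () => ({ accuracy: 4, completeness: 4, relevance: 5, clarity: 4 }),
};

const config: EvalConfig = {
  model: "gpt-4",
  evals: [weatherEval],
};
```

The config object is what the eval runner consumes: one grading model plus a list of named evals, each responsible for producing a scored result.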

How to install

Prerequisites:

  • Node.js (16.x or newer) and npm installed
  • Access to an MCP server (local or remote) to evaluate

Install the MCP Evals package:

npm install mcp-evals

If you plan to use the GitHub Action, you can reference the action in your workflow (see the README for the exact workflow snippet).
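As a rough sketch of what such a workflow looks like, the snippet below wires the action into a pull-request trigger. The action reference, input names, and file paths are assumptions for illustration; use the exact snippet from the mcp-evals README.

```yaml
# Sketch only: action version, input names, and paths are assumptions.
name: Run MCP Evals
on:
  pull_request:

permissions:
  pull-requests: write   # needed to post results back to the PR
  contents: read

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run MCP evaluations
        uses: mclenhard/mcp-evals@v1       # assumption: pin per the README
        with:
          evals_path: src/evals.ts         # assumption: your eval config path
          server_path: src/index.ts        # assumption: your MCP server entry
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
```

Note the `pull-requests: write` permission and the Node.js setup step, both of which the tips below call out as prerequisites for posting results to the PR.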

Additional notes

Tips and common considerations:

  • Environment variables: ensure OPENAI_API_KEY (and ANTHROPIC_API_KEY if using Anthropic models) are available in the environment where evaluations run.
  • Evaluation configurations: you can author evals in TypeScript (exporting EvalConfig and EvalFunction) or in YAML for simpler setups.
  • If you enable monitoring, you will need a running observability stack (Prometheus, Grafana, Jaeger) as described in the Monitoring section to visualize dashboards.
  • The metrics feature is in alpha; expect API changes and possible breakages as the project iterates.
  • For GitHub Actions, ensure your workflow has access to secrets and that Node.js is set up in the runner before invoking the evaluation action.
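For local runs, the environment variables from the first tip can be set in the shell before invoking the eval runner. The key values below are placeholders:

```shell
# Placeholder values: substitute your real keys, and never commit them.
export OPENAI_API_KEY="sk-your-openai-key"
# Only needed when grading with Anthropic models:
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
```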
