
mcp-evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio mclenhard-mcp-evals node path/to/server.js \
  --env OPENAI_API_KEY="Your OpenAI API key" \
  --env ANTHROPIC_API_KEY="Your Anthropic API key (if using Anthropic models)"

How to use

MCP Evals is a Node.js package that provides evaluation tooling for MCP tool implementations, with built-in observability support for monitoring and metrics. You define evaluation configurations in TypeScript or YAML, run a suite of evaluations against your MCP server, and get results that score tool outputs on accuracy, completeness, relevance, and clarity. Evaluations can run locally via the CLI or inside a GitHub Action workflow that automatically tests PR changes and posts results back to the PR. The observability features let you instrument your MCP server to trace tool calls, measure latency, and capture errors, which can be visualized in a monitoring stack when paired with the provided metrics.
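To illustrate the TypeScript side, here is a minimal sketch of an eval definition. The `EvalConfig` and `EvalFunction` names come from the package itself, but the result shape, the scoring dimensions as fields, and the stubbed `run` function are illustrative assumptions; consult the package's README for the real types and grading helper.

```typescript
// Sketch of a TypeScript eval definition for mcp-evals.
// EvalConfig / EvalFunction are names the package exports; the shapes
// below are stand-ins so this sketch is self-contained and runnable.

// Illustrative result shape: LLM grader scores, e.g. on a 1-5 scale.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
}

// Illustrative stand-in for the package's EvalFunction type.
type EvalFunction = {
  name: string;
  description: string;
  run: () => Promise<EvalResult>;
};

// Illustrative stand-in for the package's EvalConfig type.
type EvalConfig = {
  model: string; // assumption: the grading model is configurable here
  evals: EvalFunction[];
};

// A hypothetical eval that would ask the server's weather tool a question
// and have the LLM grade the answer. The real package provides a grading
// helper for this; here the call is stubbed with fixed scores.
const weatherEval: EvalFunction = {
  name: "weather_tool",
  description: "Checks that the weather tool answers a simple query",
  run: async () => ({ accuracy: 4, completeness: 4, relevance: 5, clarity: 4 }),
};

const config: EvalConfig = {
  model: "gpt-4",
  evals: [weatherEval],
};
```

The config object is what the eval runner consumes: one grading model plus a list of named evals, each responsible for producing a scored result.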

How to install

Prerequisites:

  • Node.js (16.x or newer) and npm installed
  • Access to an MCP server (local or remote) to evaluate

Install the MCP Evals package:

npm install mcp-evals

If you plan to use the GitHub Action, you can reference the action in your workflow (see the README for the exact workflow snippet).
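As a rough sketch of what such a workflow looks like, the snippet below wires the action into a pull-request trigger. The action reference, input names, and file paths are assumptions for illustration; use the exact snippet from the mcp-evals README.

```yaml
# Sketch only: action version, input names, and paths are assumptions.
name: Run MCP Evals
on:
  pull_request:

permissions:
  pull-requests: write   # needed to post results back to the PR
  contents: read

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run MCP evaluations
        uses: mclenhard/mcp-evals@v1       # assumption: pin per the README
        with:
          evals_path: src/evals.ts         # assumption: your eval config path
          server_path: src/index.ts        # assumption: your MCP server entry
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
```

Note the `pull-requests: write` permission and the Node.js setup step, both of which the tips below call out as prerequisites for posting results to the PR.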

Additional notes

Tips and common considerations:

  • Environment variables: ensure OPENAI_API_KEY (and ANTHROPIC_API_KEY if using Anthropic models) are available in the environment where evaluations run.
  • Evaluation configurations: you can author evals in TypeScript (exporting EvalConfig and EvalFunction) or in YAML for simpler setups.
  • If you enable monitoring, you will need a running observability stack (Prometheus, Grafana, Jaeger) as described in the Monitoring section to visualize dashboards.
  • The metrics feature is in alpha; expect API changes and possible breakages as the project iterates.
  • For GitHub Actions, ensure your workflow has access to secrets and that Node.js is set up in the runner before invoking the evaluation action.
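For local runs, the environment variables from the first tip can be set in the shell before invoking the eval runner. The key values below are placeholders:

```shell
# Placeholder values: substitute your real keys, and never commit them.
export OPENAI_API_KEY="sk-your-openai-key"
# Only needed when grading with Anthropic models:
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
```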
