
MCPCorpus

MCPCorpus is a comprehensive dataset for analyzing the Model Context Protocol (MCP) ecosystem, containing ~14K MCP servers and 300 MCP clients with 20+ normalized metadata attributes.

Installation
Run this command in your terminal to add the MCP server to Claude Code:

claude mcp add --transport stdio snakinya-mcpcorpus python Website/server.py \
  --env GITHUB_TOKEN=<your-github-token>

The GITHUB_TOKEN variable is optional; it is only used to enrich metadata during dataset updates.

How to use

MCPCorpus provides a lightweight local web interface for exploring and programmatically accessing the MCPCorpus dataset, a large collection of MCP servers and clients with rich metadata. To use it:

  • Run the included Python web server, which serves the local search interface and the dataset files.
  • Open http://localhost:8000 in your browser to search, filter, and browse the stored MCP artifacts.
  • For programmatic analysis, load the underlying JSON datasets directly in your code (located under Crawler/Servers and Crawler/Clients), for example to convert them into DataFrames for research workflows.
  • To update the dataset, run the provided data collection scripts to fetch new servers/clients, then refresh the GitHub metadata using the included tooling.
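For programmatic access, the JSON files under Crawler/Servers and Crawler/Clients can be loaded directly. A minimal sketch, assuming each directory contains one or more JSON files holding either a single record or a list of records (the exact file layout is an assumption; adjust to the actual dataset structure):

```python
import json
from pathlib import Path


def load_artifacts(directory):
    """Load every JSON file under `directory` into one flat list of records.

    Files may contain either a single object or a list of objects;
    both shapes are normalized into a flat list of dicts.
    """
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        records.extend(data if isinstance(data, list) else [data])
    return records


# Hypothetical usage against the repository layout:
# servers = load_artifacts("Crawler/Servers")
# clients = load_artifacts("Crawler/Clients")
```

The resulting list of dicts can be handed straight to pandas.DataFrame(records) for filtering and aggregation in a research workflow.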

How to install

Prerequisites:

  • Python 3.8+ installed on your machine
  • Access to the repository containing MCPCorpus

Installation steps:

  1. Clone the repository (or ensure you have the MCPCorpus directory structure locally).
  2. Install any Python dependencies if a requirements.txt is present (optional if the server runs on the standard library alone):
     pip install -r requirements.txt
  3. Start the local web server for exploring the dataset:
     python Website/server.py
  4. Open http://localhost:8000 in your web browser to interact with the MCPCorpus interface.

Optional: If you intend to update the dataset or collect new metadata, ensure you can run the data collection scripts located under Crawler/Servers and Crawler/Clients, and have a GitHub token if you plan to enrich metadata via github_info_collector.py.
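When enriching metadata via GitHub, authenticated requests avoid the very low anonymous rate limit. A sketch of how a token could be wired into an API call; the helper name and header layout here are illustrative, not the actual internals of github_info_collector.py:

```python
import os
import urllib.request


def github_request(url, token=None):
    """Build a GitHub API request, adding an Authorization header when a
    token is available (falls back to the GITHUB_TOKEN env var)."""
    token = token or os.environ.get("GITHUB_TOKEN")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


# Hypothetical usage when fetching repository metadata:
# req = github_request("https://api.github.com/repos/OWNER/REPO")
# with urllib.request.urlopen(req) as resp:
#     metadata = resp.read()
```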

Additional notes

  • The dataset and web interface are designed for local exploration; the repository includes scripts to collect server/client data and to enrich metadata via GitHub. If you plan to run the enrichment step, you may need a GitHub token and appropriate API access quotas.
  • Update steps typically involve: (a) collecting new server data, (b) collecting new client data, (c) optionally enriching with GitHub metadata.
  • The environment variable GITHUB_TOKEN is optional but recommended for larger updates to avoid rate limits when querying GitHub during enrichment.
  • The install command above assumes the server is started with a Python script at Website/server.py; adjust paths if you relocate the server script.
  • If port 8000 is already in use, you can modify the server script to listen on a different port or run a local proxy as needed.
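If Website/server.py is built on the standard-library HTTP server (an assumption; the actual implementation may differ), switching ports is a small change. A minimal sketch of a port-configurable static file server, with the PORT environment variable as a hypothetical override:

```python
import os
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(port=8000, directory="."):
    """Serve `directory` over HTTP on `port`; port 0 picks a free port."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # PORT env var overrides the default, e.g. PORT=8080 python server.py
    server = make_server(int(os.environ.get("PORT", 8000)))
    print(f"Serving on http://localhost:{server.server_port}")
    server.serve_forever()
```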
