MCPDocSearch
This project provides a toolset to crawl websites, wikis, and tool/library documentation, generate Markdown from the crawled pages, and make that documentation searchable via a Model Context Protocol (MCP) server designed for integration with tools like Cursor.
For example, to register the server with the Claude CLI over stdio:
claude mcp add --transport stdio alizdavoodi-mcpdocsearch uv --directory /path/to/your/MCPDocSearch run python -m mcp_server.main
How to use
MCPDocSearch provides a documentation crawling workflow combined with an MCP server that serves semantic search over the crawled Markdown content. Start by crawling a site to produce Markdown docs stored under ./storage, then run the MCP server to load, chunk, and embed those documents so clients can query them via Cursor or other MCP-compatible tools.
The server exposes key MCP tools: list_documents to enumerate crawled docs, get_document_headings to retrieve the heading structure for a document, and search_documentation to perform semantic search across content chunks. The server uses a cache file at storage/document_chunks_cache.pkl to speed up startup on subsequent runs, invalidating automatically whenever a Markdown file changes. When used with Cursor through the stdio transport, you’ll typically start the server from your project root and connect the Cursor agent to issue the MCP tool commands.
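The load-and-chunk step can be sketched roughly as follows. This is a hedged illustration, not the server's actual code: the names `Chunk` and `chunk_markdown` are invented for the example, and the real server's chunking rules may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    heading: str                 # nearest heading above the chunk
    level: int                   # heading depth (1 for '#', 2 for '##', ...)
    lines: list = field(default_factory=list)

def chunk_markdown(text: str) -> list:
    """Split a Markdown document into chunks, one per heading."""
    chunks = [Chunk(heading="(preamble)", level=0)]
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            chunks.append(Chunk(heading=stripped.lstrip("#").strip(), level=level))
        else:
            chunks[-1].lines.append(line)
    # Drop the preamble placeholder if nothing preceded the first heading
    return [c for c in chunks
            if c.heading != "(preamble)" or any(l.strip() for l in c.lines)]
```

Chunking at heading boundaries is what lets get_document_headings return a document outline and lets search results carry their heading for context.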
To integrate Cursor, add a Cursor-compatible config (e.g., .cursor/mcp.json) that launches uv to run python -m mcp_server.main from the project root, ensuring the absolute path to MCPDocSearch is provided. Once started, you can issue semantic search requests against the embedded documentation and retrieve relevant chunks with their headings for context.
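A minimal .cursor/mcp.json along these lines should work; the server name "mcpdocsearch" is a placeholder of your choosing, and the --directory path must be replaced with the absolute path to your clone:

```json
{
  "mcpServers": {
    "mcpdocsearch": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/MCPDocSearch",
        "run",
        "python",
        "-m",
        "mcp_server.main"
      ]
    }
  }
}
```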
How to install
Prerequisites
- Python 3.8+
- uv (for dependency management), installed on your system
- Git
Installation steps
- Clone the repository
git clone https://github.com/alizdavoodi/MCPDocSearch.git
cd MCPDocSearch
- Install and build dependencies using uv
uv sync
This creates a virtual environment (commonly .venv) and installs all dependencies from pyproject.toml.
- Verify installation
uv run python -V
- Prepare for crawling or running the MCP server
- Ensure you have write access to ./storage for Markdown outputs and the cache file.
- Optionally install a CUDA-enabled runtime for faster embeddings if available.
Note: The initial embedding step (first run after crawling) may take several minutes depending on data size and hardware.
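Once embeddings exist, search_documentation boils down to ranking chunks by vector similarity to the query. The sketch below uses a toy bag-of-words "embedding" so it runs without dependencies; the real server uses sentence-transformers, which produces dense float vectors, and the function names here are invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words stand-in for a sentence-transformers model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_documentation(query: str, chunks: list) -> list:
    """Rank text chunks by similarity to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
```

With real embeddings the ranking captures semantic similarity rather than word overlap, which is why the first run pays the one-time model-inference cost.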
Additional notes
- Embedding time: The first run or changes to Markdown files in ./storage will trigger embedding generation using sentence-transformers. On slower CPUs, this can take minutes; subsequent runs will be faster thanks to the cache at storage/document_chunks_cache.pkl.
- Cache invalidation: Any change to .md files in ./storage will invalidate and regenerate the cache automatically upon next startup.
- Storage layout: All crawled Markdown files live under ./storage. The default output filename convention is derived from the source URL, e.g., ./storage/docs.example.com.md.
- Cursor integration: Use a .cursor/mcp.json file configured to launch the MCP server via uv as shown in the README to enable manual or automated querying from Cursor.
- Troubleshooting: If embedding fails due to missing model weights, ensure network access is available for model download or provide local model caches as supported by sentence-transformers.
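The cache behavior described above can be sketched as follows. This is an assumption-laden illustration, not the server's actual code: it guesses that the cache is keyed by the modification times of the .md files, and the names `md_mtimes` and `load_or_rebuild` are invented.

```python
import os
import pickle

def md_mtimes(storage_dir: str) -> dict:
    """Map each .md file under storage_dir to its modification time."""
    return {
        name: os.path.getmtime(os.path.join(storage_dir, name))
        for name in sorted(os.listdir(storage_dir))
        if name.endswith(".md")
    }

def load_or_rebuild(storage_dir: str, cache_file: str, build):
    """Reuse the pickled cache unless any Markdown file changed."""
    current = md_mtimes(storage_dir)
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            cached = pickle.load(f)
        if cached.get("mtimes") == current:
            return cached["chunks"]          # cache hit: skip re-embedding
    chunks = build(storage_dir)              # slow path: re-chunk and re-embed
    with open(cache_file, "wb") as f:
        pickle.dump({"mtimes": current, "chunks": chunks}, f)
    return chunks
```

Any edit to a Markdown file bumps its mtime, so the comparison fails and the next startup rebuilds the cache, matching the invalidation behavior described above.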
Related MCP Servers
web-eval-agent
An MCP server that autonomously evaluates web applications.
mcp-neo4j
Neo4j Labs Model Context Protocol servers
Gitingest
mcp server for gitingest
zotero
Model Context Protocol (MCP) server for the Zotero API, in Python
fhir
FHIR MCP Server – helping you expose any FHIR Server or API as an MCP Server.
unitree-go2
The Unitree Go2 MCP Server is a server built on the MCP that enables users to control the Unitree Go2 robot using natural language commands interpreted by an LLM.