MCPDocSearch
This project provides a toolset to crawl websites, wikis, and tool/library documentation, generate Markdown from the crawled pages, and make that documentation searchable via a Model Context Protocol (MCP) server designed for integration with tools like Cursor.
For example, to register the server with the Claude CLI over stdio:
claude mcp add --transport stdio alizdavoodi-mcpdocsearch uv --directory /path/to/your/MCPDocSearch run python -m mcp_server.main
How to use
MCPDocSearch provides a documentation crawling workflow combined with an MCP server that serves semantic search over the crawled Markdown content. Start by crawling a site to produce Markdown docs stored under ./storage, then run the MCP server to load, chunk, and embed those documents so clients can query them via Cursor or other MCP-compatible tools.
The server exposes key MCP tools: list_documents to enumerate crawled docs, get_document_headings to retrieve the heading structure for a document, and search_documentation to perform semantic search across content chunks. The server uses a cache file at storage/document_chunks_cache.pkl to speed up startup on subsequent runs, invalidating automatically whenever a Markdown file changes. When used with Cursor through the stdio transport, you’ll typically start the server from your project root and connect the Cursor agent to issue the MCP tool commands.
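The load-and-chunk step can be sketched roughly as follows. This is a hedged illustration, not the server's actual code: the names `Chunk` and `chunk_markdown` are invented for the example, and the real server's chunking rules may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    heading: str                 # nearest heading above the chunk
    level: int                   # heading depth (1 for '#', 2 for '##', ...)
    lines: list = field(default_factory=list)

def chunk_markdown(text: str) -> list:
    """Split a Markdown document into chunks, one per heading."""
    chunks = [Chunk(heading="(preamble)", level=0)]
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            chunks.append(Chunk(heading=stripped.lstrip("#").strip(), level=level))
        else:
            chunks[-1].lines.append(line)
    # Drop the preamble placeholder if nothing preceded the first heading
    return [c for c in chunks
            if c.heading != "(preamble)" or any(l.strip() for l in c.lines)]
```

Chunking at heading boundaries is what lets get_document_headings return a document outline and lets search results carry their heading for context.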
To integrate Cursor, add a Cursor-compatible config (e.g., .cursor/mcp.json) that launches uv to run python -m mcp_server.main from the project root, ensuring the absolute path to MCPDocSearch is provided. Once started, you can issue semantic search requests against the embedded documentation and retrieve relevant chunks with their headings for context.
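A minimal .cursor/mcp.json along these lines should work; the server name "mcpdocsearch" is a placeholder of your choosing, and the --directory path must be replaced with the absolute path to your clone:

```json
{
  "mcpServers": {
    "mcpdocsearch": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/MCPDocSearch",
        "run",
        "python",
        "-m",
        "mcp_server.main"
      ]
    }
  }
}
```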
How to install
Prerequisites
- Python 3.8+
- uv (for dependency management), installed on your system
- Git
Installation steps
- Clone the repository
git clone https://github.com/alizdavoodi/MCPDocSearch.git
cd MCPDocSearch
- Install and build dependencies using uv
uv sync
This creates a virtual environment (commonly .venv) and installs all dependencies from pyproject.toml.
- Verify installation
uv run python -V
- Prepare for crawling or running the MCP server
- Ensure you have write access to ./storage for Markdown outputs and the cache file.
- Optionally install a CUDA-enabled runtime for faster embeddings if available.
Note: The initial embedding step (first run after crawling) may take several minutes depending on data size and hardware.
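Once embeddings exist, search_documentation boils down to ranking chunks by vector similarity to the query. The sketch below uses a toy bag-of-words "embedding" so it runs without dependencies; the real server uses sentence-transformers, which produces dense float vectors, and the function names here are invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words stand-in for a sentence-transformers model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_documentation(query: str, chunks: list) -> list:
    """Rank text chunks by similarity to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
```

With real embeddings the ranking captures semantic similarity rather than word overlap, which is why the first run pays the one-time model-inference cost.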
Additional notes
- Embedding time: The first run or changes to Markdown files in ./storage will trigger embedding generation using sentence-transformers. On slower CPUs, this can take minutes; subsequent runs will be faster thanks to the cache at storage/document_chunks_cache.pkl.
- Cache invalidation: Any change to .md files in ./storage will invalidate and regenerate the cache automatically upon next startup.
- Storage layout: All crawled Markdown files live under ./storage. The default output filename convention is derived from the source URL, e.g., ./storage/docs.example.com.md.
- Cursor integration: Use a .cursor/mcp.json file configured to launch the MCP server via uv as shown in the README to enable manual or automated querying from Cursor.
- Troubleshooting: If embedding fails due to missing model weights, ensure network access is available for model download or provide local model caches as supported by sentence-transformers.
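The cache behavior described above can be sketched as follows. This is an assumption-laden illustration, not the server's actual code: it guesses that the cache is keyed by the modification times of the .md files, and the names `md_mtimes` and `load_or_rebuild` are invented.

```python
import os
import pickle

def md_mtimes(storage_dir: str) -> dict:
    """Map each .md file under storage_dir to its modification time."""
    return {
        name: os.path.getmtime(os.path.join(storage_dir, name))
        for name in sorted(os.listdir(storage_dir))
        if name.endswith(".md")
    }

def load_or_rebuild(storage_dir: str, cache_file: str, build):
    """Reuse the pickled cache unless any Markdown file changed."""
    current = md_mtimes(storage_dir)
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            cached = pickle.load(f)
        if cached.get("mtimes") == current:
            return cached["chunks"]          # cache hit: skip re-embedding
    chunks = build(storage_dir)              # slow path: re-chunk and re-embed
    with open(cache_file, "wb") as f:
        pickle.dump({"mtimes": current, "chunks": chunks}, f)
    return chunks
```

Any edit to a Markdown file bumps its mtime, so the comparison fails and the next startup rebuilds the cache, matching the invalidation behavior described above.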
Related MCP Servers
web-eval-agent
An MCP server that autonomously evaluates web applications.
mcp-neo4j
Neo4j Labs Model Context Protocol servers
Gitingest
mcp server for gitingest
zotero
Model Context Protocol (MCP) server for the Zotero API, in Python
fhir
FHIR MCP Server – helping you expose any FHIR Server or API as an MCP Server.
unitree-go2
The Unitree Go2 MCP Server is a server built on the MCP that enables users to control the Unitree Go2 robot using natural language commands interpreted by an LLM.