MCPDocSearch

This project provides a toolset to crawl websites, wikis, and tool/library documentation, convert the crawled pages to Markdown, and make that documentation searchable via a Model Context Protocol (MCP) server designed for integration with tools like Cursor.

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio alizdavoodi-mcpdocsearch -- uv --directory /path/to/your/MCPDocSearch run python -m mcp_server.main

How to use

MCPDocSearch provides a documentation crawling workflow combined with an MCP server that serves semantic search over the crawled Markdown content. Start by crawling a site to produce Markdown docs stored under ./storage, then run the MCP server to load, chunk, and embed those documents so clients can query them via Cursor or other MCP-compatible tools. The server exposes key MCP tools: list_documents to enumerate crawled docs, get_document_headings to retrieve the heading structure for a document, and search_documentation to perform semantic search across content chunks. The server uses a cache file at storage/document_chunks_cache.pkl to speed up startup on subsequent runs, invalidating automatically whenever a Markdown file changes. When used with Cursor through the stdio transport, you’ll typically start the server from your project root and connect the Cursor agent to issue the MCP tool commands.
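The heading structure that get_document_headings returns, and the chunks that search_documentation searches, both depend on splitting each Markdown file at its headings. The following is a minimal sketch of that idea, not the project's actual implementation: the names Chunk, split_by_headings, and list_headings are hypothetical, and the sketch assumes ATX-style headings (# through ######) and does not skip fenced code blocks.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    heading: str  # heading text, "" for content before the first heading
    level: int    # 1-6 for headings, 0 for the preamble
    text: str     # body text that follows the heading


def split_by_headings(markdown: str) -> List[Chunk]:
    """Split a Markdown document into one chunk per heading (hypothetical helper)."""
    chunks: List[Chunk] = []
    heading, level, body = "", 0, []

    def flush() -> None:
        # Emit the chunk collected so far, skipping an empty preamble.
        if heading or any(line.strip() for line in body):
            chunks.append(Chunk(heading, level, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading, level, body = match.group(2), len(match.group(1)), []
        else:
            body.append(line)
    flush()
    return chunks


def list_headings(markdown: str) -> List[Tuple[int, str]]:
    """Return (level, title) pairs, roughly what a headings tool might expose."""
    return [(c.level, c.heading) for c in split_by_headings(markdown) if c.heading]
```

In a real server each chunk's text would then be embedded (the source mentions sentence-transformers) so that a query can be matched against chunks and returned together with its heading for context.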

To integrate with Cursor, add a Cursor-compatible config (e.g., .cursor/mcp.json) that launches uv to run python -m mcp_server.main from the project root, making sure the path to MCPDocSearch is absolute. Once the server has started, you can issue semantic search requests against the embedded documentation and retrieve relevant chunks with their headings for context.
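As an illustration of the Cursor config described above, the following sketch generates a .cursor/mcp.json file. It assumes Cursor's usual mcpServers layout with command and args keys; the server name "doc-search" and the helper names are hypothetical, and the uv arguments mirror the command shown in the Installation section. Check the project README for the exact config it recommends.

```python
import json
from pathlib import Path


def build_cursor_config(project_dir: str, server_name: str = "doc-search") -> dict:
    """Build a Cursor MCP config dict (server_name is a hypothetical label)."""
    if not Path(project_dir).is_absolute():
        # The server is launched from wherever Cursor runs, so a relative
        # path to the MCPDocSearch checkout would not resolve reliably.
        raise ValueError("project_dir must be an absolute path")
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                "args": [
                    "--directory", project_dir,
                    "run", "python", "-m", "mcp_server.main",
                ],
            }
        }
    }


def write_cursor_config(project_dir: str, target: str = ".cursor/mcp.json") -> Path:
    """Write the config to target, creating the .cursor directory if needed."""
    path = Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_cursor_config(project_dir), indent=2) + "\n")
    return path
```

Calling write_cursor_config("/path/to/your/MCPDocSearch") from the root of the project you open in Cursor would produce the file; replace the placeholder with the real absolute path to your checkout.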

How to install

Prerequisites

  • Python 3.8+
  • uv (used for dependency management) installed on your system
  • Git

Installation steps

  1. Clone the repository
git clone https://github.com/alizdavoodi/MCPDocSearch.git
cd MCPDocSearch
  2. Install and build dependencies using uv
uv sync

This creates a virtual environment (commonly .venv) and installs all dependencies from pyproject.toml.

  3. Verify installation
uv run python -V
  4. Prepare for crawling or running the MCP server
  • Ensure you have write access to ./storage for Markdown outputs and the cache file.
  • Optionally install a CUDA-enabled runtime for faster embeddings if available.

Note: The initial embedding step (first run after crawling) may take several minutes depending on data size and hardware.

Additional notes

  • Embedding time: The first run or changes to Markdown files in ./storage will trigger embedding generation using sentence-transformers. On slower CPUs, this can take minutes; subsequent runs will be faster thanks to the cache at storage/document_chunks_cache.pkl.
  • Cache invalidation: Any change to .md files in ./storage will invalidate and regenerate the cache automatically upon next startup.
  • Storage layout: All crawled Markdown files live under ./storage. The default output filename convention is derived from the source URL, e.g., ./storage/docs.example.com.md.
  • Cursor integration: Use a .cursor/mcp.json file configured to launch the MCP server via uv, as shown in the README, to enable interactive or automated querying from Cursor.
  • Troubleshooting: If embedding fails due to missing model weights, ensure network access is available for model download or provide local model caches as supported by sentence-transformers.
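The cache behaviour described in the notes above can be sketched as follows. This is an assumption about the mechanism, not the project's code: it fingerprints every .md file under the storage directory by modification time and size, and rebuilds the pickle cache when the fingerprint differs. The function names and the build_chunks callback are hypothetical; only the cache filename comes from the text.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Dict, Tuple


def fingerprint(storage: Path) -> Dict[str, Tuple[int, int]]:
    """Map each Markdown file to (mtime_ns, size); any change alters the dict."""
    result = {}
    for path in sorted(storage.rglob("*.md")):
        stat = path.stat()
        result[str(path.relative_to(storage))] = (stat.st_mtime_ns, stat.st_size)
    return result


def load_or_rebuild(storage: Path, build_chunks: Callable[[Path], Any]) -> Tuple[Any, bool]:
    """Return (chunks, cache_hit). build_chunks stands in for chunking plus embedding."""
    cache_file = storage / "document_chunks_cache.pkl"
    current = fingerprint(storage)

    if cache_file.exists():
        try:
            with cache_file.open("rb") as fh:
                cached = pickle.load(fh)
            if cached.get("fingerprint") == current:
                return cached["chunks"], True  # nothing changed: reuse the cache
        except (pickle.UnpicklingError, EOFError, AttributeError, KeyError):
            pass  # unreadable or outdated cache: fall through and rebuild

    # Slow path: regenerate chunks (and embeddings in a real server), then persist.
    chunks = build_chunks(storage)
    with cache_file.open("wb") as fh:
        pickle.dump({"fingerprint": current, "chunks": chunks}, fh)
    return chunks, False
```

Deleting storage/document_chunks_cache.pkl forces the slow path in this sketch. Note that pickle files should only be loaded from locations you trust, since unpickling can execute arbitrary code.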
