doc-indexer
Local document indexer MCP server for semantic search over PDF, Excel, SQL, Markdown, and HTML files using Qdrant and Voyage AI embeddings.
```shell
claude mcp add --transport stdio eztakesin-doc-indexer-mcp \
  --env RUST_LOG="info" \
  --env DOCS_PATH="/path/to/your/documents" \
  --env QDRANT_URL="http://localhost:6334" \
  --env QDRANT_COLLECTION="doc_index" \
  --env INDEX_SUBDIRS="docs" \
  --env VOYAGE_API_KEY="your-voyage-api-key" \
  --env EMBEDDING_MODEL="voyage-3-large" \
  --env PDF_CHUNK_SIZE="1000" \
  --env PDF_CHUNK_OVERLAP="200" \
  --env SQL_MAX_CHUNK_SIZE="4000" \
  --env EXCEL_ROWS_PER_CHUNK="50" \
  -- cargo run --release
```
How to use
doc-indexer is a local MCP server written in Rust that enables semantic search over a variety of document types, backed by a Qdrant vector store and Voyage AI embeddings. It exposes MCP tools to index documents or directories, search the indexed content semantically, delete documents, and retrieve index statistics: index_document, index_directory, search_documents, delete_document, and get_stats. The server integrates with the Claude Code CLI and any other MCP-compatible client. To use it, point your MCP client at this server, supply the required environment variables (the Voyage API key, embedding model, and Qdrant connection details), and invoke the tools above to index and search your documents.
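Once the server is built, you can smoke-test the stdio transport by piping raw MCP JSON-RPC messages into the binary. The binary path below assumes the default `cargo build --release` output and that the crate is named doc-indexer-mcp; adjust it to the actual name.

```shell
# Hypothetical stdio smoke test: perform the MCP initialize handshake,
# then ask the server to list its tools. Each JSON-RPC message goes on
# its own line.
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1.0"}}}' \
  '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' \
  | ./target/release/doc-indexer-mcp
```

If the handshake succeeds, the response to the second request should include the tool names listed above (index_document, search_documents, and so on).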
How to install
Prerequisites:
- Rust 2024 Edition (rustc 1.85+) installed on your system
- Qdrant vector database up and running (server or docker) at the configured URL
- poppler-utils (pdftotext) available for PDF extraction
- Voyage AI API key (or an OpenAI-compatible embeddings endpoint)
Install steps:
1. Install Rust and set up the toolchain
   - macOS (Homebrew): brew install rustup && rustup default stable
   - Linux: run the rustup installer from https://rustup.rs
2. Install poppler for PDF parsing (provides pdftotext)
   - macOS: brew install poppler
   - Linux (Debian/Ubuntu): sudo apt-get install poppler-utils
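After installing poppler, it is worth confirming that pdftotext is actually on your PATH, since the server shells out to it for PDF extraction:

```shell
# Verify the pdftotext binary is installed and resolvable.
command -v pdftotext && pdftotext -v
```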
3. Ensure Qdrant is running
   - Run the qdrant binary locally, or use the official Docker image
   - Verify it is reachable at http://localhost:6334 (the default gRPC port; the REST API listens on 6333)
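If you don't have Qdrant installed natively, the official Docker image is the quickest way to bring it up with both ports exposed (this sketch assumes Docker is available):

```shell
# Start Qdrant, exposing the REST port (6333) and the gRPC port (6334)
# that QDRANT_URL points at.
docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Health check goes over the REST port.
curl http://localhost:6333/healthz
```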
4. Build the MCP server
   - git clone <repo-url>  # if you haven't already
   - cd doc-indexer-mcp
   - cargo build --release
5. Prepare environment variables
   - Create a .env file or export the variables in your shell (see the example below)
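A minimal .env sketch using the variables from the claude mcp add command above (all values are placeholders; adjust the paths and keys for your setup):

```shell
RUST_LOG=info
DOCS_PATH=/path/to/your/documents
INDEX_SUBDIRS=docs
QDRANT_URL=http://localhost:6334
QDRANT_COLLECTION=doc_index
VOYAGE_API_KEY=your-voyage-api-key
EMBEDDING_MODEL=voyage-3-large
PDF_CHUNK_SIZE=1000
PDF_CHUNK_OVERLAP=200
SQL_MAX_CHUNK_SIZE=4000
EXCEL_ROWS_PER_CHUNK=50
```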
6. Run the server
   - Launch the release build, configured for your MCP client: cargo run --release
Note: The server reads configuration from environment variables as described in the Configuration section. You can also adapt the setup to run with your preferred environment (Docker, npx, etc.) if you port the config accordingly.
Additional notes
Tips and common issues:
- Ensure Qdrant is reachable at the configured QDRANT_URL and that the collection (QDRANT_COLLECTION) exists or can be created by the server.
- The embedding API key (VOYAGE_API_KEY) must be valid and have access to the Voyage AI embeddings endpoint; if using an OpenAI-compatible endpoint, confirm compatibility.
- PDF parsing relies on pdftotext; ensure it is installed and accessible in your PATH.
- If you change DOCS_PATH or INDEX_SUBDIRS, make sure filesystem permissions allow read access to the documents.
- For large document sets, tune PDF_CHUNK_SIZE and PDF_CHUNK_OVERLAP to balance indexing performance and memory usage; with the values shown above (1000 and 200), each new chunk advances by 800 while sharing 200 with its predecessor.
- The MCP server uses stdio for MCP communication; ensure your MCP client is configured to communicate over the correct channel.
- To test the Claude Code integration locally, make sure the Claude Code CLI registers the doc-indexer MCP server with the correct command path and environment variables (see the claude mcp add command above).
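To confirm the registration from the Claude Code side, the CLI can list configured servers and show the details of a single one (the server name matches the one used in the claude mcp add command above):

```shell
# List all registered MCP servers and their connection status.
claude mcp list

# Show the command, transport, and environment for this server.
claude mcp get eztakesin-doc-indexer-mcp
```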