MCP Server
This MCP server lets AI assistants access and search your private documents, codebases, and up-to-date technical information. It processes Markdown, text, and PDF files into a searchable vector database, extending an assistant's knowledge beyond its training data. It is built with Docker, supports both free local and paid API-based embeddings, and keeps the assistant current with your data.
```bash
claude mcp add --transport stdio donphi-mcp-server docker run -i donphi/mcp-server \
  --env DB_PATH="/db" \
  --env DATA_DIR="/data" \
  --env BATCH_SIZE="10" \
  --env CHUNK_SIZE="800" \
  --env OUTPUT_DIR="/output" \
  --env CONFIG_PATH="/config/server_config.json" \
  --env MAX_RESULTS="10" \
  --env CLAUDE_MODEL="claude-3-7-sonnet-20250219" \
  --env CHUNK_OVERLAP="120" \
  --env USE_ANTHROPIC="true" \
  --env OPENAI_API_KEY="your_openai_api_key_here" \
  --env EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2" \
  --env ANTHROPIC_API_KEY="your_anthropic_api_key_here" \
  --env SUPPORTED_EXTENSIONS=".md,.txt,.pdf,.docx,.doc"
```

OPENAI_API_KEY and ANTHROPIC_API_KEY are optional; if OPENAI_API_KEY is omitted, free local embedding models are used instead.
How to use
This MCP server exposes your processed document content through the MCP interface so that AI assistants can query and retrieve information from your private data. It is designed to work with any MCP-compatible assistant and points at a local vector database containing embeddings of your Markdown and text files. Use the provided tooling to build and run the pipeline that ingests your data, then start the server to serve search and retrieval results to an assistant. The server answers standard MCP queries with relevant passages or summaries from your data sources, enabling up-to-date documentation lookup, private codebase understanding, and technical specification retrieval within conversational agents.
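Concretely, an MCP client talks to the server over stdio using JSON-RPC. As a rough sketch, a retrieval request might look like the following; the tool name search_documents and its argument names are hypothetical, so check the server's actual tools with a tools/list request:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_documents",
    "arguments": {
      "query": "How is chunk overlap configured?",
      "max_results": 5
    }
  }
}
```

The server responds with the matching passages, which the assistant can weave into its answer.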
How to install
Prerequisites:
- Docker Desktop (Windows/macOS) or Docker Engine (Linux)
- Git installed
- Access to the repository you will run (clone from GitHub)
Install steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/donphi/mcp-server.git
   cd mcp-server
   ```
2. Create and configure the environment:

   ```bash
   cp .env.example .env  # then edit with your settings
   ```

   Edit .env to set OPENAI_API_KEY, ANTHROPIC_API_KEY, the data/output/db paths, and server options (an example follows).
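   A minimal .env might look like this; the exact variable set comes from .env.example, and the values below are illustrative:

   ```ini
   # Optional API keys; leave OPENAI_API_KEY unset to use free local embeddings
   OPENAI_API_KEY=your_openai_api_key_here
   ANTHROPIC_API_KEY=your_anthropic_api_key_here

   # Paths inside the container
   DATA_DIR=/data
   OUTPUT_DIR=/output
   DB_PATH=/db

   # Processing and retrieval options
   CHUNK_SIZE=800
   CHUNK_OVERLAP=120
   EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
   MAX_RESULTS=10
   SUPPORTED_EXTENSIONS=.md,.txt,.pdf,.docx,.doc
   ```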
3. Prepare data:
   - Place your Markdown (.md) and text (.txt) files in the data/ directory.
   - Include any PDFs or other document types you want to process, per SUPPORTED_EXTENSIONS.
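   For example, a data/ directory might look like this (file names are illustrative):

   ```text
   data/
   ├── architecture-notes.md
   ├── api-reference.txt
   └── product-spec.pdf
   ```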
4. Build and run the ingestion pipeline, then build the server image:

   ```bash
   docker-compose build pipeline
   docker-compose run pipeline
   docker-compose build server
   ```
5. Generate mcp-config.json for your assistant setup (if using the provided helper):

   For macOS/Linux:

   ```bash
   chmod +x setup-mcpServer-json.sh
   ./setup-mcpServer-json.sh
   ```

   For Windows:

   ```bat
   setup-mcpServer-json.bat
   ```
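   The exact contents of the generated file depend on the helper script, but MCP client configurations generally follow this shape; the entry below is a sketch, not the script's literal output:

   ```json
   {
     "mcpServers": {
       "donphi-mcp-server": {
         "command": "docker",
         "args": ["run", "-i", "donphi/mcp-server"],
         "env": {
           "DATA_DIR": "/data",
           "DB_PATH": "/db"
         }
       }
     }
   }
   ```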
6. Start the MCP server with Docker:

   ```bash
   docker-compose up -d
   ```
Note: The repository includes a two-stage setup where you first process data into a vector store, then build and run the MCP server that serves queries to an MCP-compatible assistant.
Additional notes
Tips and common issues:
- Ensure your data directory contains the files you want to index and that the vector store (e.g., chroma.sqlite3) is created in db/ after running the pipeline.
- If you encounter an "invalid reference format" error on Windows, check your Docker Compose configuration and make sure you have built the server image with docker-compose build server before running.
- The environment variables in .env govern both data processing and server behavior; adjust CHUNK_SIZE, CHUNK_OVERLAP, and EMBEDDING_MODEL to balance performance and accuracy for your documents (see the chunking sketch after this list).
- If you omit the OpenAI API key, the system falls back to free local embedding models for processing; the model is selected via EMBEDDING_MODEL.
- The server supports a range of embedding models; verify which are available in your deployment and adjust EMBEDDING_MODEL accordingly.
- Ensure the server image name matches the one used in your docker run command (donphi/mcp-server used as placeholder here).
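To illustrate how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window chunker in Python. This is a sketch of the general technique, not the repository's actual implementation, and it counts characters where the real pipeline may count tokens:

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 120) -> list[str]:
    """Split text into windows of chunk_size characters overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already covers the end of the text
    return chunks

# Larger CHUNK_SIZE gives more context per retrieved passage; larger
# CHUNK_OVERLAP reduces the chance an answer is split across boundaries.
print(len(chunk_text("x" * 2000)))  # 3 chunks with the defaults above
```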
Environment variables to consider:
- OPENAI_API_KEY, ANTHROPIC_API_KEY: optional keys for embeddings or responses
- DATA_DIR, OUTPUT_DIR, DB_PATH: paths inside the container where data, outputs, and the vector store reside; map host directories onto them with volume mounts (see the example after this list)
- CONFIG_PATH: path to the server configuration file inside the container
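If you run the image directly rather than through docker-compose, mount your host directories onto the container paths. A sketch, assuming the image name from above and host directories under the current working directory:

```bash
docker run -i \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/db:/db" \
  -v "$(pwd)/output:/output" \
  --env DATA_DIR="/data" \
  --env DB_PATH="/db" \
  --env OUTPUT_DIR="/output" \
  donphi/mcp-server
```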