MCP Server
This MCP server lets AI assistants access and search your private documents, codebases, and up-to-date technical information. It processes Markdown, text, and PDF files into a searchable vector database, extending an assistant's knowledge beyond its training data. It is built with Docker, supports both free local and paid API-based embeddings, and keeps the assistant current with your data.
```bash
claude mcp add --transport stdio donphi-mcp-server docker run -i donphi/mcp-server \
  --env DB_PATH="/db" \
  --env DATA_DIR="/data" \
  --env BATCH_SIZE="10" \
  --env CHUNK_SIZE="800" \
  --env OUTPUT_DIR="/output" \
  --env CONFIG_PATH="/config/server_config.json" \
  --env MAX_RESULTS="10" \
  --env CLAUDE_MODEL="claude-3-7-sonnet-20250219" \
  --env CHUNK_OVERLAP="120" \
  --env USE_ANTHROPIC="true" \
  --env OPENAI_API_KEY="your_openai_api_key_here" \
  --env EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2" \
  --env ANTHROPIC_API_KEY="your_anthropic_api_key_here" \
  --env SUPPORTED_EXTENSIONS=".md,.txt,.pdf,.docx,.doc"
```

OPENAI_API_KEY and ANTHROPIC_API_KEY are optional; if OPENAI_API_KEY is omitted, free local embedding models are used instead.
How to use
This MCP server exposes your processed document content through the MCP interface so that AI assistants can query and retrieve information from your private data. It is designed to work with any MCP-compatible assistant and points at a local vector database containing embeddings of your Markdown and text files. Use the provided tooling to build and run the pipeline that ingests your data, then start the server to serve search and retrieval results to an assistant. The server answers standard MCP queries with relevant passages or summaries from your data sources, enabling up-to-date documentation lookup, private codebase understanding, and technical specification retrieval within conversational agents.
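Concretely, an MCP client talks to the server over stdio using JSON-RPC. As a rough sketch, a retrieval request might look like the following; the tool name search_documents and its argument names are hypothetical, so check the server's actual tools with a tools/list request:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_documents",
    "arguments": {
      "query": "How is chunk overlap configured?",
      "max_results": 5
    }
  }
}
```

The server responds with the matching passages, which the assistant can weave into its answer.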
How to install
Prerequisites:
- Docker Desktop (Windows/macOS) or Docker Engine (Linux)
- Git installed
- Access to the repository you will run (clone from GitHub)
Install steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/donphi/mcp-server.git
   cd mcp-server
   ```
2. Create and configure the environment:

   ```bash
   cp .env.example .env  # then edit with your settings
   ```

   Edit .env to set OPENAI_API_KEY, ANTHROPIC_API_KEY, the data/output/db paths, and server options (an example follows).
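   A minimal .env might look like this; the exact variable set comes from .env.example, and the values below are illustrative:

   ```ini
   # Optional API keys; leave OPENAI_API_KEY unset to use free local embeddings
   OPENAI_API_KEY=your_openai_api_key_here
   ANTHROPIC_API_KEY=your_anthropic_api_key_here

   # Paths inside the container
   DATA_DIR=/data
   OUTPUT_DIR=/output
   DB_PATH=/db

   # Processing and retrieval options
   CHUNK_SIZE=800
   CHUNK_OVERLAP=120
   EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
   MAX_RESULTS=10
   SUPPORTED_EXTENSIONS=.md,.txt,.pdf,.docx,.doc
   ```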
3. Prepare data:
   - Place your Markdown (.md) and text (.txt) files in the data/ directory.
   - Include any PDFs or other document types you want to process, per SUPPORTED_EXTENSIONS.
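   For example, a data/ directory might look like this (file names are illustrative):

   ```text
   data/
   ├── architecture-notes.md
   ├── api-reference.txt
   └── product-spec.pdf
   ```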
4. Build and run the ingestion pipeline, then build the server image:

   ```bash
   docker-compose build pipeline
   docker-compose run pipeline
   docker-compose build server
   ```
5. Generate mcp-config.json for your assistant setup (if using the provided helper):

   For macOS/Linux:

   ```bash
   chmod +x setup-mcpServer-json.sh
   ./setup-mcpServer-json.sh
   ```

   For Windows:

   ```bat
   setup-mcpServer-json.bat
   ```
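   The exact contents of the generated file depend on the helper script, but MCP client configurations generally follow this shape; the entry below is a sketch, not the script's literal output:

   ```json
   {
     "mcpServers": {
       "donphi-mcp-server": {
         "command": "docker",
         "args": ["run", "-i", "donphi/mcp-server"],
         "env": {
           "DATA_DIR": "/data",
           "DB_PATH": "/db"
         }
       }
     }
   }
   ```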
6. Start the MCP server with Docker:

   ```bash
   docker-compose up -d
   ```
Note: The repository includes a two-stage setup where you first process data into a vector store, then build and run the MCP server that serves queries to an MCP-compatible assistant.
Additional notes
Tips and common issues:
- Ensure your data directory contains the files you want to index and that the vector store (e.g., chroma.sqlite3) is created in db/ after running the pipeline.
- If you encounter an "invalid reference format" error on Windows, check your Docker Compose configuration and make sure you have built the server image with docker-compose build server before running.
- The environment variables in .env govern both data processing and server behavior; adjust CHUNK_SIZE, CHUNK_OVERLAP, and EMBEDDING_MODEL to balance performance and accuracy for your documents (see the chunking sketch after this list).
- If you omit the OpenAI API key, the system falls back to free local embedding models for processing; the model is selected via EMBEDDING_MODEL.
- The server supports a range of embedding models; verify which are available in your deployment and adjust EMBEDDING_MODEL accordingly.
- Ensure the server image name matches the one used in your docker run command (donphi/mcp-server used as placeholder here).
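To illustrate how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window chunker in Python. This is a sketch of the general technique, not the repository's actual implementation, and it counts characters where the real pipeline may count tokens:

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 120) -> list[str]:
    """Split text into windows of chunk_size characters overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already covers the end of the text
    return chunks

# Larger CHUNK_SIZE gives more context per retrieved passage; larger
# CHUNK_OVERLAP reduces the chance an answer is split across boundaries.
print(len(chunk_text("x" * 2000)))  # 3 chunks with the defaults above
```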
Environment variables to consider:
- OPENAI_API_KEY, ANTHROPIC_API_KEY: optional keys for embeddings or responses
- DATA_DIR, OUTPUT_DIR, DB_PATH: paths inside the container where data, outputs, and the vector store reside; map host directories onto them with volume mounts (see the example after this list)
- CONFIG_PATH: path to the server configuration file inside the container
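If you run the image directly rather than through docker-compose, mount your host directories onto the container paths. A sketch, assuming the image name from above and host directories under the current working directory:

```bash
docker run -i \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/db:/db" \
  -v "$(pwd)/output:/output" \
  --env DATA_DIR="/data" \
  --env DB_PATH="/db" \
  --env OUTPUT_DIR="/output" \
  donphi/mcp-server
```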