
KnowledgeMCP

Model Context Protocol (MCP) server for local vector storage & semantic search (ChromaDB, OCR, async ingestion).

Installation
Run this command in your terminal to add the MCP server to Claude Code (run it from the cloned KnowledgeMCP directory so python -m src.mcp.server resolves):

claude mcp add --transport stdio maxzrff-knowledgemcp \
  --env OCR_ENABLED=true \
  --env EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 \
  -- python -m src.mcp.server

Both --env values are optional overrides of the corresponding config.yaml settings: OCR_ENABLED accepts true or false, and EMBEDDING_MODEL accepts the name or path of an embedding model.

How to use

KnowledgeMCP is a local, private MCP server that enables AI assistants and agents to perform semantic search over your documents. It supports multiple contexts to keep knowledge domains isolated, and can handle PDFs, DOCX, PPTX, XLSX, HTML, and image formats with smart OCR when needed. The server exposes MCP tools for document management and search, including knowledge-add (index documents), knowledge-search (semantic queries), and knowledge-show (list documents). You can organize documents into contexts, search within a specific context for fast, relevant results, or search across all contexts. The system stores embeddings in a local vector store (ChromaDB) and runs entirely on your machine, ensuring data never leaves your environment.
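
"Semantic search" here means comparing embedding vectors rather than matching keywords. As a rough illustration of the idea (not the server's actual code), ranking documents by cosine similarity between embeddings looks like this:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models such as all-MiniLM-L6-v2 emit 384 dims.
query = [0.9, 0.1, 0.0]
doc_same_topic = [0.8, 0.2, 0.1]
doc_unrelated = [0.0, 0.1, 0.9]

print(cosine_similarity(query, doc_same_topic))  # high score: semantically close
print(cosine_similarity(query, doc_unrelated))   # low score: semantically distant
```

ChromaDB performs this kind of nearest-neighbor comparison over the stored embeddings, so a query can match a document that shares no literal keywords with it.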

How to install

Prerequisites:

  • Python 3.11 or newer
  • pip (comes with Python)
  • Optional: Tesseract OCR if you plan to OCR scanned documents
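
A quick way to sanity-check the prerequisites before installing (a generic snippet, not part of the repo):

```python
import shutil
import sys

def python_ok(version_info, minimum=(3, 11)) -> bool:
    """True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum

print("Python OK:", python_ok(sys.version_info))
# Tesseract is only needed if you plan to OCR scanned documents.
print("Tesseract found:", shutil.which("tesseract") is not None)
```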

Step-by-step installation:

  1. Clone the repository (replace with your actual repo URL):

     git clone https://github.com/yourusername/KnowledgeMCP.git
     cd KnowledgeMCP

  2. Create a virtual environment and activate it:

     python3 -m venv venv
     source venv/bin/activate    # On Windows: venv\Scripts\activate

  3. Install dependencies:

     pip install -r requirements.txt

  4. Pre-download the embedding model (~91 MB; otherwise it downloads on first run):

     python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"

  5. Run the MCP server directly (development) to verify it starts:

     python -m src.mcp.server

  6. Optional: start the server with the management scripts if the repo provides them (e.g., server.sh):

     ./server.sh start

  7. If you plan to OCR documents, install Tesseract and related tools:

    Ubuntu/Debian

    sudo apt-get install tesseract-ocr poppler-utils

    macOS

    brew install tesseract poppler

    Windows (via Chocolatey)

    choco install tesseract poppler

  8. Configure OCR and behavior in config.yaml as described in the docs.
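
The exact config.yaml schema is defined by the repository's docs; a hypothetical fragment along these lines shows the two settings the installer command can override (key names are illustrative, not authoritative):

```yaml
# Illustrative only -- check the repository docs for the real key names.
ocr:
  enabled: true            # global OCR on/off (the OCR_ENABLED env var can override this)
embedding:
  model: sentence-transformers/all-MiniLM-L6-v2   # EMBEDDING_MODEL can override this
```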

Additional notes

  • The MCP server runs locally and stores data in a local ChromaDB vector store. Ensure file permissions allow read/write for the storage directory.
  • Use multiple contexts to organize your knowledge base; you can assign documents to one or more contexts and search within a specific context for faster results.
  • You can enable or disable OCR globally via the config.yaml OCR section; per-document OCR can also be forced via API or MCP tool flags.
  • Processing metadata indicates whether text extraction or OCR was used, and you can inspect OCR confidence if available.
  • If you encounter performance issues with very large document collections, consider indexing in async mode and then performing searches in a focused context to reduce latency.
  • The server exposes MCP tools such as knowledge-add, knowledge-search, and knowledge-show; consult the MCP client docs for exact syntax and options.
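
Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 tools/call request. A sketch of what a knowledge-search call might look like on the wire (the query and context argument names are assumptions; the real schema comes from the server's tools/list response):

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical argument names -- consult the server's tool schema for the real ones.
msg = make_tool_call(1, "knowledge-search", {"query": "vector store setup", "context": "default"})
print(msg)
```

In practice your MCP client (e.g., Claude Code) builds and sends these messages for you over stdio; this is only to show what the tool invocation layer carries.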
