mnemos
Self-hosted MCP knowledge server. Turn docs into searchable context with multi-collection isolation, deterministic ingestion, and local vector search via Ollama + pgvector. 100% private, zero vendor lock-in.
claude mcp add --transport stdio tanush1912-mnemos-mcp python cli/mnemos.py server \
  --env DATABASE_URL="postgresql+asyncpg://<user>:<password>@<host>:<port>/<db_name>" \
  --env EMBEDDING_PROVIDER="ollama" \
  --env EMBEDDING_MODEL="nomic-embed-text" \
  --env OLLAMA_BASE_URL="http://127.0.0.1:11434" \
  --env CHUNK_SIZE="300" \
  --env CHUNK_OVERLAP="40"
How to use
Mnemos exposes an MCP-compatible API that lets AI agents query your local knowledge base without exporting data. The server runs locally (via Python) and provides standard MCP endpoints such as GET /mcp/tools to list available tools and POST /mcp/call to execute a tool.

The available tools are:
- search_context: retrieve relevant context from your documents
- list_documents: enumerate all stored documents
- get_document_info: fetch detailed metadata about a specific document

To connect a client (such as Claude Desktop or another MCP consumer), point it at http://localhost:8000/mcp/call and supply the appropriate tool name and arguments. The system is designed to be stateless from the client's perspective; all persistence happens on the server, backed by PostgreSQL with pgvector and local Ollama embeddings.
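A quick sketch of exercising both endpoints with curl. The request body shape for /mcp/call (a JSON object with "name" and "arguments" keys) is an assumption based on common MCP conventions, and the query string is just an example; check the server's own schema before relying on it.

```shell
# List the available tools
curl -s http://localhost:8000/mcp/tools || echo "is the server running?"

# Call search_context; the payload field names are an assumption, not confirmed API
PAYLOAD='{"name":"search_context","arguments":{"query":"how is chunking configured?"}}'
curl -s -X POST http://localhost:8000/mcp/call \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "is the server running?"
```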
How to install
Prerequisites: Docker + Docker Compose, Python 3.11 or newer, and Ollama installed for local embeddings.
Step 1: Install dependencies and prerequisites
# Install Ollama and pull the embedding model (if not already installed)
brew install ollama
ollama serve
ollama pull nomic-embed-text
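Before wiring Mnemos up, you can sanity-check that the embedding model actually works by hitting Ollama's standard /api/embeddings endpoint directly (the prompt text is arbitrary):

```shell
# A JSON "embedding" array in the reply means the model is ready
REQ='{"model":"nomic-embed-text","prompt":"smoke test"}'
curl -s http://127.0.0.1:11434/api/embeddings -d "$REQ" || echo "Ollama not reachable"
```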
# Install Python dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Step 2: Start the database (Postgres with pgvector)
cd docker
docker-compose up -d
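Once the container is up, it is worth confirming that the pgvector extension is enabled. The container name, user, and database below are placeholders I am assuming for illustration; adjust them to match docker/docker-compose.yml:

```shell
# SQL to enable and inspect the pgvector extension
SQL="CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname = 'vector';"

# Run it inside the Postgres container (name/user/db are assumptions)
docker exec -i mnemos-postgres psql -U postgres -d mnemos -c "$SQL" || echo "container not reachable"
```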
Step 3: Run the Mnemos server (MCP API)
# Option A: Start the API server
python cli/mnemos.py server
# Option B: Run API directly for development
uvicorn src.main:app --reload
Step 4: Run MCP tooling (optional)
# Example: list MCP tools
curl http://localhost:8000/mcp/tools
Configuration note: before starting the server, ensure the environment variables (DATABASE_URL, EMBEDDING_PROVIDER, EMBEDDING_MODEL, OLLAMA_BASE_URL, CHUNK_SIZE, CHUNK_OVERLAP) are set (see the environment section).
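For reference, a minimal environment sketch mirroring the defaults shown in the install command above; the connection-string values are placeholders you must fill in:

```shell
# Core Mnemos configuration; values mirror the claude mcp add example above
export DATABASE_URL="postgresql+asyncpg://<user>:<password>@<host>:<port>/<db_name>"
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="nomic-embed-text"
export OLLAMA_BASE_URL="http://127.0.0.1:11434"
export CHUNK_SIZE="300"
export CHUNK_OVERLAP="40"
```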
Additional notes
- Mnemos is designed to run 100% locally; ensure Postgres is accessible from the server process and that Ollama is running for embeddings.
- The MCP integration is stateless from the client's perspective; you can deploy it behind a secure tunnel or on a private network.
- If you encounter connection issues, verify that PostgreSQL is up, the Ollama service is reachable at the configured URL, and that the environment variables are correctly set.
- The environment variables set defaults for embedding provider and model; adjust them if you plan to switch to a cloud embedding provider or a different local model.
- For development, you can run uvicorn directly to test REST endpoints before enabling MCP tooling.
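As a quick smoke test during development, the sketch below checks whether the API answers on the default port (8000 is assumed from the examples above):

```shell
# Fetch just the HTTP status from the tools endpoint;
# "000" means the server is not reachable, "200" means it answered
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/mcp/tools || true)
echo "mnemos tools endpoint status: $STATUS"
```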
Related MCP Servers
VectorCode
A code repository indexing tool to supercharge your LLM experience.
persistent-ai-memory
A persistent local memory for AI, LLMs, or Copilot in VS Code.
Archive-Agent
Find your files with natural language and ask questions.
code-memory
MCP server with local vector search for your codebase. Smart indexing, semantic search, Git history — all offline.
mcp-raganything
API/MCP wrapper for RagAnything
srclight
Deep code indexing MCP server for AI agents. 25 tools: hybrid FTS5 + embedding search, call graphs, git blame/hotspots, build system analysis. Multi-repo workspaces, GPU-accelerated semantic search, 10 languages via tree-sitter. Fully local, zero cloud dependencies.