mnemos
Self-hosted MCP knowledge server. Turn docs into searchable context with multi-collection isolation, deterministic ingestion, and local vector search via Ollama + pgvector. 100% private, zero vendor lock-in.
claude mcp add --transport stdio tanush1912-mnemos-mcp python cli/mnemos.py server \
  --env DATABASE_URL="postgresql+asyncpg://<user>:<password>@<host>:<port>/<db_name>" \
  --env EMBEDDING_PROVIDER="ollama" \
  --env EMBEDDING_MODEL="nomic-embed-text" \
  --env OLLAMA_BASE_URL="http://127.0.0.1:11434" \
  --env CHUNK_SIZE="300" \
  --env CHUNK_OVERLAP="40"
How to use
Mnemos exposes an MCP-compatible API that lets AI agents query your local knowledge base without exporting data. The server runs locally (via Python) and provides standard MCP endpoints such as GET /mcp/tools to list available tools and POST /mcp/call to execute a tool.

The available tools are:
- search_context: retrieve relevant context from your documents
- list_documents: enumerate all stored documents
- get_document_info: fetch detailed metadata about a specific document

To connect a client (such as Claude Desktop or another MCP consumer), point it at http://localhost:8000/mcp/call and supply the appropriate tool name and arguments. The system is designed to be stateless from the client's perspective; all persistence happens on the server, backed by PostgreSQL with pgvector and local Ollama embeddings.
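A quick sketch of exercising both endpoints with curl. The request body shape for /mcp/call (a JSON object with "name" and "arguments" keys) is an assumption based on common MCP conventions, and the query string is just an example; check the server's own schema before relying on it.

```shell
# List the available tools
curl -s http://localhost:8000/mcp/tools || echo "is the server running?"

# Call search_context; the payload field names are an assumption, not confirmed API
PAYLOAD='{"name":"search_context","arguments":{"query":"how is chunking configured?"}}'
curl -s -X POST http://localhost:8000/mcp/call \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "is the server running?"
```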
How to install
Prerequisites: Docker + Docker Compose, Python 3.11 or newer, and Ollama installed for local embeddings.
Step 1: Install dependencies and prerequisites
# Install Ollama and pull the embedding model (if not already installed)
brew install ollama
ollama serve
ollama pull nomic-embed-text
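Before wiring Mnemos up, you can sanity-check that the embedding model actually works by hitting Ollama's standard /api/embeddings endpoint directly (the prompt text is arbitrary):

```shell
# A JSON "embedding" array in the reply means the model is ready
REQ='{"model":"nomic-embed-text","prompt":"smoke test"}'
curl -s http://127.0.0.1:11434/api/embeddings -d "$REQ" || echo "Ollama not reachable"
```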
# Install Python dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Step 2: Start the database (Postgres with pgvector)
cd docker
docker-compose up -d
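Once the container is up, it is worth confirming that the pgvector extension is enabled. The container name, user, and database below are placeholders I am assuming for illustration; adjust them to match docker/docker-compose.yml:

```shell
# SQL to enable and inspect the pgvector extension
SQL="CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname = 'vector';"

# Run it inside the Postgres container (name/user/db are assumptions)
docker exec -i mnemos-postgres psql -U postgres -d mnemos -c "$SQL" || echo "container not reachable"
```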
Step 3: Run the Mnemos server (MCP API)
# Option A: Start the API server
python cli/mnemos.py server
# Option B: Run API directly for development
uvicorn src.main:app --reload
Step 4: Run MCP tooling (optional)
# Example: list MCP tools
curl http://localhost:8000/mcp/tools
Configuration note: before starting the server, ensure the environment variables (DATABASE_URL, EMBEDDING_PROVIDER, EMBEDDING_MODEL, OLLAMA_BASE_URL, CHUNK_SIZE, CHUNK_OVERLAP) are set (see the environment section).
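For reference, a minimal environment sketch mirroring the defaults shown in the install command above; the connection-string values are placeholders you must fill in:

```shell
# Core Mnemos configuration; values mirror the claude mcp add example above
export DATABASE_URL="postgresql+asyncpg://<user>:<password>@<host>:<port>/<db_name>"
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="nomic-embed-text"
export OLLAMA_BASE_URL="http://127.0.0.1:11434"
export CHUNK_SIZE="300"
export CHUNK_OVERLAP="40"
```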
Additional notes
- Mnemos is designed to run 100% locally; ensure Postgres is accessible from the server process and that Ollama is running for embeddings.
- The MCP integration is stateless from the client's perspective; you can deploy it behind a secure tunnel or on a private network.
- If you encounter connection issues, verify that PostgreSQL is up, the Ollama service is reachable at the configured URL, and that the environment variables are correctly set.
- The environment variables set defaults for embedding provider and model; adjust them if you plan to switch to a cloud embedding provider or a different local model.
- For development, you can run uvicorn directly to test REST endpoints before enabling MCP tooling.
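As a quick smoke test during development, the sketch below checks whether the API answers on the default port (8000 is assumed from the examples above):

```shell
# Fetch just the HTTP status from the tools endpoint;
# "000" means the server is not reachable, "200" means it answered
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/mcp/tools || true)
echo "mnemos tools endpoint status: $STATUS"
```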
Related MCP Servers
VectorCode
A code repository indexing tool to supercharge your LLM experience.
persistent-ai-memory
A persistent local memory for AI, LLMs, or Copilot in VS Code.
Archive-Agent
Find your files with natural language and ask questions.
code-memory
MCP server with local vector search for your codebase. Smart indexing, semantic search, Git history — all offline.
mcp-raganything
API/MCP wrapper for RagAnything
srclight
Deep code indexing MCP server for AI agents. 25 tools: hybrid FTS5 + embedding search, call graphs, git blame/hotspots, build system analysis. Multi-repo workspaces, GPU-accelerated semantic search, 10 languages via tree-sitter. Fully local, zero cloud dependencies.