
mcp-rag

mcp-rag-server is a Model Context Protocol (MCP) server that enables Retrieval Augmented Generation (RAG) capabilities. It empowers Large Language Models (LLMs) to answer questions based on your document content by indexing and retrieving relevant information efficiently.

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio kwanleefrmvi-mcp-rag-server \
  --env CHUNK_SIZE="500" \
  --env BASE_LLM_API="http://localhost:11434/v1" \
  --env EMBEDDING_MODEL="nomic-embed-text" \
  --env VECTOR_STORE_PATH="./vector_store" \
  -- npx -y mcp-rag-server

How to use

mcp-rag-server indexes your documents, creates embeddings, and serves relevant context to MCP clients by exposing tools and resources over the MCP protocol. The server uses a local vector store (SQLite) and supports multiple embedding providers, with options to customize the chunk size and embedding model.

Once running, you can index documents, query for relevant chunks, and retrieve document content through URIs like rag://documents and rag://query-document/{n}/{query}. The server exposes the following tools:

  • embedding_documents(path): index the documents at the given path
  • query_documents(query, k): retrieve the k most relevant chunks for a query
  • remove_document(path): remove a single document from the index
  • remove_all_documents(confirm): clear the entire index
  • list_documents(): list all indexed documents

The example configuration above shows how to run the server via npx, including environment variables for the API endpoint, embedding model, vector store path, and chunk size. To integrate with your MCP client, start the server and point the client at the rag:// resource namespace to perform indexing, querying, and document retrieval.
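The tools listed above are invoked as JSON-RPC 2.0 messages over the server's stdio transport. As a rough sketch (the message envelope follows the MCP specification's tools/call method; the tool names are taken from the list above, while the argument values here are purely illustrative):

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Index a directory of documents, then query for the 3 most relevant chunks.
index_request = make_tool_call(1, "embedding_documents", {"path": "./docs"})
query_request = make_tool_call(2, "query_documents", {"query": "vector store", "k": 3})

# An MCP client would write each message to the server's stdin as one JSON line.
for request in (index_request, query_request):
    print(json.dumps(request))
```

In practice your MCP client (Claude Code, for example) builds these messages for you; the sketch only shows what travels over the wire when a tool is called.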

How to install

Prerequisites:

  • Node.js (and npm) installed on your machine
  • Git (optional, for cloning the repository)

Installation steps:

  1. Global npm installation (recommended for quick start):

     npm install -g mcp-rag-server

  2. From source (if you want to build locally):

     git clone https://github.com/kwanLeeFrmVi/mcp-rag-server.git
     cd mcp-rag-server
     npm install
     npm run build
     npm start

  3. Run with npx (as shown in the integration example):

     npx -y mcp-rag-server

  4. If you prefer explicit environment setup, you can export variables before starting:

     export BASE_LLM_API=http://localhost:11434/v1
     export EMBEDDING_MODEL=granite-embedding-278m-multilingual-Q6_K-1743674737397:latest
     export VECTOR_STORE_PATH=./vector_store
     export CHUNK_SIZE=500
     npx -y mcp-rag-server
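The environment variables above can be pictured as a simple configuration lookup. This is only a sketch of how such a server might read them; the fallback values shown are the ones used in the integration example, not necessarily the server's built-in defaults:

```python
import os

# Read the four settings from the environment, falling back to the values
# used in the integration example above (assumed defaults, not verified).
config = {
    "base_llm_api": os.environ.get("BASE_LLM_API", "http://localhost:11434/v1"),
    "embedding_model": os.environ.get("EMBEDDING_MODEL", "nomic-embed-text"),
    "vector_store_path": os.environ.get("VECTOR_STORE_PATH", "./vector_store"),
    "chunk_size": int(os.environ.get("CHUNK_SIZE", "500")),
}
print(config)
```

Exporting a variable before launch (as in step 4) simply overrides the corresponding fallback.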

Additional notes

Tips and common considerations:

  • The default integration points BASE_LLM_API at a local embedding service. Adjust BASE_LLM_API to your embedding provider's endpoint if needed.
  • EMBEDDING_MODEL can be switched to models like nomic-embed-text or granite-based embeddings; Ollama is recommended for best performance, but ensure the model is available in your environment.
  • VECTOR_STORE_PATH controls where vectors are stored locally (SQLite). Set this to a persistent path so the index survives server restarts.
  • CHUNK_SIZE defines how much text is included per chunk; larger values may improve context but require more memory during embedding.
  • When running via MCP clients, ensure the client is configured to use the rag namespace (rag://) for documents and queries.
  • If you run into indexing issues, verify that your embedding API is reachable and that the vector store directory is writable.
  • The npm package name is mcp-rag-server and the npm package page is linked in the README.
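To make the CHUNK_SIZE trade-off concrete, here is a minimal fixed-size chunking sketch. The server's actual splitting logic may differ (it may respect sentence or paragraph boundaries, for instance), so treat this only as an illustration of how the setting bounds the amount of text per embedding call:

```python
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

document = "word " * 300  # a 1500-character toy document
chunks = chunk_text(document, chunk_size=500)

# With CHUNK_SIZE=500, this document yields three 500-character chunks;
# a larger CHUNK_SIZE would mean fewer, longer chunks per embedding call.
print(len(chunks), [len(c) for c in chunks])
```

Fewer, longer chunks carry more context per retrieved hit but cost more memory during embedding; shorter chunks retrieve more precisely but may split related sentences apart.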
