codebase-RAG
A Retrieval-Augmented Generation (RAG) Model Context Protocol (MCP) server designed to help AI agents and developers understand and navigate codebases. It supports incremental indexing and multi-language parsing, enabling LLMs to understand and interact with code.
Quick registration with Claude Code:
claude mcp add --transport stdio bluewings1211-codebase-rag python src/run_mcp.py
How to use
Codebase RAG MCP Server provides a retrieval-augmented tooling layer for navigating and analyzing codebases with function-level precision. It layers a multi-modal retrieval system on top of a Qdrant vector store and Ollama embeddings to enable natural-language queries over code, with modes that tailor retrieval to local (low-level keyword), global (high-level relationship), or hybrid strategies. The server exposes a suite of MCP tools for indexing local directories, analyzing project structure, automatically determining project types, and querying indexed codebases in natural language. Intelligent auto-configuration, performance monitoring, and robust error handling keep interactions responsive and reliable, with a 15-second response target for MCP tool operations.
To use the server, register it with your MCP-enabled environment (Cursor, Gemini CLI, Claude Code, or generic MCP clients) using the Python entrypoint specified in your virtual environment. Once registered, you can invoke tools for indexing a local project, querying functions, classes, and methods, and performing multi-modal retrieval to surface relevant code and documentation. The system supports embedding model selection via Ollama, chunked indexing with syntax-aware tooling, and automatic project analysis that respects .gitignore rules. This enables natural-language questions like “Show me all functions related to API authentication” or “Find where a class is instantiated and used across the repo,” with results scoped to the most relevant code structures.
Tools available include: indexing of local project directories with function-level granularity, automatic project analysis and file discovery, multi-modal retrieval strategies (local/global/hybrid/mix), embedding model management through Ollama, and performance monitoring with graceful degradation in case of resource constraints.
How to install
Prerequisites:
- Python 3.10+
- uv (Python package manager) installed via pip
- Docker (for Qdrant vector database, if using Docker-based deployment)
- Ollama (for local language and embedding models)
Installation steps:
1. Clone the repository:
   git clone <repository_url>
   cd codebase-rag-mcp-server
2. Create a Python virtual environment and install dependencies with uv:
   uv sync
   This creates a .venv directory in your project and installs dependencies from the lockfile.
3. Start supporting services (examples):
   - Start Qdrant (vector database) via Docker:
     docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_data:/qdrant/storage qdrant/qdrant
   - Start Ollama and pull a default embedding model:
     ollama pull nomic-embed-text
4. Create and configure the environment file:
   cp .env.example .env
   Edit .env as needed for your local setup.
5. Run the MCP server using the MCP client-compatible entrypoint defined in the project (example):
   uv run python src/run_mcp.py
6. Register with MCP-enabled tools (examples):
   - Cursor IDE / MCP extension: configure the server with the path to the virtual environment's Python and the run script.
   - Claude Code: follow the registration commands provided in the README.
   - Gemini CLI: use the provided mcp add commands for the server.
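As an illustration, the .env created from .env.example might look like the following. The key names here are assumptions based on the stack described (Qdrant and Ollama on their default ports); check .env.example for the actual keys used by the project:

```
# Sketch only: variable names are assumptions; see .env.example for the real keys.
OLLAMA_HOST=http://localhost:11434
OLLAMA_DEFAULT_EMBEDDING_MODEL=nomic-embed-text
QDRANT_HOST=localhost
QDRANT_PORT=6333
```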
Prerequisites recap: ensure Python 3.10+, uv, Docker, and Ollama are installed and that supporting services (Qdrant and Ollama) are running before indexing or querying.
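For Cursor specifically, registration usually boils down to a JSON entry in the MCP configuration file. The file location (.cursor/mcp.json), the server name, and the absolute paths below are illustrative assumptions; substitute the paths from your own checkout:

```json
{
  "mcpServers": {
    "codebase-rag": {
      "command": "/path/to/codebase-rag-mcp-server/.venv/bin/python",
      "args": ["src/run_mcp.py"],
      "cwd": "/path/to/codebase-rag-mcp-server"
    }
  }
}
```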
Additional notes
Tips and caveats:
- Ensure Qdrant is up and accessible at the configured host/port before indexing or querying.
- The MCP integration requires proper environment configuration and a valid virtual environment; use uv sync to create and manage the .venv.
- The server is designed for function-level code understanding; for large codebases, rely on the multi-modal retrieval modes to optimize results (local for precise, global for broader context).
- If you encounter latency or timeouts, verify system resources and adjust client-side timeouts; the system aims for 15-second responses but large projects may require tuning.
- When configuring Cursor or Gemini, ensure the cwd and paths to the Python executable and src/run_mcp.py are correct and accessible by the IDE/CLI.
- Regularly pull the latest embedding models via Ollama to improve embedding quality and search relevance.
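Several of these tips amount to "make sure Qdrant and Ollama are actually reachable before indexing or querying." A minimal pre-flight check could be sketched as follows; the ports (6333 for Qdrant, 11434 for Ollama) match the Docker and Ollama defaults used above, but adjust them to whatever your .env configures:

```python
"""Pre-flight sketch: verify Qdrant and Ollama answer HTTP before using the server."""
import urllib.error
import urllib.request


def service_url(host: str, port: int, path: str = "/") -> str:
    """Build the health-check URL for a locally running service."""
    return f"http://{host}:{port}{path}"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at the URL, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # Any HTTP response (even 404) means the service is listening.
        return True
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, port in (("Qdrant", 6333), ("Ollama", 11434)):
        status = "up" if is_up(service_url("localhost", port)) else "DOWN"
        print(f"{name} ({port}): {status}")
```

Running this before registering the server catches the most common failure mode (Qdrant container not started) without waiting for an MCP tool call to time out.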
Related MCP Servers
chunkhound
Local-first codebase intelligence
VectorCode
A code repository indexing tool to supercharge your LLM experience.
haiku.rag
Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling
mcp-pinecone
Model Context Protocol server to allow for reading and writing from Pinecone. Rudimentary RAG
nextcloud
Nextcloud MCP Server
Archive-Agent
Find your files with natural language and ask questions.