codebase-RAG
A Retrieval-Augmented Generation (RAG) Model Context Protocol (MCP) server designed to help AI agents and developers understand and navigate codebases. It supports incremental indexing and multi-language parsing, enabling LLMs to understand and interact with code.
Quick registration with Claude Code:
claude mcp add --transport stdio bluewings1211-codebase-rag python src/run_mcp.py
How to use
Codebase RAG MCP Server provides a retrieval-augmented tooling layer for navigating and analyzing codebases with function-level precision. It layers a multi-modal retrieval system on top of a Qdrant vector store and Ollama embeddings to enable natural-language queries over code, with modes that tailor retrieval to local (low-level keyword), global (high-level relationship), or hybrid strategies. The server exposes a suite of MCP tools for indexing local directories, analyzing project structure, automatically determining project types, and querying indexed codebases in natural language. Intelligent auto-configuration, performance monitoring, and robust error handling keep interactions responsive and reliable, with a 15-second response target for MCP tool operations.
To use the server, register it with your MCP-enabled environment (Cursor, Gemini CLI, Claude Code, or generic MCP clients) using the Python entrypoint specified in your virtual environment. Once registered, you can invoke tools for indexing a local project, querying functions, classes, and methods, and performing multi-modal retrieval to surface relevant code and documentation. The system supports embedding model selection via Ollama, chunked indexing with syntax-aware tooling, and automatic project analysis that respects .gitignore rules. This enables natural-language questions like “Show me all functions related to API authentication” or “Find where a class is instantiated and used across the repo,” with results scoped to the most relevant code structures.
Tools available include: indexing of local project directories with function-level granularity, automatic project analysis and file discovery, multi-modal retrieval strategies (local/global/hybrid/mix), embedding model management through Ollama, and performance monitoring with graceful degradation in case of resource constraints.
How to install
Prerequisites:
- Python 3.10+
- uv (Python package manager) installed via pip
- Docker (for Qdrant vector database, if using Docker-based deployment)
- Ollama (for local language and embedding models)
Installation steps:
1. Clone the repository:
   git clone <repository_url>
   cd codebase-rag-mcp-server
2. Create a Python virtual environment and install dependencies with uv:
   uv sync
   This creates a .venv directory in your project and installs dependencies from the lockfile.
3. Start supporting services (examples):
   - Start Qdrant (vector database) via Docker:
     docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_data:/qdrant/storage qdrant/qdrant
   - Start Ollama and pull a default embedding model:
     ollama pull nomic-embed-text
4. Create and configure the environment file:
   cp .env.example .env
   Edit .env as needed for your local setup.
5. Run the MCP server using the MCP client-compatible entrypoint defined in the project (example):
   uv run python src/run_mcp.py
6. Register with MCP-enabled tools (examples):
   - Cursor IDE / MCP extension: configure the server with the path to the virtual environment's Python and the run script.
   - Claude Code: follow the registration commands provided in the README.
   - Gemini CLI: use the provided mcp add commands for the server.
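As an illustration, the .env created from .env.example might look like the following. The key names here are assumptions based on the stack described (Qdrant and Ollama on their default ports); check .env.example for the actual keys used by the project:

```
# Sketch only: variable names are assumptions; see .env.example for the real keys.
OLLAMA_HOST=http://localhost:11434
OLLAMA_DEFAULT_EMBEDDING_MODEL=nomic-embed-text
QDRANT_HOST=localhost
QDRANT_PORT=6333
```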
Prerequisites recap: ensure Python 3.10+, uv, Docker, and Ollama are installed and that supporting services (Qdrant and Ollama) are running before indexing or querying.
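For Cursor specifically, registration usually boils down to a JSON entry in the MCP configuration file. The file location (.cursor/mcp.json), the server name, and the absolute paths below are illustrative assumptions; substitute the paths from your own checkout:

```json
{
  "mcpServers": {
    "codebase-rag": {
      "command": "/path/to/codebase-rag-mcp-server/.venv/bin/python",
      "args": ["src/run_mcp.py"],
      "cwd": "/path/to/codebase-rag-mcp-server"
    }
  }
}
```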
Additional notes
Tips and caveats:
- Ensure Qdrant is up and accessible at the configured host/port before indexing or querying.
- The MCP integration requires proper environment configuration and a valid virtual environment; use uv sync to create and manage the .venv.
- The server is designed for function-level code understanding; for large codebases, rely on the multi-modal retrieval modes to optimize results (local for precise, global for broader context).
- If you encounter latency or timeouts, verify system resources and adjust client-side timeouts; the system aims for 15-second responses but large projects may require tuning.
- When configuring Cursor or Gemini, ensure the cwd and paths to the Python executable and src/run_mcp.py are correct and accessible by the IDE/CLI.
- Regularly pull the latest embedding models via Ollama to improve embedding quality and search relevance.
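Several of these tips amount to "make sure Qdrant and Ollama are actually reachable before indexing or querying." A minimal pre-flight check could be sketched as follows; the ports (6333 for Qdrant, 11434 for Ollama) match the Docker and Ollama defaults used above, but adjust them to whatever your .env configures:

```python
"""Pre-flight sketch: verify Qdrant and Ollama answer HTTP before using the server."""
import urllib.error
import urllib.request


def service_url(host: str, port: int, path: str = "/") -> str:
    """Build the health-check URL for a locally running service."""
    return f"http://{host}:{port}{path}"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at the URL, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # Any HTTP response (even 404) means the service is listening.
        return True
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, port in (("Qdrant", 6333), ("Ollama", 11434)):
        status = "up" if is_up(service_url("localhost", port)) else "DOWN"
        print(f"{name} ({port}): {status}")
```

Running this before registering the server catches the most common failure mode (Qdrant container not started) without waiting for an MCP tool call to time out.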
Related MCP Servers
chunkhound
Local-first codebase intelligence
VectorCode
A code repository indexing tool to supercharge your LLM experience.
haiku.rag
Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling
mcp-pinecone
Model Context Protocol server to allow for reading and writing from Pinecone. Rudimentary RAG
nextcloud
Nextcloud MCP Server
Archive-Agent
Find your files with natural language and ask questions.