haiku.rag
Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling
claude mcp add --transport stdio ggozad-haiku.rag haiku-rag serve --mcp --stdio
How to use
Haiku RAG exposes its document management, search, QA, and research capabilities as MCP tools, so you can orchestrate them from an AI assistant or other automation. The server integrates hybrid search (vector + full-text), QA with citations, RLM code execution, and multi-agent workflows, all backed by LanceDB storage and Docling document structures.

With MCP mode enabled, an AI assistant can call these capabilities as discrete tools (e.g., index documents, search chunks, perform QA with citations, or run iterative research workflows) and receive structured results that include provenance such as page numbers and section headings. The CLI and Python API let you index sources, perform searches, run QA queries, and manage conversations or research planning within your environment.

To use it from an assistant, run the MCP-enabled server and point your assistant's MCP configuration at the haiku-rag tool, which surfaces functions such as add-src, search, ask, research, rlm, chat, and serve, along with memory and monitoring features.
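Assuming the tool names listed above (add-src, search, ask) map to CLI subcommands of the same name, a typical workflow from the shell might look like the sketch below. The subcommand names come from this page, but the exact argument shapes are assumptions; verify with the tool's built-in help before relying on them.

```shell
# Hypothetical workflow sketch: subcommand names are taken from the tool
# list above, but argument shapes are assumptions -- check `haiku-rag --help`.

# Index a local document into the LanceDB store.
haiku-rag add-src ./docs/report.pdf

# Hybrid (vector + full-text) search over indexed chunks.
haiku-rag search "quarterly revenue"

# QA with citations over the indexed corpus.
haiku-rag ask "What drove revenue growth last quarter?"
```

Each command returns structured results, so the same operations are available whether you drive them from the shell or expose them to an assistant through MCP.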
How to install
Prerequisites:
- Python 3.12 or newer
- Access to install Python packages (pip)
- Optional: an embedding provider (e.g., Ollama, OpenAI) and a storage backend (LanceDB) configured as described in the Haiku RAG docs
- Create a virtual environment (recommended):
python3 -m venv env
source env/bin/activate
- Install the full Haiku RAG package (recommended):
pip install haiku.rag
This includes document processing, all embedding providers, and rerankers. If you want a minimal footprint, install the slim package instead:
pip install haiku.rag-slim
- Verify installation and run the MCP-enabled server:
haiku-rag serve --mcp --stdio
- (Optional) If using uv or an alternative runtime, follow the uv installation guidance and ensure your environment satisfies the provider dependencies described in the Haiku RAG installation docs.
- See the MCP integration docs for configuring your AI assistant to talk to the MCP server (the example in the README shows haiku-rag as the tool name and the command/args to expose).
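As an alternative to the `claude mcp add` one-liner shown at the top of this page, assistants that read a JSON MCP configuration (for example Claude Desktop's claude_desktop_config.json) can register the server with an entry like the following sketch; the tool name and command/args mirror the README example, while the surrounding file layout follows the standard mcpServers format.

```json
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
```

After restarting the assistant, the haiku-rag tools should appear in its MCP tool list.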
Additional notes
Tips and notes:
- The MCP server exposes tools such as add-src (index documents), search (hybrid search), ask (QA with citations), research (iterative planning/search), rlm (code execution), and chat (interactive memory-enabled conversations).
- Environment variables can be used to configure embedding providers, LanceDB paths, and cloud storage backends; consult the Haiku RAG docs for provider-specific settings.
- When running in production, mount the directories you want watched so that file monitoring (--monitor) can pick up changes, and ensure the process has read/write permissions on those paths.
- If you encounter performance or memory issues, adjust index batch sizes, reranker selection, and the LanceDB storage configuration as described in the configuration documentation.
- The MCP integration is intended to be consumed by AI assistants (e.g., Claude Desktop); ensure the assistant is configured to use the haiku-rag MCP server and to handle citations from QA results.
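Putting the notes above together, a long-running launch with monitoring enabled might look like the sketch below. The serve, --mcp, --stdio, and --monitor options are all named on this page, but whether --monitor takes additional arguments (such as a directory path) is not documented here, so none are shown.

```shell
# Sketch: run the MCP server over stdio with file monitoring enabled.
# serve, --mcp, --stdio, and --monitor are named on this page; any
# further --monitor arguments are undocumented here and omitted.
haiku-rag serve --mcp --stdio --monitor
```

Run this under a process supervisor in production so the server restarts if it exits.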
Related MCP Servers
jupyter
🪐 🔧 Model Context Protocol (MCP) Server for Jupyter.
mcp-pinecone
Model Context Protocol server for reading from and writing to Pinecone, with rudimentary RAG support.
pluggedin-app
The Crossroads for AI Data Exchanges. A unified, self-hostable web interface for discovering, configuring, and managing Model Context Protocol (MCP) servers—bringing together AI tools, workspaces, prompts, and logs from multiple MCP sources (Claude, Cursor, etc.) under one roof.
beemcp
BeeMCP: an unofficial Model Context Protocol (MCP) server that connects your Bee wearable lifelogger to AI via the Model Context Protocol
RiMCP_hybrid
Rimworld Coding RAG MCP server
BinAssistMCP
Binary Ninja plugin to provide MCP functionality.