llm-search
Querying local documents, powered by LLM
```shell
claude mcp add --transport stdio snexus-llm-search python -m llm_search.api \
  --env HF_HOME="HF cache/home for HuggingFace models" \
  --env OPENAI_API_KEY="OpenAI API key (if using OpenAI models)" \
  --env PYTHONWARNINGS="ignore"
```
How to use
This MCP server exposes the advanced pyLLMSearch RAG system via a FastAPI-based API, making it accessible to MCP clients such as Cursor, Windsurf, or VSCode GH Copilot. The server orchestrates document parsing, embedding, hybrid search with dense and sparse methods, HyDE capabilities, chat history, and optional multi-querying, all configurable through a simple YAML-based configuration. Clients can query the RAG system to retrieve relevant passages, re-rank results, and generate context-aware answers against a local document collection or embedded corpora. To use it with MCP clients, point your client at the server’s MCP endpoint and leverage the provided tools for document retrieval, hybrid search, and answer generation. The system supports local embeddings, OpenAI-compatible models, and local HuggingFace/Ollama/LiteLLM backends depending on your configuration.
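For clients that read a JSON configuration file instead of the `claude mcp add` command, an equivalent stdio entry might look like the sketch below. The env values are placeholders (the real paths and keys come from your own environment), and the exact file location depends on the client:

```json
{
  "mcpServers": {
    "snexus-llm-search": {
      "command": "python",
      "args": ["-m", "llm_search.api"],
      "env": {
        "HF_HOME": "/path/to/hf-cache",
        "OPENAI_API_KEY": "your-openai-key",
        "PYTHONWARNINGS": "ignore"
      }
    }
  }
}
```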
How to install
Prerequisites:
- Python 3.8 or newer
- pip (Python package manager)
- Optional: a virtual environment tool (venv, conda)
Setup steps:
- Clone the repository or navigate to the project directory:

  ```shell
  git clone https://github.com/snexus/llm-search.git
  cd llm-search
  ```

- Create and activate a virtual environment (optional but recommended):

  ```shell
  # Python venv (Unix/macOS)
  python -m venv venv
  source venv/bin/activate
  ```

  ```shell
  # Windows
  venv\Scripts\activate.bat
  ```

- Install the package requirements:

  ```shell
  pip install -r requirements.txt
  ```

  Or, install in editable mode if developing:

  ```shell
  pip install -e .
  ```

- Run the MCP server (the recommended approach uses the module entrypoint):

  ```shell
  python -m llm_search.api
  ```

- Verify the server is running by hitting the API endpoint (e.g., http://localhost:8000/docs for the FastAPI docs) or using your MCP client to connect to the configured endpoint.
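The verification step above can be scripted. A minimal health-check sketch, assuming the default host and port from this README (adjust `base_url` if you run the server elsewhere):

```python
import urllib.request


def server_is_up(base_url: str = "http://localhost:8000") -> bool:
    """Return True if the FastAPI docs page responds with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    print("server up:", server_is_up())
```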
Tip: Configure a YAML-based configuration file to customize parsers, embeddings, models, and hybrid search settings as described in the project documentation.
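As a rough sketch of such a file, the fragment below shows the general shape; the field names here are illustrative assumptions, not the project's actual schema, so consult the pyLLMSearch documentation for the real keys:

```yaml
# Illustrative only -- check the pyLLMSearch docs for the actual schema.
cache_folder: /path/to/cache
embeddings:
  embedding_model:
    type: sentence_transformer        # or an OpenAI-compatible embedder
    model_name: all-MiniLM-L6-v2
semantic_search:
  max_char_size: 2048                 # chunking granularity
  hyde:
    enabled: false                    # HyDE is off unless you opt in
```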
Additional notes
Tips and caveats:
- HyDE can significantly alter results. Enable it only after reviewing the trade-offs for your domain.
- Hybrid search relies on both dense embeddings and sparse methods like SPLADE; ensure your vector store (e.g., Chroma) is properly configured.
- Large document collections may require incremental indexing and proper chunking settings to balance speed and accuracy.
- If you run the server behind a firewall or on a non-default port, update the MCP client configuration accordingly.
- Ensure you set your API keys (e.g., OPENAI_API_KEY, HuggingFace tokens) as environment variables or in a secure configuration store.
- For debugging, refer to the FastAPI docs exposed at the server’s /docs and /redoc endpoints once the server is running.
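The API-key tip above can be turned into a simple startup check. A sketch, using the variable names mentioned in this README (extend the list for HuggingFace tokens or other backends as needed):

```python
import os


def missing_env(required: list[str]) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


if __name__ == "__main__":
    # OPENAI_API_KEY is only required when using OpenAI models.
    absent = missing_env(["OPENAI_API_KEY"])
    if absent:
        raise SystemExit(f"Missing environment variables: {absent}")
```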
Related MCP Servers
FastGPT
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities, such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
AstrBot
Agentic IM chatbot infrastructure that integrates many IM platforms, LLMs, plugins, and AI features, and can serve as your openclaw alternative. ✨
agentscope
Build and run agents you can see, understand and trust.
note-gen
A cross-platform Markdown AI note-taking software.
learn-ai-engineering
Learn AI and LLMs from scratch using free resources
SearChat
Search + Chat = SearChat (AI chat with search). Supports OpenAI/Anthropic/VertexAI/Gemini, DeepResearch, the SearXNG meta-search engine, and one-click Docker deployment.