
llm-search

Querying local documents, powered by LLM

Installation
Run the following command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio snexus-llm-search \
  --env HF_HOME="<path to your HuggingFace cache/home>" \
  --env OPENAI_API_KEY="<your OpenAI API key, if using OpenAI models>" \
  --env PYTHONWARNINGS="ignore" \
  -- python -m llm_search.api

How to use

This MCP server exposes the advanced pyLLMSearch RAG system through a FastAPI-based API, making it accessible to MCP clients such as Cursor, Windsurf, or VS Code GitHub Copilot. The server orchestrates document parsing, embedding, hybrid search (combining dense and sparse methods), HyDE, chat history, and optional multi-querying, all configurable through a simple YAML file. Clients can query the RAG system to retrieve relevant passages, re-rank results, and generate context-aware answers over a local document collection or embedded corpora. To use it with an MCP client, point the client at the server's MCP endpoint and use the provided tools for document retrieval, hybrid search, and answer generation. Depending on your configuration, the system supports local embeddings, OpenAI-compatible models, and local HuggingFace/Ollama/LiteLLM backends.
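As a rough illustration, a client could also call the server's HTTP API directly. The route name (`/query`) and request fields below are assumptions for the sketch, not taken from the project documentation; check the OpenAPI schema at `/docs` on a running server for the actual request model.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default uvicorn/FastAPI port; adjust to your setup

def build_payload(question: str, max_hits: int = 5) -> dict:
    """Assemble a request body. Both field names are illustrative --
    consult the server's OpenAPI schema at /docs for the real model."""
    return {"question": question, "max_hits": max_hits}

def ask(question: str) -> dict:
    """POST a question to a hypothetical /query route and decode the JSON reply."""
    req = urllib.request.Request(
        f"{API_URL}/query",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```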

How to install

Prerequisites:

  • Python 3.8 or newer
  • pip (Python package manager)
  • Optional: a virtual environment tool (venv, conda)

Setup steps:

  1. Clone the repository and enter the project directory:

     git clone https://github.com/snexus/llm-search.git
     cd llm-search

  2. Create and activate a virtual environment (optional but recommended):

    Python venv (Unix/macOS)

    python -m venv venv
    source venv/bin/activate

    Windows

    venv\Scripts\activate.bat

  3. Install the package requirements: pip install -r requirements.txt

    Or, install in editable mode if developing:

    pip install -e .

  4. Run the MCP server (the recommended approach uses the module entrypoint): python -m llm_search.api

  5. Verify the server is running by hitting the API endpoint (e.g., http://localhost:8000/docs for the FastAPI docs) or using your MCP client to connect to the configured endpoint.
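The check in step 5 can be scripted. This is a minimal sketch that assumes the default port 8000 and only tests whether the FastAPI docs page responds:

```python
import urllib.error
import urllib.request

def server_is_up(url: str = "http://localhost:8000/docs", timeout: float = 3.0) -> bool:
    """Return True if the FastAPI docs page answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print("server up:", server_is_up())
```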

Tip: Configure a YAML-based configuration file to customize parsers, embeddings, models, and hybrid search settings as described in the project documentation.
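To give a feel for what such a file covers, here is a rough sketch. Every key name below is an illustrative placeholder, not the project's actual schema; the pyLLMSearch documentation defines the authoritative keys for parsers, embeddings, models, and hybrid search.

```yaml
# Illustrative only -- key names are placeholders, not pyLLMSearch's real schema.
cache_folder: /tmp/llm_search_cache

embeddings:
  model: sentence-transformers/all-MiniLM-L6-v2   # example local embedding model
  chunk_size: 1024

search:
  hybrid: true        # combine dense embeddings with sparse (e.g. SPLADE) scores
  hyde: false         # enable only after weighing the trade-offs for your domain
  max_hits: 5

llm:
  backend: openai     # or a local HuggingFace/Ollama/LiteLLM backend
  model: gpt-4o-mini
```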

Additional notes

Tips and caveats:

  • HyDE can significantly alter results. Enable it only after reviewing the trade-offs for your domain.
  • Hybrid search relies on both dense embeddings and sparse methods like SPLADE; ensure your vector store (e.g., Chroma) is properly configured.
  • Large document collections may require incremental indexing and proper chunking settings to balance speed and accuracy.
  • If you run the server behind a firewall or on a non-default port, update the MCP client configuration accordingly.
  • Ensure you set your API keys (e.g., OPENAI_API_KEY, HuggingFace tokens) as environment variables or in a secure configuration store.
  • For debugging, refer to the FastAPI docs exposed at the server’s /docs and /redoc endpoints once the server is running.
