
llm-search

Querying local documents, powered by LLM

Installation
Run the following command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio snexus-llm-search \
  --env HF_HOME="<path to your HuggingFace cache/home>" \
  --env OPENAI_API_KEY="<your OpenAI API key, if using OpenAI models>" \
  --env PYTHONWARNINGS="ignore" \
  -- python -m llm_search.api

How to use

This MCP server exposes the advanced pyLLMSearch RAG system through a FastAPI-based API, making it accessible to MCP clients such as Cursor, Windsurf, or VS Code GitHub Copilot. The server orchestrates document parsing, embedding, hybrid search (combining dense and sparse methods), HyDE, chat history, and optional multi-querying, all configurable through a simple YAML file. Clients can query the RAG system to retrieve relevant passages, re-rank results, and generate context-aware answers over a local document collection or embedded corpora. To use it with an MCP client, point the client at the server's MCP endpoint and use the provided tools for document retrieval, hybrid search, and answer generation. Depending on your configuration, the system supports local embeddings, OpenAI-compatible models, and local HuggingFace/Ollama/LiteLLM backends.
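As a rough illustration, a client could also call the server's HTTP API directly. The route name (`/query`) and request fields below are assumptions for the sketch, not taken from the project documentation; check the OpenAPI schema at `/docs` on a running server for the actual request model.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default uvicorn/FastAPI port; adjust to your setup

def build_payload(question: str, max_hits: int = 5) -> dict:
    """Assemble a request body. Both field names are illustrative --
    consult the server's OpenAPI schema at /docs for the real model."""
    return {"question": question, "max_hits": max_hits}

def ask(question: str) -> dict:
    """POST a question to a hypothetical /query route and decode the JSON reply."""
    req = urllib.request.Request(
        f"{API_URL}/query",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```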

How to install

Prerequisites:

  • Python 3.8 or newer
  • pip (Python package manager)
  • Optional: a virtual environment tool (venv, conda)

Setup steps:

  1. Clone the repository and enter the project directory:

     git clone https://github.com/snexus/llm-search.git
     cd llm-search

  2. Create and activate a virtual environment (optional but recommended):

    Python venv (Unix/macOS)

    python -m venv venv
    source venv/bin/activate

    Windows

    venv\Scripts\activate.bat

  3. Install the package requirements: pip install -r requirements.txt

    Or, install in editable mode if developing:

    pip install -e .

  4. Run the MCP server (the recommended approach uses the module entrypoint): python -m llm_search.api

  5. Verify the server is running by hitting the API endpoint (e.g., http://localhost:8000/docs for the FastAPI docs) or using your MCP client to connect to the configured endpoint.
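The check in step 5 can be scripted. This is a minimal sketch that assumes the default port 8000 and only tests whether the FastAPI docs page responds:

```python
import urllib.error
import urllib.request

def server_is_up(url: str = "http://localhost:8000/docs", timeout: float = 3.0) -> bool:
    """Return True if the FastAPI docs page answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print("server up:", server_is_up())
```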

Tip: Configure a YAML-based configuration file to customize parsers, embeddings, models, and hybrid search settings as described in the project documentation.
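To give a feel for what such a file covers, here is a rough sketch. Every key name below is an illustrative placeholder, not the project's actual schema; the pyLLMSearch documentation defines the authoritative keys for parsers, embeddings, models, and hybrid search.

```yaml
# Illustrative only -- key names are placeholders, not pyLLMSearch's real schema.
cache_folder: /tmp/llm_search_cache

embeddings:
  model: sentence-transformers/all-MiniLM-L6-v2   # example local embedding model
  chunk_size: 1024

search:
  hybrid: true        # combine dense embeddings with sparse (e.g. SPLADE) scores
  hyde: false         # enable only after weighing the trade-offs for your domain
  max_hits: 5

llm:
  backend: openai     # or a local HuggingFace/Ollama/LiteLLM backend
  model: gpt-4o-mini
```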

Additional notes

Tips and caveats:

  • HyDE can significantly alter results. Enable it only after reviewing the trade-offs for your domain.
  • Hybrid search relies on both dense embeddings and sparse methods like SPLADE; ensure your vector store (e.g., Chroma) is properly configured.
  • Large document collections may require incremental indexing and proper chunking settings to balance speed and accuracy.
  • If you run the server behind a firewall or on a non-default port, update the MCP client configuration accordingly.
  • Ensure you set your API keys (e.g., OPENAI_API_KEY, HuggingFace tokens) as environment variables or in a secure configuration store.
  • For debugging, refer to the FastAPI docs exposed at the server’s /docs and /redoc endpoints once the server is running.
