
yzma-duckdb-rag

A Retrieval-Augmented Generation (RAG) system implemented in Go

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio innomon-yzma-duckdb-rag ./ydrag -model ./models/nomic-embed-text-v1.5.Q8_0.gguf serve

How to use

YDRAG (YZMA DuckDB RAG) is a retrieval-augmented generation MCP server written in Go. It combines local embedding generation using YZMA with a DuckDB-backed vector store to enable semantic search over ingested documents.

The server exposes a small set of tools to manage and query your knowledge base:

  • add_document — ingest new documents
  • query_documents — retrieve relevant passages
  • list_documents — view stored items
  • delete_document — remove entries

It supports multiple MCP transports for integration with AI assistants: stdio, SSE, and streamable HTTP. Configure the transport via environment variables or a YAML config file, and run the server in serve mode to expose its endpoints to an MCP client such as Claude or Amp. The system is designed to work with GGUF embedding models; it can extract text from PDFs during ingestion, store embeddings persistently, and perform fast similarity searches using DuckDB’s array_cosine_similarity.
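As an illustration, a top-k similarity query over a 768-dimensional embedding column (the dimensionality of nomic-embed-text-v1.5) could look like the sketch below. The table and column names here are hypothetical — the actual schema is internal to YDRAG — but array_cosine_similarity is a real DuckDB function over fixed-size arrays:

```sql
-- Hypothetical schema: documents(id, content, embedding FLOAT[768])
SELECT id, content,
       array_cosine_similarity(embedding, CAST(? AS FLOAT[768])) AS score
FROM documents
ORDER BY score DESC
LIMIT 5;
```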

How to install

Prerequisites:

  • Go 1.24+ with CGo enabled and a C compiler (gcc/clang)
  • llama.cpp library built and accessible (set YZMA_LIB to the library path)
  • A GGUF embedding model (e.g., embeddinggemma, nomic-embed-text, bge-small)
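For example, after building llama.cpp as a shared library, YZMA_LIB can be pointed at it before starting the server. The path below is only an example — adjust it to wherever your build placed the library:

```shell
# Example path only: use libllama.so on Linux, libllama.dylib on macOS,
# wherever your llama.cpp build installed it
export YZMA_LIB=/usr/local/lib/libllama.so
```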

Install and build:

# Ensure CGo is enabled (required by the YZMA and DuckDB bindings)
export CGO_ENABLED=1
go env CGO_ENABLED   # should print 1

# Build the YDRAG binary (go build fetches Go dependencies automatically)
CGO_ENABLED=1 go build -o ydrag .

Run the server (example):

# Default model path; adapt as needed
./ydrag -model ./models/nomic-embed-text-v1.5.Q8_0.gguf serve

Optional: run the test suite to verify that CGo is enabled and the build works:

go test ./... -v

Configuration can also be supplied via a config.yaml file or environment variables as described in the README.
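A config.yaml might look like the sketch below. The key names here are assumptions rather than the documented schema, so check the README for the authoritative field names:

```yaml
# Hypothetical config.yaml sketch -- key names are assumptions
model: ./models/nomic-embed-text-v1.5.Q8_0.gguf
db: ./ydrag.duckdb
transport: stdio        # stdio | sse | streamable-http
port: 8080              # used by network transports
```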

Additional notes


  • Set YZMA_LIB to the full path of your llama.cpp dynamic library (e.g., /path/to/libllama.so or libllama.dylib) before starting the server.
  • DuckDB is used as the persistent vector store; ensure the file path in YDRAG_DB_PATH (or -db) is writable.
  • To expose the MCP server to an AI assistant, choose a transport (stdio for local clients; SSE or streamable HTTP for network access) and set the corresponding environment variables (YDRAG_TRANSPORT, plus YDRAG_SERVER_PORT for network transports).
  • Ingested documents can be PDFs or text; use the add_document tool to index content and enable fast semantic search.
  • If the embedding model fails to load, verify that the model path is correct and that the file is a GGUF embedding model supported by your llama.cpp build.
  • The configuration supports YAML, environment variables, and CLI flags with a priority order: flags > environment > config.yaml > defaults.
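Putting the transport notes above together, a network-exposed server could be configured through the environment as sketched below. The variable names come from the notes above, but the exact accepted transport values are an assumption — confirm them against the README:

```shell
# Select a network transport and port via environment variables,
# then start the server with:
#   ./ydrag -model ./models/nomic-embed-text-v1.5.Q8_0.gguf serve
export YDRAG_TRANSPORT=streamable-http
export YDRAG_SERVER_PORT=8080
```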
