ToolRAG
Unlimited LLM tools, zero context penalties — ToolRAG serves exactly the LLM tools your user query demands.
```shell
claude mcp add --transport stdio antl3x-toolrag npx -y @antl3x/toolrag
```
How to use
ToolRAG is an MCP-enabled tool discovery and execution layer designed to manage an unlimited number of tool definitions for large language models. It uses semantic search over embedded tool descriptions to select the most relevant tools for a given user query, reducing context window usage and token costs while preserving performance. Tool definitions are registered to MCP servers and exposed in a way that LLMs can call them as standard OpenAI function calls. With ToolRAG, you can build multi-tool AI assistants that can query, filter, and execute actions across a large catalog of tools without overwhelming the model with every available function.
To use ToolRAG, start the server and configure your MCP servers (tool sources) during initialization: initialize ToolRAG in your client and pass it a list of MCP server URLs. When a user query arrives, ToolRAG retrieves the most relevant tools, constructs a tool list for the LLM to consider, and then executes the selected tool calls against the appropriate MCP servers. This enables seamless tool orchestration in which the model only interacts with the subset of tools that are contextually relevant to the current task.
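The retrieval step described above can be sketched in a self-contained TypeScript example. This is not ToolRAG's actual API; the tool names, toy 3-dimensional embeddings, and the `selectTools` helper are all hypothetical stand-ins (in production, an embedding model such as OpenAI's or Google's produces the vectors):

```typescript
// Hypothetical in-memory tool index with toy embedding vectors.
type Tool = { name: string; description: string; embedding: number[] };

const tools: Tool[] = [
  { name: "get_weather", description: "Fetch the weather forecast", embedding: [0.9, 0.1, 0.0] },
  { name: "send_email", description: "Send an email message", embedding: [0.1, 0.9, 0.1] },
  { name: "create_event", description: "Create a calendar event", embedding: [0.1, 0.2, 0.9] },
];

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Rank tools by similarity to the query embedding and keep the top k,
// so the LLM only ever sees the most relevant subset of the catalog.
function selectTools(queryEmbedding: number[], k: number): Tool[] {
  return [...tools]
    .sort((a, b) => cosine(b.embedding, queryEmbedding) - cosine(a.embedding, queryEmbedding))
    .slice(0, k);
}

// A weather-like query embedding retrieves get_weather first.
const top = selectTools([1, 0, 0], 1);
```

The selected subset is then formatted as an OpenAI-style tool list and handed to the model, which keeps the context window small no matter how many tools are registered.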
How to install
Prerequisites:
- Node.js (LTS) and npm or yarn installed on your system.
- Access to the internet to install npm packages.
Installation steps:
- Install Node.js if not already installed. Visit https://nodejs.org/ and install the LTS version for your platform.
- Install the ToolRAG package (as a local dev dependency, or run it directly via npx):
  - Using npm: npm install @antl3x/toolrag
  - Alternatively, run directly with npx (no installation required): npx -y @antl3x/toolrag
- Initialize ToolRAG in your project code according to your environment (Node.js example shown in the repository's quick start):

```shell
# If installed locally
npm install @antl3x/toolrag
```

```typescript
import { ToolRAG } from "@antl3x/toolrag";
import OpenAI from "openai";

const toolRag = await ToolRAG.init({
  mcpServers: [
    "https://mcp.example.com/token/tool-a",
    "https://mcp.example.com/token/tool-b",
  ],
});

// Use toolRag as described in the README
```
- Configure and connect to your MCP servers as needed by your deployment. Ensure that the MCP endpoints are reachable from the environment running ToolRAG.
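If you prefer a config file over the `claude mcp add` one-liner, ToolRAG can be registered as a stdio MCP server in a Claude Desktop-style JSON configuration. The `toolrag` key name below is arbitrary; only the `command` and `args` matter:

```json
{
  "mcpServers": {
    "toolrag": {
      "command": "npx",
      "args": ["-y", "@antl3x/toolrag"]
    }
  }
}
```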
Additional notes
Tips:
- ToolRAG relies on vector embeddings for semantic retrieval. Configure embedding providers (e.g., OpenAI or Google) and update your tool metadata accordingly.
- Ensure your MCP endpoints expose tool definitions in the expected OpenAI function definition format for smooth execution.
- Manage relevance thresholds and persistence settings to balance recall vs. precision in tool selection.
- Monitor token usage and latency: ToolRAG’s strength is reducing the number of tools considered by the LLM; fine-tuning thresholds helps optimize performance for your workload.
- If you run into connectivity issues with MCP servers, verify network access, authentication tokens, and CORS or API gateway settings as applicable.
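As a reference for the second tip above, here is a minimal example of a tool definition in the OpenAI function-calling format. The `get_weather` tool and its `city` parameter are illustrative; the field names (`type`, `function`, `parameters`) follow the OpenAI Chat Completions `tools` schema:

```typescript
// One entry of the `tools` array passed to an OpenAI chat completion.
// `parameters` is a JSON Schema describing the tool's arguments.
const weatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Fetch the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. Berlin" },
      },
      required: ["city"],
    },
  },
};
```

MCP tool definitions that map cleanly onto this shape can be forwarded to the model without any translation layer.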
Related MCP Servers
osaurus
AI edge infrastructure for macOS. Run local or cloud models, share tools across apps via MCP, and power AI workflows with a native, always-on runtime.
mcp-router
A Unified MCP Server Management App (MCP Manager).
Matryoshka
MCP server for token-efficient large document analysis via the use of REPL state
mcp-llm
An MCP server that provides LLMs access to other LLMs
kanban
MCP Kanban is a specialized middleware designed to facilitate interaction between Large Language Models (LLMs) and Planka, a Kanban board application. It serves as an intermediary layer that provides LLMs with a simplified and enhanced API to interact with Planka's task management system.
ContextPods
Model Context Protocol management suite/factory. An MCP that can generate and manage other local MCPs in multiple languages. Uses the official SDKs for code gen.