
locallama

An MCP server that works with Roo Code, Cline.Bot, and Claude Desktop to optimize costs by intelligently routing coding tasks between local LLMs, free APIs, and paid APIs.

Installation

Run this command in your terminal to add the MCP server to Claude Code:

claude mcp add --transport stdio heratiki-locallama-mcp node path/to/server.js

How to use

LocaLLama MCP Server acts as a cost-aware routing layer for coding tasks. It dynamically decides whether to execute tasks on local LLMs (e.g., LM Studio, Ollama) or forward them to paid APIs, with the aim of reducing token usage and costs while maintaining acceptable quality. The server includes a cost and token monitoring module, a decision engine with configurable offload thresholds, and benchmarking tools to compare local versus API performance.

Available tools:

  • route_task — routes a coding task to the most appropriate backend
  • preemptive_route_task — fast but potentially less accurate routing
  • get_cost_estimate — gauges the cost of a task before routing it
  • benchmark_task / benchmark_tasks — evaluates performance across models

If you have Retriv installed, you can also use retriv_search for semantic code search, along with several OpenRouter-related utilities for accessing free models and benchmarking strategies. The server additionally supports a lock mechanism to prevent multiple concurrent instances and offers status and memory/OpenRouter endpoints for monitoring and diagnostics.
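To illustrate the idea behind a configurable offload threshold, here is a minimal sketch of the kind of rule such a decision engine could apply. All names and numbers below (estimateApiCost, chooseBackend, OFFLOAD_THRESHOLD_USD, the prices) are assumptions for the example, not the server's actual API.

```javascript
// Assumed configurable cut-off for paid-API spend per task (illustrative).
const OFFLOAD_THRESHOLD_USD = 0.01;

// Estimate what a paid API would charge, given per-million-token prices.
function estimateApiCost(contextTokens, outputTokens, priceInPerMTok, priceOutPerMTok) {
  return (contextTokens * priceInPerMTok + outputTokens * priceOutPerMTok) / 1e6;
}

// Route to a local model when the estimated API cost exceeds the threshold;
// otherwise the task is cheap enough to send to the paid API.
function chooseBackend(task) {
  const cost = estimateApiCost(
    task.contextLength,
    task.expectedOutputLength,
    task.priceInPerMTok,
    task.priceOutPerMTok
  );
  return cost > OFFLOAD_THRESHOLD_USD ? "local" : "paid-api";
}

// A long-context task: the estimated API cost (~$0.66) is well above the
// threshold, so this sketch routes it to a local model.
console.log(chooseBackend({
  contextLength: 200000,
  expectedOutputLength: 4000,
  priceInPerMTok: 3,
  priceOutPerMTok: 15,
})); // "local"
```

The real server's decision engine also weighs quality, which a pure cost rule like this does not capture.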

To use these capabilities, first ensure your local environment has the required local LLM endpoints (e.g., LM Studio, Ollama) or access to OpenRouter APIs. Then configure the server to point to those endpoints and start the MCP server. You can route a coding task by calling route_task with the task description and metadata, or run preemptive_route_task for a faster, if less precise, routing decision. For long-running or batch tasks, you can use benchmark_tasks to assess how often local models outperform paid APIs under your workload, and adjust thresholds accordingly.
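Under the hood, an MCP client invokes a tool like route_task with a standard MCP tools/call request over stdio. A sketch of what such a request could look like follows; the tool name comes from the list above, but the argument names and values are illustrative and may not match the server's exact schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "route_task",
    "arguments": {
      "task": "Refactor this function to remove duplicated branches",
      "context_length": 2048,
      "expected_output_length": 512
    }
  }
}
```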

In practice, this means you can: (1) route tasks to local models to save costs on simple or routine code work, (2) offload to paid APIs for complex tasks that require higher accuracy or specific capabilities, and (3) continuously monitor performance and costs to refine your routing decisions over time.

How to install

Prerequisites:

  • Node.js and npm installed
  • Access to a suitable local LLM endpoint (e.g., LM Studio, Ollama) if you plan to route tasks locally
  • Optional: Retriv installed for code search features
  1. Clone the repository

git clone https://github.com/yourusername/locallama-mcp.git
cd locallama-mcp

  2. Install dependencies

npm install

  3. Build the project (if applicable)

npm run build

  4. Install the Python Retriv dependencies (optional, but required for code search features; the specifiers are quoted so the shell does not treat >= as redirection)

pip install "retriv>=0.3.1" "numpy>=1.22.0" "scikit-learn>=1.0.2" "scipy>=1.8.0"

  5. (Optional) Configure a Python virtual environment for Retriv

# Create and activate a virtual environment
# Linux/macOS
python3 -m venv venv
source venv/bin/activate

# Windows
python -m venv venv
venv\Scripts\activate

  6. Start the MCP server

# If using the node-based server layout, start the server via node
node path/to/server.js

  7. Validate the installation

  • Check the status endpoint to ensure the server is running
  • Verify that the expected tools, such as route_task and get_cost_estimate, are listed under the server

Note: If your setup uses retriv, ensure PYTHON_PATH and PYTHON_DETECT_VENV environment variables are configured in your .env file as documented in the README.
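For example, a minimal .env along these lines (the variable names are from the README note; the values are placeholders for your own environment):

```
PYTHON_PATH=./venv/bin/python
PYTHON_DETECT_VENV=true
```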

Additional notes

Tips and caveats:

  • The server includes a Lock Mechanism to prevent multiple instances; if you encounter a stale lock, ensure no other process is holding the lock before restarting.
  • When using Retriv for code search, you can enable automatic installation of its dependencies by calling retriv_init with install_dependencies set to true.
  • Configure OpenRouter integration only if you have an API key; otherwise, you can still benchmark local models using the provided tools.
  • For cost estimations, provide accurate context_length, expected_output_length, and model identifiers to get meaningful results.
  • If you encounter issues with local LLM endpoints, verify network access and that the endpoints expose the expected REST-like interfaces compatible with the MCP server.
  • The configuration supports adding environment variables for local endpoints (e.g., LOCAL_LM_ENDPOINT, OPENROUTER_API_KEY). Add these to the env block in mcp_config as needed.
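As a sketch, such an env block in an MCP client configuration might look like the following. The mcpServers layout is the common client convention rather than something this README specifies, the variable names come from the tips above, and all values are placeholders:

```json
{
  "mcpServers": {
    "locallama": {
      "command": "node",
      "args": ["path/to/server.js"],
      "env": {
        "LOCAL_LM_ENDPOINT": "http://localhost:1234/v1",
        "OPENROUTER_API_KEY": "your-openrouter-api-key"
      }
    }
  }
}
```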
