mcp-local-llm
MCP server for delegating mechanical tasks to local LLMs via Ollama. Claude does the thinking, your local model does the grunt work.
```shell
claude mcp add --transport stdio aplaceforallmystuff-mcp-local-llm node /path/to/mcp-local-llm/dist/index.js \
  --env LOCAL_LLM_MODEL="qwen2.5-coder:7b" \
  --env LOCAL_LLM_BASE_URL="http://localhost:11434/v1" \
  --env LOCAL_LLM_MAX_TOKENS="2048" \
  --env LOCAL_LLM_TEMPERATURE="0.7"
```
How to use
mcp-local-llm provides a local, cost-optimized layer for delegating mechanical or bulk text tasks to a local large language model backend. Claude Code remains the decision-maker for what to delegate, while the local model handles high-volume work such as summarization, classification, extraction, and drafting. The server exposes the following tools, callable from Claude Code or any MCP-compatible client:
- local_summarize: bulk text summarization
- local_draft: initial content generation
- local_classify: tagging and sorting
- local_extract: structured data extraction
- local_transform: formatting and style changes
- local_complete: raw completions
- local_status: verify connectivity and available models
This separation helps reduce API usage costs while keeping Claude in charge of quality control and decision-making.
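Under the hood, each delegated task resolves to a request against the backend's OpenAI-compatible chat completions endpoint. A rough sketch of an equivalent request with curl, assuming the default Ollama endpoint and model shown above (the prompt is a placeholder):

```shell
# Placeholder prompt; model, endpoint, and parameters match the defaults above.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-coder:7b",
        "messages": [{"role": "user", "content": "Summarize this text: ..."}],
        "max_tokens": 2048,
        "temperature": 0.7
      }'
```

A healthy backend answers with a JSON chat completion object whose choices array carries the model's reply.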
How to install
Prerequisites:
- Ollama installed and running (local LLM backend).
- Node.js 18+ installed.
- Claude Code (or any MCP-compatible client).
Installation steps:
- Install Ollama and pull a model (example):
- brew install ollama
- ollama serve
- ollama pull qwen2.5-coder:7b
- Clone the repository and install dependencies:
- git clone https://github.com/aplaceforallmystuff/mcp-local-llm.git
- cd mcp-local-llm
- npm install
- Build the project:
- npm run build
- Run the MCP server (example):
- node dist/index.js
- Add the MCP server to Claude Code or your MCP client:
- In Claude Code: claude mcp add local-llm -s user -- node /path/to/mcp-local-llm/dist/index.js
- Or edit ~/.claude.json to include:

```json
{
  "mcpServers": {
    "local-llm": {
      "command": "node",
      "args": ["/path/to/mcp-local-llm/dist/index.js"]
    }
  }
}
```
- Verify connectivity:
- Use the local_status tool in Claude Code to confirm Ollama connection and available models.
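If local_status reports a failure, you can check the backend directly. A minimal sketch, assuming Ollama is on its default port:

```shell
# List the models the local Ollama daemon has pulled.
ollama list

# Or query the same OpenAI-compatible endpoint the server uses;
# a healthy backend returns a JSON object with a "data" array of models.
curl -s http://localhost:11434/v1/models
```

If the model named in LOCAL_LLM_MODEL does not appear in either listing, pull it first (ollama pull qwen2.5-coder:7b).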
Additional notes
Environment variables are optional; the defaults work with a standard Ollama setup. If you prefer a Docker-based OpenAI-compatible backend, use the Docker Model Runner and point LOCAL_LLM_BASE_URL at its TCP endpoint. If something fails, first confirm Ollama is running (ollama list), check that the model name matches LOCAL_LLM_MODEL, and adjust LOCAL_LLM_MAX_TOKENS or LOCAL_LLM_TEMPERATURE to fit your workload. The Delegation Philosophy section explains which tasks are best suited for local delegation versus Claude's core capabilities.
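When running the server manually, the same variables can be set inline. A sketch using the documented defaults (swap in whatever model you have pulled):

```shell
# Inline environment overrides for a one-off run of the server.
LOCAL_LLM_MODEL="qwen2.5-coder:7b" \
LOCAL_LLM_BASE_URL="http://localhost:11434/v1" \
LOCAL_LLM_MAX_TOKENS="2048" \
LOCAL_LLM_TEMPERATURE="0.7" \
node dist/index.js
```

Inline assignments only apply to that invocation; for a persistent setup, pass the same values via --env when registering the server with claude mcp add.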