lemonade
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk
claude mcp add --transport stdio lemonade-sdk-lemonade \
  --env LEMONADE_CONFIG="path/to/config.json" \
  -- python -m lemonade_server run Gemma-3-4b-it-GGUF
(The --env line is optional.)
How to use
Lemonade provides a local inference platform that hosts and serves optimized models (LLMs), image generation, and speech generation directly on your machine. The Lemonade CLI exposes commands to run pre-bundled models, browse and pull models, and manage available backends. Typical workflows include starting the server to access the web UI or API endpoints, listing available models, and pulling specific models for offline use. You can chat with models such as Gemma or generate images and speech through the built-in interfaces. The CLI supports commands such as run to start a specific model, list to view available models, pull to download a model, and recipes to inspect your local backends. This makes it easy to experiment with local, GPU-accelerated inference and to integrate Lemonade into other tools and apps.
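Once the server is running, its API endpoints can be called from any HTTP client. The sketch below builds an OpenAI-style chat-completion request for the locally served model; the base URL http://localhost:8000/api/v1 is an assumption (check the Lemonade docs for the actual host, port, and path in your setup), and the final call is left commented out because it requires a running server.

```python
# Minimal sketch of chatting with a model served by Lemonade.
# ASSUMPTION: the server exposes an OpenAI-compatible endpoint at
# http://localhost:8000/api/v1 -- verify against the Lemonade docs.
import json
from urllib import request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000/api/v1") -> request.Request:
    """Build an OpenAI-style chat-completion request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Gemma-3-4b-it-GGUF", "Hello!")
# with request.urlopen(req) as resp:  # requires a running lemonade-server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, the same request shape works with any OpenAI-compatible client library pointed at the local base URL.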
How to install
Prerequisites:
- A supported OS (Linux, Windows, macOS) and a modern Python 3.10–3.13 environment
- Optional: GPU support drivers if you plan to use GPU acceleration
Install steps:
- Install Python and pip (if not already installed).
- Create a virtual environment (recommended):
- python -m venv lemonade-venv
- source lemonade-venv/bin/activate (Linux/macOS)
- lemonade-venv\Scripts\activate (Windows)
- Install the Lemonade server package (example using pipx or pip):
- Using pip: pip install lemonade-server
- Or using pipx (isolated env): pipx install lemonade-server
- Verify installation:
- lemonade-server --version
- Run Lemonade with a model (example from README):
- lemonade-server run Gemma-3-4b-it-GGUF
- Optional: configure environment variables or supply a config file as needed for your environment.
Additional notes
- The Lemonade CLI can pull models locally via lemonade-server pull <model-id> and list them with lemonade-server list.
- For offline or repeatable environments, consider using Model Manager to download and cache models ahead of time.
- If GPU acceleration is used, ensure the appropriate drivers and GPU runtime are installed and compatible with your hardware.
- Environment variables (such as LEMONADE_CONFIG) can be used to customize paths, model caches, or backend settings. Review the docs for available options.
- If you encounter port or binding issues, check that no other process is occupying the required port and that firewall rules permit local access.
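To make the LEMONADE_CONFIG tip concrete, here is a sketch that writes a config file and points the variable at it. The key names inside the config (model_cache, port) are purely hypothetical placeholders, not the real Lemonade schema; consult the docs for the actual options.

```python
# Sketch: create a config file and export LEMONADE_CONFIG to point at it.
# NOTE: the keys below are HYPOTHETICAL examples, not Lemonade's real schema.
import json
import os
import tempfile

config = {
    "model_cache": "~/.cache/lemonade/models",  # hypothetical key: model cache path
    "port": 8000,                               # hypothetical key: server port
}

config_path = os.path.join(tempfile.gettempdir(), "lemonade-config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Environment variable name taken from the tips above.
os.environ["LEMONADE_CONFIG"] = config_path
```

A child process such as lemonade-server started from this script would inherit the variable; in a shell you would instead export LEMONADE_CONFIG before launching the server.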
Related MCP Servers
AstrBot
Agentic IM chatbot infrastructure that integrates many IM platforms, LLMs, plugins, and AI features, and can be your openclaw alternative. ✨
sec-edgar
A SEC EDGAR MCP (Model Context Protocol) Server
openapi
OpenAPI definitions, converters and LLM function calling schema composer.
gtm
An MCP server for Google Tag Manager. Connect it to your LLM, authenticate once, and start managing GTM through natural language.
MCP-Stack-for-UI-UX-Designers
An end-to-end Model Context Protocol (MCP) solution that streamlines the entire UI/UX design workflow - from gathering inspiration to development handoff. This suite integrates multiple specialized MCP servers to automate and enhance the design process through AI assistance.
packt-netops-ai-workshop
🔧 Build Intelligent Networks with AI