lemonade
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk
claude mcp add --transport stdio lemonade-sdk-lemonade \
  --env LEMONADE_CONFIG="path/to/config.json" \
  -- python -m lemonade_server run Gemma-3-4b-it-GGUF
(The --env line is optional.)
How to use
Lemonade provides a local inference platform that hosts and serves optimized models (LLMs), image generation, and speech generation directly on your machine. The Lemonade CLI exposes commands to run pre-bundled models, browse and pull models, and manage available backends. Typical workflows include starting the server to access the web UI or API endpoints, listing available models, and pulling specific models for offline use. You can chat with models such as Gemma or generate images and speech through the built-in interfaces. The CLI supports commands such as run to start a specific model, list to view available models, pull to download a model, and recipes to inspect your local backends. This makes it easy to experiment with local, GPU-accelerated inference and to integrate Lemonade into other tools and apps.
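Once the server is running, its API endpoints can be called from any HTTP client. The sketch below builds an OpenAI-style chat-completion request for the locally served model; the base URL http://localhost:8000/api/v1 is an assumption (check the Lemonade docs for the actual host, port, and path in your setup), and the final call is left commented out because it requires a running server.

```python
# Minimal sketch of chatting with a model served by Lemonade.
# ASSUMPTION: the server exposes an OpenAI-compatible endpoint at
# http://localhost:8000/api/v1 -- verify against the Lemonade docs.
import json
from urllib import request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000/api/v1") -> request.Request:
    """Build an OpenAI-style chat-completion request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Gemma-3-4b-it-GGUF", "Hello!")
# with request.urlopen(req) as resp:  # requires a running lemonade-server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, the same request shape works with any OpenAI-compatible client library pointed at the local base URL.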
How to install
Prerequisites:
- A supported OS (Linux, Windows, macOS) and a modern Python 3.10–3.13 environment
- Optional: GPU support drivers if you plan to use GPU acceleration
Install steps:
- Install Python and pip (if not already installed).
- Create a virtual environment (recommended):
- python -m venv lemonade-venv
- source lemonade-venv/bin/activate (Linux/macOS)
- lemonade-venv\Scripts\activate (Windows)
- Install the Lemonade server package (example using pipx or pip):
- Using pip: pip install lemonade-server
- Or using pipx (isolated env): pipx install lemonade-server
- Verify installation:
- lemonade-server --version
- Run Lemonade with a model (example from README):
- lemonade-server run Gemma-3-4b-it-GGUF
- Optional: configure environment variables or supply a config file as needed for your environment.
Additional notes
- The Lemonade CLI can pull models locally via lemonade-server pull <model-id> and list them with lemonade-server list.
- For offline or repeatable environments, consider using Model Manager to download and cache models ahead of time.
- If GPU acceleration is used, ensure the appropriate drivers and GPU runtime are installed and compatible with your hardware.
- Environment variables (such as LEMONADE_CONFIG) can be used to customize paths, model caches, or backend settings. Review the docs for available options.
- If you encounter port or binding issues, check that no other process is occupying the required port and that firewall rules permit local access.
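To make the LEMONADE_CONFIG tip concrete, here is a sketch that writes a config file and points the variable at it. The key names inside the config (model_cache, port) are purely hypothetical placeholders, not the real Lemonade schema; consult the docs for the actual options.

```python
# Sketch: create a config file and export LEMONADE_CONFIG to point at it.
# NOTE: the keys below are HYPOTHETICAL examples, not Lemonade's real schema.
import json
import os
import tempfile

config = {
    "model_cache": "~/.cache/lemonade/models",  # hypothetical key: model cache path
    "port": 8000,                               # hypothetical key: server port
}

config_path = os.path.join(tempfile.gettempdir(), "lemonade-config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Environment variable name taken from the tips above.
os.environ["LEMONADE_CONFIG"] = config_path
```

A child process such as lemonade-server started from this script would inherit the variable; in a shell you would instead export LEMONADE_CONFIG before launching the server.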
Related MCP Servers
AstrBot
Agentic IM chatbot infrastructure that integrates many IM platforms, LLMs, plugins, and AI features, and can be your openclaw alternative. ✨
sec-edgar
A SEC EDGAR MCP (Model Context Protocol) Server
openapi
OpenAPI definitions, converters and LLM function calling schema composer.
gtm
An MCP server for Google Tag Manager. Connect it to your LLM, authenticate once, and start managing GTM through natural language.
MCP-Stack-for-UI-UX-Designers
An end-to-end Model Context Protocol (MCP) solution that streamlines the entire UI/UX design workflow - from gathering inspiration to development handoff. This suite integrates multiple specialized MCP servers to automate and enhance the design process through AI assistance.
packt-netops-ai-workshop
🔧 Build Intelligent Networks with AI