
lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio lemonade-sdk-lemonade python -m lemonade_server run Gemma-3-4b-it-GGUF \
  --env LEMONADE_CONFIG="path/to/config.json (optional)"

How to use

Lemonade provides a local inference platform that hosts and serves optimized LLMs, image generation, and speech generation directly on your machine. The Lemonade CLI exposes commands to work with pre-bundled models and backends: run starts a specific model, list shows available models, pull downloads a model for offline use, and recipes inspects your local backends. A typical workflow is to start the server, browse the web UI or API endpoints, list the available models, and pull the ones you need. You can then chat with models such as Gemma, or generate images and speech, through the built-in interfaces. This makes it easy to experiment with local, GPU-accelerated inference and to integrate Lemonade into other tools and apps.
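Once the server is running, its chat endpoint can be called from any HTTP client. The sketch below assumes an OpenAI-style chat completions API at http://localhost:8000/api/v1 — the port and base path are assumptions here, so confirm them against your server's startup output.

```python
import json
import urllib.request

# Assumed default base URL; check your Lemonade server's startup log.
BASE_URL = "http://localhost:8000/api/v1"

def build_chat_request(model, prompt):
    """Construct an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model, prompt, base_url=BASE_URL):
    """POST a chat request to the locally running server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Gemma-3-4b-it-GGUF", "Say hello in one word."))
```

Because the request shape is standard OpenAI chat format, the same call should work with any OpenAI-compatible client library pointed at the local base URL.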

How to install

Prerequisites:

  • A supported OS (Linux, Windows, macOS) and a modern Python 3.10–3.13 environment
  • Optional: GPU support drivers if you plan to use GPU acceleration

Install steps:

  1. Install Python and pip (if not already installed).
  2. Create a virtual environment (recommended):
    • python -m venv lemonade-venv
    • source lemonade-venv/bin/activate (Linux/macOS)
    • lemonade-venv\Scripts\activate (Windows)
  3. Install the Lemonade server package (example using pipx or pip):
    • Using pip: pip install lemonade-server
    • Or using pipx (isolated env): pipx install lemonade-server
  4. Verify installation:
    • lemonade-server --version
  5. Run Lemonade with a model (example from README):
    • lemonade-server run Gemma-3-4b-it-GGUF
  6. Optional: configure environment variables or supply a config file as needed for your environment.
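The verification in step 4 can also be scripted, which is handy in setup automation. The helper below is a generic sketch: it looks a CLI tool up on PATH and returns its --version output, returning None if the tool is missing rather than raising.

```python
import shutil
import subprocess

def cli_version(executable):
    """Return the --version output of a CLI tool, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    # Some tools print the version to stderr instead of stdout.
    return result.stdout.strip() or result.stderr.strip()

if __name__ == "__main__":
    print(cli_version("lemonade-server") or "lemonade-server not found on PATH")
```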

Additional notes

  • The Lemonade CLI can pull models locally via lemonade-server pull <model-id> and list models with lemonade-server list.
  • For offline or repeatable environments, consider using Model Manager to download and cache models ahead of time.
  • If GPU acceleration is used, ensure the appropriate drivers and CUDA toolkit are installed and compatible with your hardware.
  • Environment variables (such as LEMONADE_CONFIG) can be used to customize paths, model caches, or backend settings. Review the docs for available options.
  • If you encounter port or binding issues, check that no other process is occupying the required port and that firewall rules permit local access.
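For the port-conflict case in the last tip, a quick programmatic check can tell you whether something is already listening before you start the server. This is a minimal sketch using only the standard library; the port 8000 in the example is an assumption, so substitute whatever port your Lemonade instance reports at startup.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # Substitute the port your Lemonade server actually binds.
    print("busy" if port_in_use(8000) else "free")
```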
