
mcp-server-litellm

MCP server from OpenCnid/mcp-server-litellm

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio opencnid-mcp-server-litellm python -m mcp_server_litellm \
  --env LITELLM_CACHE="Path to cache directory for LiteLLM (optional)" \
  --env LITELLM_MODEL="Model to use with LiteLLM (e.g., 'gpt-3.5-turbo')" \
  --env OPENAI_API_KEY="OpenAI API key if using OpenAI models via LiteLLM"

How to use

This MCP server routes text completion requests through LiteLLM, which can leverage OpenAI models when configured. Once started, the litellm MCP endpoint accepts completion requests from clients, with LiteLLM managing model selection, batching, and rate limits behind the scenes. Typical usage involves sending a completion prompt to the MCP service and receiving a structured completion response. If you provide an OpenAI API key, LiteLLM can proxy requests to OpenAI models as configured, enabling capabilities such as streaming tokens and system prompts. Set the environment variables for your desired model and API access before starting the server.
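As a rough sketch, an MCP client delivers a completion request as a JSON-RPC 2.0 `tools/call` message. The tool name `complete` and the argument shape below are assumptions for illustration, not taken from the server's source; query the server's `tools/list` for the real names:

```python
import json

def build_completion_request(prompt: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 tools/call message as used by MCP.

    The tool name "complete" and the argument keys are hypothetical;
    check the server's tool listing (tools/list) for the real schema.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "complete",
            "arguments": {"prompt": prompt},
        },
    }
    return json.dumps(payload)

if __name__ == "__main__":
    print(build_completion_request("Say hello"))
```

Over the stdio transport configured above, messages like this are written to the server's stdin and responses read from its stdout, one JSON object per line.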

How to install

Prerequisites:

  • Python 3.8+ installed on your system
  • Pip available in your environment

Install the MCP server package:

pip install mcp-server-litellm

Optionally create and activate a virtual environment:

python -m venv venv
# Windows
venv\Scripts\activate.bat
# Unix/macOS
source venv/bin/activate

Configure and run the MCP server (see mcp_config for options):

# Example using the default module invocation via -m
python -m mcp_server_litellm

Ensure required environment variables are set (see mcp_config).
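A preflight check for the environment variables named in the install command can save a failed startup. This sketch only flags likely problems; treating `OPENAI_API_KEY` as required for `gpt-*` models is an assumption based on the notes below, and which variables are strictly required depends on your backend:

```python
import os

def check_litellm_env() -> list:
    """Return a list of warnings about missing LiteLLM-related variables."""
    warnings = []
    model = os.environ.get("LITELLM_MODEL")
    if not model:
        warnings.append("LITELLM_MODEL is unset; the server default will be used")
    # Assumption: OpenAI-hosted models need a key (see the notes section).
    if model and model.startswith("gpt-") and not os.environ.get("OPENAI_API_KEY"):
        warnings.append("OPENAI_API_KEY is required for OpenAI models")
    return warnings

if __name__ == "__main__":
    for warning in check_litellm_env():
        print("warning:", warning)
```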

Additional notes

  • If you intend to use OpenAI models, you must provide OPENAI_API_KEY in the environment. LiteLLM can route requests to OpenAI or operate with local/alternative backends depending on configuration.
  • Set LITELLM_MODEL to select the desired model (e.g., a specific GPT-3.5/4 variant) if supported by your LiteLLM installation.
  • Monitor resource usage (CPU/GPU, memory) when using large models and enable batching if supported for throughput.
  • If you encounter authentication or connectivity issues with the OpenAI API, verify API keys and network access from the host running the MCP server.
  • Use the MCP tooling to validate prompts, tune parameters (temperature, max_tokens), and handle streaming responses if supported by LiteLLM.
  • Ensure you are using compatible versions of LiteLLM, the MCP server package, and Python dependencies.
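For the parameter tuning mentioned above, LiteLLM's `completion()` accepts OpenAI-style generation parameters. This sketch only assembles the keyword arguments; the actual call is commented out because it needs a live backend and API key:

```python
def build_completion_kwargs(model: str, prompt: str,
                            temperature: float = 0.7,
                            max_tokens: int = 256,
                            stream: bool = False) -> dict:
    """Assemble OpenAI-style parameters as accepted by litellm.completion()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }

# With a configured backend you would pass these straight through:
# from litellm import completion
# response = completion(**build_completion_kwargs("gpt-3.5-turbo", "Hi"))
```

When `stream=True`, `completion()` yields chunks instead of a single response object, so the client must iterate over the result.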
