
llama-stack

Deploys Llama 3.2-3B on vLLM with Llama Stack and MCP servers in OpenShift AI.

Installation
Run this command in your terminal to add the MCP server to Claude Code:

```bash
claude mcp add --transport stdio rh-ai-quickstart-llama-stack-mcp-server python -m custom_mcp_server \
  --env MCP_SERVER_PORT="8000" \
  --env MCP_SERVER_BASE_URL="http://localhost:8000"
```

How to use

This integration provides a custom Model Context Protocol (MCP) server that exposes enterprise tools to Llama Stack. The included server provides Vacation Management capabilities, enabling the LLM to query vacation balances and create vacation requests through a standardized tool interface. The MCP server acts as a bridge between Llama Stack and the HR enterprise APIs: it translates LLM tool calls into REST API requests and returns structured results that Llama Stack presents in natural-language responses. You interact with the system by enabling the custom MCP server in your Llama Stack deployment, letting Llama Stack discover its tools during startup, and then invoking those tools from the Llama Stack Playground or your conversational UI. The server is implemented in Python and is designed to be customized to connect to your own HR APIs if you need endpoints beyond vacation management.
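As a sketch of the bridging step described above, the following Python fragment shows how an LLM tool call could be translated into an HR REST request. The tool names (`get_vacation_balance`, `create_vacation_request`), the endpoint paths, and the `HR_API_BASE` URL are illustrative assumptions, not the quickstart's actual identifiers:

```python
import json
from urllib.parse import urljoin

# Hypothetical base URL of the enterprise HR API; in the real server this
# would be configured for your environment.
HR_API_BASE = "http://hr-api.example.com/"

def handle_tool_call(tool_name: str, arguments: dict) -> dict:
    """Map a tool call from the LLM onto a REST request description.

    Illustrative only: it covers the two vacation capabilities the
    quickstart describes (balance lookup, request creation) against
    hypothetical endpoints, returning the structured result that the
    Llama Stack would relay back to the model.
    """
    if tool_name == "get_vacation_balance":
        return {
            "method": "GET",
            "url": urljoin(HR_API_BASE,
                           f"employees/{arguments['employee_id']}/vacation-balance"),
        }
    if tool_name == "create_vacation_request":
        return {
            "method": "POST",
            "url": urljoin(HR_API_BASE, "vacation-requests"),
            "body": json.dumps({
                "employee_id": arguments["employee_id"],
                "start_date": arguments["start_date"],
                "end_date": arguments["end_date"],
            }),
        }
    raise ValueError(f"Unknown tool: {tool_name}")

# Example: the model asks to check a balance.
req = handle_tool_call("get_vacation_balance", {"employee_id": "E123"})
print(req["method"], req["url"])
```

The real server would then execute the request against the HR API and return the response payload as the tool result.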

To use the tools, ensure the MCP server is reachable by the Llama Stack (as configured in your llama-stack configuration). At runtime, the Llama Stack will discover available tools via the MCP server’s capabilities and present them to the LLM. When you ask questions that require vacation information or actions, the LLM may decide to call the vacation-related tool (e.g., check balance or create a request). The MCP server will execute the corresponding REST calls to the HR API, return structured results, and the Llama Stack will incorporate those results into the final user-facing response.
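To illustrate what discovery exposes, here is a hypothetical tool description in the general shape MCP servers advertise at startup (a name, a description, and a JSON Schema for inputs), together with a minimal required-argument check. The field values are assumptions, not the quickstart's actual schema:

```python
# A hypothetical vacation tool as it might appear in the MCP server's
# discovery response; the quickstart's real schema may differ.
VACATION_BALANCE_TOOL = {
    "name": "get_vacation_balance",
    "description": "Return the remaining vacation days for an employee.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "employee_id": {"type": "string",
                            "description": "HR employee identifier"},
        },
        "required": ["employee_id"],
    },
}

def missing_arguments(tool: dict, arguments: dict) -> list:
    """Return the names of required arguments absent from a tool call."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]

print(missing_arguments(VACATION_BALANCE_TOOL, {}))                      # ['employee_id']
print(missing_arguments(VACATION_BALANCE_TOOL, {"employee_id": "E123"}))  # []
```

A schema like this is what lets the LLM know which arguments it must supply before the server will execute the underlying REST call.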

How to install

Prerequisites:

  • A Kubernetes/OpenShift cluster (with kubectl/oc access)
  • Helm installed on your workstation
  • Access to the repository llama-stack-mcp-server (as provided by rh-ai-quickstart)
  • Docker or container tooling if you plan to build or push images locally

Step 1: Clone the repository

```bash
git clone https://github.com/rh-ai-quickstart/llama-stack-mcp-server.git
cd llama-stack-mcp-server/
```

Step 2: Prepare your OpenShift project

```bash
oc new-project llama-stack-mcp-demo
```

Step 3: Install prerequisite tools

  • Ensure OpenShift CLI (oc) and Helm are available in your PATH.
  • Optional: configure access to your container registry if you plan to push custom images.

Step 4: Build and deploy using the umbrella Helm chart

  • Set the device you intend to use (gpu, cpu, or hpu):

    ```bash
    export DEVICE="gpu"
    ```

  • Build Helm dependencies for the chart:

    ```bash
    helm dependency build ./helm/llama-stack-mcp
    ```

  • Deploy the complete stack with a single command:

    ```bash
    helm install llama-stack-mcp ./helm/llama-stack-mcp \
      --set device=$DEVICE \
      --set llama-stack.device=$DEVICE \
      --set llama3-2-3b.device=$DEVICE
    ```

Note: The llama-stack pod may show CrashLoopBackOff initially as the Llama model loads. This is expected until the model endpoint is ready.

Step 5: Verify deployment and access the Playground

```bash
export PLAYGROUND_URL=$(oc get route llama-stack-playground -o jsonpath='{.spec.host}' 2>/dev/null || echo "Route not found")
echo "Playground: https://$PLAYGROUND_URL"
```

Optional troubleshooting:

```bash
oc get pods
oc logs -l app.kubernetes.io/name=llama-stack
oc logs -l app.kubernetes.io/name=custom-mcp-server
```

You should see all components running once initialization completes. Customize the HR REST endpoints and credentials in the custom-mcp-server as needed for your enterprise environment.

Additional notes

Tips and considerations:

  • Ensure the MCP server’s port and base URL match what you configure in mcp_config. The example uses port 8000; adjust as needed for your deployment.
  • If you customize the custom MCP server to point to your HR APIs, provide proper authentication (API keys, OAuth tokens) and secure transport (HTTPS).
  • When using OpenShift, the Helm umbrella chart (llama-stack-mcp) will manage all subcharts; if you need to tweak individual components, you can still enable/disable specific charts via values.
  • MCP tool discovery happens at startup; if a tool isn’t appearing, verify that the MCP server is reachable, that the service name and URL are correct in your llama-stack configuration, and that the server exposes a valid tool schema.
  • For debugging, review logs from the custom-mcp-server and llama-stack pods; ensure the REST endpoints your HR API exposes are reachable from the cluster network.
