
Guidance for Scalable Model Inference and Agentic AI on Amazon EKS

A comprehensive, scalable ML inference architecture on Amazon EKS that leverages AWS Graviton processors for cost-effective CPU-based inference and GPU instances for accelerated inference. The guidance provides a complete end-to-end platform for deploying LLMs with agentic AI capabilities, including Retrieval-Augmented Generation (RAG) and the Model Context Protocol (MCP).

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio aws-solutions-library-samples-guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks -- docker run -i \
  --env AWS_REGION="us-east-1" \
  --env AWS_ROLE_ARN="arn:aws:iam::123456789012:role/YourEKSRole" \
  --env LOGGING_SERVICE="Langfuse" \
  --env EKS_CLUSTER_NAME="your-eks-cluster" \
  --env LANGUAGE_MODEL_PROVIDER="Bedrock" \
  guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks

Replace the example values (region, role ARN, logging service, cluster name, and model provider) with those from your deployment. Note that the `--env` flags must come before the image name so Docker passes them into the container, and `--` separates Claude Code's own flags from the server command.

How to use

This MCP server provides a guidance-driven, scalable model inference and agentic AI platform designed to run on Amazon EKS. It orchestrates CPU-based inference on Graviton instances and GPU-accelerated inference for high-throughput workloads, using a combination of Ray Serve, LiteLLM, vLLM, and Karpenter for elastic resource provisioning. The architecture enables Retrieval-Augmented Generation (RAG), Intelligent Document Processing (IDP), and multi-agent workflows, with observability through Langfuse and Prometheus/Grafana. You can deploy end-to-end inference pipelines, expose a unified API gateway for multiple models, and route requests to the most suitable compute tier based on workload characteristics. The platform also supports embedded reasoning, document retrieval, and web search fallbacks to maintain up-to-date knowledge while delivering coherent responses.
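Because LiteLLM fronts the models, the unified API gateway speaks the OpenAI-compatible chat-completions format. A minimal sketch of such a request, where the gateway URL, model alias, and API-key variable are all placeholders for your deployment's actual values:

```shell
# Hypothetical gateway address and model alias -- substitute your deployment's values.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4000}"
MODEL_ALIAS="${MODEL_ALIAS:-llama-3-8b}"

# Build an OpenAI-style chat-completion payload.
PAYLOAD=$(cat <<EOF
{
  "model": "$MODEL_ALIAS",
  "messages": [{"role": "user", "content": "Summarize this cluster's GPU capacity."}]
}
EOF
)
echo "$PAYLOAD"

# To send it against a live gateway:
#   curl -s "$GATEWAY_URL/v1/chat/completions" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $LITELLM_API_KEY" \
#     -d "$PAYLOAD"
```

The gateway (not the client) decides whether the request lands on a Graviton or GPU tier, so the same payload works regardless of which compute serves the model alias.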

How to install

Prerequisites:

  • An AWS account with an EKS-ready environment
  • Docker installed on the deployment host
  • Access to the repository containing the MCP server configuration
  • Optional: Langfuse, Prometheus, and Grafana for observability

Installation steps:

  1. Install Docker on your machine (if not already installed):
# macOS / Windows: download Docker Desktop from https://www.docker.com/products/docker-desktop
# Linux (example for Debian-based systems)
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
  2. Pull and run the MCP server container (as defined in mcp_config):
docker run -i guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks
  3. Set required environment variables to match your AWS/EKS setup. Example:
export AWS_REGION=us-east-1
export AWS_ROLE_ARN=arn:aws:iam::123456789012:role/YourEKSRole
export EKS_CLUSTER_NAME=your-eks-cluster
export LANGUAGE_MODEL_PROVIDER=Bedrock
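Before moving on, it can help to sanity-check those variables. A small sketch using plain shell pattern checks; the default values mirror the example above, and the regexes are illustrative rather than exhaustive:

```shell
# Example values -- replace with your own before running against a real cluster.
AWS_REGION="${AWS_REGION:-us-east-1}"
AWS_ROLE_ARN="${AWS_ROLE_ARN:-arn:aws:iam::123456789012:role/YourEKSRole}"
EKS_CLUSTER_NAME="${EKS_CLUSTER_NAME:-your-eks-cluster}"

fail=0
# Region should look like "us-east-1", "eu-west-2", etc.
echo "$AWS_REGION" | grep -Eq '^[a-z]{2}(-[a-z]+)+-[0-9]$' || { echo "bad AWS_REGION: $AWS_REGION"; fail=1; }
# Role ARN should be an IAM role ARN with a 12-digit account ID.
echo "$AWS_ROLE_ARN" | grep -Eq '^arn:aws:iam::[0-9]{12}:role/' || { echo "bad AWS_ROLE_ARN"; fail=1; }
# Cluster name must be non-empty.
[ -n "$EKS_CLUSTER_NAME" ] || { echo "EKS_CLUSTER_NAME is empty"; fail=1; }

[ "$fail" -eq 0 ] && echo "environment looks sane"
```

Catching a malformed region or ARN here is cheaper than debugging an opaque authentication failure after the container starts.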
  4. Verify the server is running and reachable at the configured endpoint. Use your API gateway (LiteLLM proxy) URL or the container's exposed port to send test requests.
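A small helper for that verification step, assuming the gateway exposes an HTTP health endpoint (the URL and `/health` path in the usage comment are placeholders; adjust them to your routing):

```shell
# Poll an endpoint until it answers or a retry budget is exhausted.
wait_for_endpoint() {
  url="$1"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "endpoint $url is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "endpoint $url did not respond after $tries attempts" >&2
  return 1
}

# Example, against a hypothetical LiteLLM proxy address:
#   wait_for_endpoint "http://localhost:4000/health"
```

Model servers can take minutes to pull weights and warm up, so a polling loop like this is more reliable than a single request fired immediately after deployment.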

  5. Optional: Configure observability.

# Start Prometheus/Grafana and Langfuse as per your cluster setup
  6. Deploy any additional components (RAG, OpenSearch, vLLM workers) according to your deployment manifest or Helm charts if provided in the repository.

Additional notes

  • Ensure your AWS IAM roles and permissions allow the EKS, OpenSearch, and Ray/vLLM components to access the necessary resources.
  • If using GPU nodes, confirm NVIDIA drivers and CUDA toolkit compatibility for your container images.
  • Monitor costs across EKS, EC2 (Graviton/GPU), and OpenSearch; consider adjusting Karpenter settings for tighter autoscaling.
  • The guidance relies on a combination of multiple services (RAG, IDP, multi-agent orchestration); verify network policies and security groups permit proper communication among components.
  • If you encounter deployment issues, check the container logs for model loading errors, OpenSearch connectivity, and gateway routing configuration.
  • For environment variables, avoid committing secrets; use secure secret management in your deployment pipeline.
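For the last point, one lightweight pattern is keeping secrets in a git-ignored env file with owner-only permissions and loading it at deploy time. A sketch, where the file path, variable name, and Secret name are all illustrative:

```shell
# Keep secrets out of the repo: store them in a git-ignored env file
# and source it at deploy time rather than hard-coding values.
SECRETS_FILE="$(mktemp)"   # in practice: a path like ./deploy/.env, listed in .gitignore

cat > "$SECRETS_FILE" <<'EOF'
LITELLM_API_KEY=replace-me
EOF
chmod 600 "$SECRETS_FILE"   # owner-only read/write

# Export everything the file defines into the current shell.
set -a
. "$SECRETS_FILE"
set +a
echo "loaded key of length ${#LITELLM_API_KEY}"

# In Kubernetes, prefer a Secret object over raw env vars, e.g.:
#   kubectl create secret generic litellm-keys --from-env-file="$SECRETS_FILE"
rm -f "$SECRETS_FILE"
```

The `KEY=VALUE` file format (no `export` prefix) keeps the same file usable both for shell sourcing via `set -a` and for `kubectl create secret --from-env-file`.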
