
multimodal-agents-course

An MCP Multimodal AI Agent with eyes and ears!

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio the-ai-merge-multimodal-agents-course node kubrick-mcp/server.js \
  --env GROQ_API_KEY="Your Groq API key (if required)" \
  --env OPIK_API_URL="URL for Opik API (if used)" \
  --env OPENAI_API_KEY="Your OpenAI API key"
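Alternatively, MCP clients that read a JSON configuration file (for example, Claude Desktop's claude_desktop_config.json) can register the server with an entry like the following sketch. The server name "kubrick" and the kubrick-mcp/server.js path mirror the command above; the env values are placeholders for your own keys:

```json
{
  "mcpServers": {
    "kubrick": {
      "command": "node",
      "args": ["kubrick-mcp/server.js"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key",
        "GROQ_API_KEY": "your-groq-api-key",
        "OPIK_API_URL": "https://api.opik.example"
      }
    }
  }
}
```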

How to use

Kubrick is an MCP-based multimodal processing platform that exposes video, image, audio, and text processing capabilities as MCP resources, tools, and prompts. The server acts as the central hub: it coordinates multimodal pipelines, integrates with LLMs and vision models, and exposes endpoints that agents can call over the MCP protocol.

With this server, you can register resources (e.g., video indexing, image feature extraction, audio transcription), define prompts and tools for agents to use, and assemble pipelines that an agent orchestrates at runtime. The included modules focus on building a production-ready MCP server for video search and multimodal processing, with observability and prompt versioning through integrations such as Opik.

To use it, deploy the kubrick-mcp server, make sure your environment provides access to any required ML models or APIs, and connect your MCP clients to the server to start issuing prompts, reading resources, and invoking tools.

How to install

Prerequisites:

  • Node.js (recommended LTS) installed on your machine
  • Basic knowledge of MCP concepts (Resources, Prompts, Tools, and Agents)
  • Access to any APIs/models required by your pipelines (e.g., OpenAI, Groq) and corresponding API keys

Installation steps:

  1. Clone the repository:

     git clone https://github.com/your-org/multimodal-agents-course.git
     cd multimodal-agents-course

  2. Install dependencies for the MCP server (example for a Node.js-based server):

     npm install

  3. Configure environment variables (create a .env file or export them):

     OPENAI_API_KEY=your-openai-api-key
     GROQ_API_KEY=your-groq-api-key        # if using Groq
     OPIK_API_URL=https://api.opik.example # if using Opik

  4. Start the MCP server:

     npm run start
     # or, depending on the setup:
     node kubrick-mcp/server.js

  5. Verify the server is running by checking the console logs for MCP readiness (for HTTP-based transports, you can also check the local endpoint).
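The environment check in step 3 can be made explicit so the server fails fast on a missing credential instead of erroring mid-request. A minimal sketch — the variable names mirror the install command above, and the checkEnv helper is illustrative, not part of the kubrick-mcp codebase:

```javascript
// Required credentials in this sketch; adjust to your pipelines.
const REQUIRED = ["OPENAI_API_KEY"];
// Only needed if the corresponding services are used.
const OPTIONAL = ["GROQ_API_KEY", "OPIK_API_URL"];

function checkEnv(env = process.env) {
  // Fail fast if any required variable is missing or empty.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Warn (but do not fail) on optional integrations.
  for (const name of OPTIONAL) {
    if (!env[name]) {
      console.warn(`Note: ${name} is not set; related features will be disabled.`);
    }
  }
}
```

Calling checkEnv() at the top of the server entry point turns a confusing runtime API error into a clear startup message.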

Notes:

  • If your setup uses a Python backend instead of Node, adjust the commands accordingly (e.g., python -m kubrick_mcp.server).
  • Ensure any required models or services (e.g., video processing pipelines, LLM endpoints) are accessible from the runtime environment.

Additional notes

Tips and common issues:

  • Ensure your API keys and endpoints are correctly configured; missing credentials are a common startup failure.
  • If you encounter port conflicts, change the default MCP server port in the configuration.
  • For multimodal pipelines, verify that all referenced resources (video processors, image/vision models, audio processing components) are installed and accessible.
  • Use Opik or similar observability tools to version prompts and monitor prompt/response history for debugging.
  • When debugging, run in a local development mode to inspect MCP inspector outputs and traces before deploying to production.
  • If using Docker or containerized deployments, ensure appropriate resource limits (CPU/GPU) are set for video and model workloads.
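For the port-conflict tip above, one common pattern is to read the port from an environment variable so a conflict can be resolved without code changes. A sketch under stated assumptions — MCP_PORT is an illustrative variable name, not necessarily the one kubrick-mcp uses; check the server's own configuration:

```javascript
// Resolve the server port from the environment, falling back to a default.
// Rejects values that are not valid TCP ports.
function resolvePort(env = process.env, fallback = 3000) {
  const raw = env.MCP_PORT;
  const port = raw === undefined ? fallback : Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT value: ${raw}`);
  }
  return port;
}
```

With this in place, a conflicting default is worked around by restarting with, e.g., MCP_PORT=8080.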
