multimodal-mcp-client
A multimodal MCP client for voice-powered agentic workflows
claude mcp add --transport stdio ejb503-multimodal-mcp-client npx -y multimodal-mcp-client \
  --env VITE_GEMINI_API_KEY="Gemini API key placeholder" \
  --env VITE_SYSTEMPROMPT_API_KEY="Systemprompt API key placeholder"
How to use
The multimodal MCP client is the frontend interface that connects to MCP servers to orchestrate voice-driven, multimodal AI workflows. It lets you interact with AI systems using natural speech, while also supporting text and visual inputs through a unified MCP-based tooling system. The client works with both Systemprompt MCP servers (pre-configured and installable via the UX) and custom MCP servers that you define in a local mcp.config.custom.json. After you configure the authentication keys and the MCP server you want to use, launch the development server to start speaking to your AI workflows and invoking the tools the MCP server defines.
Within this client, you can configure and run custom tools and workflows exposed by MCP servers, manage state across multi-step interactions, and leverage the built-in multimodal capabilities to process voice, text, and visuals in real time. The tooling system lets you extend AI capabilities by adding domain-specific operations, while the voice-first design makes it practical to control complex AI pipelines using natural language commands and prompts. The client is designed to be developer-friendly, with TypeScript support, hot module replacement, and a modern tech stack to streamline integration with your own MCP servers.
How to install
Prerequisites:
- Node.js 16.x or higher
- npm 7.x or higher
- A modern browser with Web Speech API support
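The browser prerequisite above can be checked before enabling voice input. The following is a minimal sketch, not code from this project: the names SpeechGlobals and detectSpeechSupport are hypothetical, and it assumes the browser exposes speech recognition as either SpeechRecognition or the prefixed webkitSpeechRecognition, with speech output available as speechSynthesis.

```typescript
// Minimal shape of the globals we probe. In a browser these live on `window`.
// (SpeechGlobals and detectSpeechSupport are illustrative names, not project APIs.)
type SpeechGlobals = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
  speechSynthesis?: unknown;
};

interface SpeechSupport {
  recognition: boolean; // speech-to-text constructor is present
  synthesis: boolean;   // text-to-speech object is present
}

// Reports which parts of the Web Speech API the given global scope exposes.
function detectSpeechSupport(scope: SpeechGlobals): SpeechSupport {
  return {
    recognition:
      typeof scope.SpeechRecognition !== "undefined" ||
      typeof scope.webkitSpeechRecognition !== "undefined",
    synthesis: typeof scope.speechSynthesis !== "undefined",
  };
}
```

In a browser you would pass the global window object. Taking the scope as a parameter keeps the check easy to exercise outside a browser. Presence of the constructor does not guarantee microphone permission, which the browser may still prompt for.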
Install and run locally:
- Clone the repository
git clone https://github.com/Ejb503/multimodal-mcp-client.git
cd multimodal-mcp-client
- Install dependencies
npm install
- Install proxy dependencies (if applicable)
cd proxy
npm install
- Create and configure local MCP server config
# Navigate to config directory
cd config
# Create or copy local configuration
cp mcp.config.example.json mcp.config.custom.json
- Add API keys to environment (example keys)
# In .env or your environment; Vite exposes only VITE_-prefixed variables to the client code
VITE_GEMINI_API_KEY=your-gemini-api-key
VITE_SYSTEMPROMPT_API_KEY=your-systemprompt-api-key
- Start the development server
npm run dev
- Open the app in your browser at http://localhost:5173
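If the app starts but cannot authenticate, a quick check that both keys from the API key step are actually set can help. This is a minimal sketch under stated assumptions: missingEnvKeys and REQUIRED_KEYS are hypothetical names, not part of the client, and the helper only tests for presence, not validity, of the two variables named above.

```typescript
// A plain record of environment values, e.g. Vite's import.meta.env.
type EnvRecord = Record<string, string | undefined>;

// The two variables named in the install steps (helper names are illustrative).
const REQUIRED_KEYS: readonly string[] = [
  "VITE_GEMINI_API_KEY",
  "VITE_SYSTEMPROMPT_API_KEY",
];

// Returns the names of required variables that are unset or blank.
function missingEnvKeys(
  env: EnvRecord,
  required: readonly string[] = REQUIRED_KEYS
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Vite app you would pass import.meta.env and show a warning when the returned list is non-empty. A variable defined without the VITE_ prefix will show up as missing here, since Vite does not expose it to client code.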
Additional notes
Tips and common issues:
- Set your API keys with the VITE_ prefix. Vite exposes only variables with this prefix to client-side code, so keys without it will not reach the client.
- If you’re using a custom MCP server, place its config in config/mcp.config.custom.json and ensure the path to any local server entry (e.g., index.js) is correct.
- The client relies on Web Speech API; some browsers or environments may require permissions or additional setup to enable voice input.
- In the npx-based config, replace the package name with the name you actually publish the client under, if it differs.
- If you encounter port conflicts, verify that the development server port (default 5173) is not used by another process.
- For production deployments, consider bundling with the appropriate build command and hosting the static assets with a suitable web server.
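For the port-conflict tip above, one way to test whether port 5173 is available is to try binding to it briefly from Node.js. This is a minimal sketch using Node's built-in net module. The isPortFree name is hypothetical, and the 127.0.0.1 host is an assumption: the development server may bind a different interface (for example IPv6 localhost), in which case the result can differ.

```typescript
import { createServer } from "node:net";

// Resolves to true if a temporary server could bind to the port,
// and false on any listen error (most commonly EADDRINUSE).
// (isPortFree is an illustrative helper, not part of the client.)
function isPortFree(port: number, host: string = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = createServer();
    probe.once("error", () => resolve(false));
    probe.once("listening", () => {
      // Release the port again before reporting success.
      probe.close(() => resolve(true));
    });
    probe.listen(port, host);
  });
}

// Check the default development server port mentioned above.
isPortFree(5173).then((free) => {
  console.log(free ? "Port 5173 is free" : "Port 5173 is unavailable");
});
```

If the port is unavailable, stop the process that holds it or run the development server on another port.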
Related MCP Servers
gemini-mcp-tool
MCP server that enables AI assistants to interact with Google Gemini CLI, leveraging Gemini's massive token window for large file analysis and codebase understanding
obsidian-mcp-tools
Add Obsidian integrations like semantic search and custom Templater prompts to Claude or any MCP client.
ironcurtain
A secure* runtime for autonomous AI agents. Policy from plain-English constitutions. (*https://ironcurtain.dev)
CanvasMCPClient
Canvas MCP Client is an open-source, self-hostable dashboard application built around an infinite, zoomable, and pannable canvas. It provides a unified interface for interacting with multiple MCP (Model Context Protocol) servers through a flexible, widget-based system.
Alph
Universal MCP Server Configuration Manager
mcp-json-yaml-toml
A structured data reader and writer like 'jq' and 'yq' for AI Agents