voicemode
Natural voice conversations with Claude Code
claude mcp add --transport stdio mbailey-voicemode uvx mbailey/voicemode --env OPENAI_API_KEY="your-openai-key"
(OPENAI_API_KEY is only required if you use cloud services.)
How to use
VoiceMode provides an MCP-compatible server that enables natural voice conversations with Claude Code and other MCP-capable agents. The server uses local or cloud-backed AI services for speech-to-text and text-to-speech, so conversations can be started from MCP clients and Claude Code workflows. Once registered as an MCP server, you can hold voice-driven conversations, switch between local and cloud backends, and configure privacy options to run entirely locally or with remote services. The setup emphasizes low latency, smart silence detection, and optional local audio services (Whisper for STT and Kokoro for TTS) to maximize privacy and offline capability.
Using the MCP integration, you point Claude Code or another MCP client at the VoiceMode server and start a conversation; the agent transcribes your speech, generates a response, and speaks it back through your audio output in a continuous loop. The server also exposes configuration and permission controls, letting you tailor which capabilities are available and how permission prompts are handled.
How to install
Prerequisites:
- A computer with a supported OS (Linux, macOS, Windows via WSL)
- curl and a shell to install uv/uvx (Astral's uv)
- Python environment if you plan to use local Python tooling (optional for UV-based workflow)
Installation steps:
- Install the UV package manager (if needed):
curl -LsSf https://astral.sh/uv/install.sh | sh
- Install the VoiceMode MCP server via UVX (using the package name for the voicemode project):
uvx mbailey/voicemode
- Register the MCP server with your Claude Code or MCP client (example for Claude Code):
claude mcp add --scope user voicemode -- uvx --refresh voice-mode
- Optional: set up your OpenAI API key if using cloud services:
export OPENAI_API_KEY=your-openai-key
- Start or connect to the VoiceMode MCP service as appropriate for your setup. If your environment requires an explicit start script, consult the project documentation for the exact command to launch the MCP endpoint.
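The registration step above can be wrapped in a small helper that prints the `claude mcp add` invocation before you run it, which is useful when scripting setup across machines. This is a hypothetical convenience function, not part of VoiceMode; the server name (`voicemode`) and package name (`voice-mode`) follow the command shown in the steps above.

```shell
# build_register_cmd: print the `claude mcp add` command for a given scope
# without executing it, so the invocation can be reviewed first.
# (Hypothetical helper; names follow the registration step above.)
build_register_cmd() {
  scope="${1:-user}"
  printf 'claude mcp add --scope %s voicemode -- uvx --refresh voice-mode\n' "$scope"
}

# Print the command for a user-scoped registration:
build_register_cmd user
```

Piping the output to `sh` (or copying it into a terminal) performs the actual registration once you are satisfied with the command.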
Prerequisites for local voice services (optional):
- Whisper (for STT) and Kokoro (for TTS) if you want offline processing. See the VoiceMode documentation for installation steps per platform.
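If you run Whisper and Kokoro locally, VoiceMode typically needs to be pointed at their OpenAI-compatible endpoints via environment variables. The variable names and port numbers below are assumptions for illustration; check the VoiceMode documentation for the exact configuration keys your version uses.

```shell
# Hypothetical endpoint overrides -- variable names and ports are assumptions,
# not confirmed VoiceMode configuration keys.
export VOICEMODE_STT_BASE_URL="http://127.0.0.1:2022/v1"   # local Whisper server
export VOICEMODE_TTS_BASE_URL="http://127.0.0.1:8880/v1"   # local Kokoro server
```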
Additional notes
Tips and common considerations:
- OpenAI API key: If you plan to use cloud-based models, ensure OPENAI_API_KEY is set in your environment.
- Privacy settings: You can run entirely locally with Whisper/Kokoro or mix with cloud services for higher quality models.
- Local voice services: Whisper.cpp and Kokoro provide offline or private processing paths; configure these in voicemode config if privacy is critical.
- Permissions: If you want to avoid permission prompts, you can extend your Claude Code permissions as described in the Permissions Guide.
- System dependencies: Ensure your platform has audio input/output permissions and that PulseAudio or equivalent is correctly configured on Linux/WSL.
- Troubleshooting: If UV/uvx isn’t found, rerun the UV installation script, or verify PATH contains the uvx binary. Check microphone access if speech isn't detected correctly.
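The PATH troubleshooting step above can be checked with a small POSIX-shell probe. `check_cmd` is a hypothetical helper (not shipped with VoiceMode) that reports whether a command such as `uvx` or `claude` is resolvable from your current shell.

```shell
# check_cmd: report "ok" if the named command is on PATH, "missing" otherwise.
# (Hypothetical diagnostic helper for the troubleshooting tip above.)
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Probe the tools this setup relies on:
check_cmd uvx
check_cmd claude
```

If `uvx` reports `missing`, uv's default install location (`~/.local/bin`) is probably not on your PATH; re-run the uv install script or add that directory to PATH.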
Related MCP Servers
boilerplate
TypeScript Model Context Protocol (MCP) server boilerplate providing IP lookup tools/resources. Includes CLI support and extensible structure for connecting AI systems (LLMs) to external data sources like ip-api.com. Ideal template for creating new MCP integrations via Node.js.
cplusplus_mcp
An MCP (Model Context Protocol) server for analyzing C++ codebases using libclang.
spec-kit
MCP server enabling AI assistants to use GitHub's spec-kit methodology
israel-drugs
MCP server from DavidOsherdiagnostica/israel-drugs-mcp-server
create-kit
Scaffold a production-ready Model Context Protocol (MCP) server in seconds.
storybook
MCP server for Storybook - provides AI assistants access to components, stories, properties and screenshots. Built with TypeScript and Model Context Protocol SDK.