mcp-speech-to-text
🎙️ MCP Speech-to-Text Server with Enhanced Cantonese Support | Offline Vosk + Online Google Cloud | Auto-detection for zh-HK | n8n workflows | Hong Kong optimized 🇭🇰
claude mcp add --transport stdio michaelyuwh-mcp-speech-to-text docker compose up -d \
  --env SPEECH_ENGINE="auto or vosk/google (default: auto)" \
  --env MCP_SERVER_PORT="8000" \
  --env VOSK_MODEL_PATH="/app/models"
How to use
This MCP server provides local, offline speech-to-text, optimized for production use on x86_64 Linux with Docker. The primary engine is Vosk, which gives fast, private transcription; a SpeechRecognition-based fallback is available for development or for environments where Vosk is unavailable.
The server exposes MCP tools for transcription, audio processing, and environment introspection. You can transcribe pre-recorded audio files, record live audio from a microphone, or query the supported engines. The available tools are:
- transcribe_audio_offline
- transcribe_audio_file
- record_and_transcribe
- get_supported_engines
- convert_audio_format
- test_microphone
- get_supported_formats
For production, you will typically run it via Docker Compose, which deploys a multi-container setup that handles model loading, audio management, and API endpoints. Development and testing can be done locally, either natively with Python or with the uv-based development workflow on macOS.
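As a rough illustration of how a client invokes one of these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` request that MCP clients send to a server. The tool name comes from the list above; the `file_path` argument name and the sample path are assumptions for illustration only, since the server's actual input schema is not shown here.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "file_path" is a hypothetical argument name; check the tool's real input
# schema (for example via a `tools/list` request) before relying on it.
request = build_tool_call("transcribe_audio_file", {"file_path": "/data/sample.wav"})
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude builds and sends this message for you; the sketch only shows the shape of what travels over the stdio transport.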
How to install
Prerequisites:
- Docker and Docker Compose installed on your system
- Git to clone the repository
- Optional: uv (macOS) for development workflows
- Install prerequisites
- Docker: follow https://docs.docker.com/get-started/
- Docker Compose is included with modern Docker installations
- Clone the repository
git clone https://github.com/michaelyuwh/mcp-speech-to-text.git
cd mcp-speech-to-text
- Run in production (Docker Compose)
# Start all services in detached mode
docker compose up -d
# View logs for troubleshooting
docker compose logs -f
- Run in development (native Python if you prefer)
# macOS (recommended for development): with uv installed, sync dependencies and start the server
uv sync
uv run python -m mcp_speech_to_text
- Optional build and run (direct Docker)
./scripts/build-x86_64.sh
docker run -d --name mcp-speech mcp-speech-to-text:x86_64-latest
- Quick health checks
# Production health check via the deployment test script
./scripts/test-deployment.sh
# Or run a quick inline check using the provided server module
docker run --rm mcp-speech-to-text:latest python -c "from src.mcp_speech_to_text.server import OfflineSpeechToTextServer; server = OfflineSpeechToTextServer(); print('✅ Server healthy')"
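If you want a check that does not depend on the repository's scripts, a plain TCP probe is one option. The sketch below is a minimal example, assuming the server listens on MCP_SERVER_PORT (8000 in the install command above) and that the port is published to the host; `port_is_open` is a hypothetical helper, not part of this project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumes MCP_SERVER_PORT=8000 is published on localhost.
    status = "reachable" if port_is_open("127.0.0.1", 8000) else "not reachable"
    print(f"MCP server port 8000 is {status}")
```

A successful connection only shows that something is listening on the port; it does not confirm that models have loaded, so the deployment test script remains the more thorough check.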
Prerequisites recap:
- A machine capable of running Docker
- Basic knowledge of Docker Compose commands
- Optionally Python tooling for development (uv, pip) if you’re testing locally without Docker
Additional notes
Tips and common issues:
- Environment variables: SPEECH_ENGINE, VOSK_MODEL_PATH, and MCP_SERVER_PORT control which engine is used, where models are stored, and which port the MCP server listens on. Adjust via docker-compose.override.yml or your deployment environment as needed.
- Vosk model availability: Vosk-based transcription is the preferred production path on x86_64 Linux. If Vosk is unavailable on a host (e.g., macOS ARM), the server will fall back to SpeechRecognition in dev mode.
- Docker platform compatibility: For Apple Silicon (ARM), you may need to build for linux/amd64 explicitly using docker buildx or run with appropriate platform flags.
- Audio device access in Docker: If you test live microphone transcription, ensure the container has access to audio devices (e.g., docker run --device /dev/snd).
- Model management: VOSK_MODEL_PATH points to where models are stored. The repository auto-downloads models as needed; you can pre-populate /app/models to avoid download delays on first run.
- Logs and debugging: Use docker compose logs or docker logs <container> to diagnose issues related to model loading, port binding, or missing dependencies.
- Security: The setup is designed to run as non-root inside containers and to avoid external API calls, in line with this project's offline, privacy-focused design.
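The environment variables and the Vosk fallback described in the tips above can be sketched as follows. This is an illustrative sketch rather than the server's actual code: the defaults mirror the values shown in the install command, while `SpeechConfig`, `load_config`, `resolve_engine`, and the "speech_recognition" label are hypothetical names introduced here.

```python
import importlib.util
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechConfig:
    engine: str      # "auto", "vosk", or "google"
    port: int        # port the MCP server listens on
    model_path: str  # directory holding Vosk models


def load_config(env=None) -> SpeechConfig:
    """Read the three documented variables, falling back to assumed defaults."""
    env = os.environ if env is None else env
    engine = env.get("SPEECH_ENGINE", "auto").lower()
    if engine not in {"auto", "vosk", "google"}:
        raise ValueError(f"Unsupported SPEECH_ENGINE: {engine!r}")
    return SpeechConfig(
        engine=engine,
        port=int(env.get("MCP_SERVER_PORT", "8000")),
        model_path=env.get("VOSK_MODEL_PATH", "/app/models"),
    )


def resolve_engine(config: SpeechConfig, vosk_available=None) -> str:
    """In "auto" mode, prefer Vosk when the package is importable."""
    if vosk_available is None:
        vosk_available = importlib.util.find_spec("vosk") is not None
    if config.engine == "auto":
        # "speech_recognition" is a placeholder label for the fallback path.
        return "vosk" if vosk_available else "speech_recognition"
    return config.engine


config = load_config()
print(config, "->", resolve_engine(config))
```

Passing a plain dict to `load_config` instead of relying on `os.environ` makes it easy to try out different settings without touching the real environment.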