
mcp-speech-to-text

🎙️ MCP Speech-to-Text Server with Enhanced Cantonese Support | Offline Vosk + Online Google Cloud | Auto-detection for zh-HK | n8n workflows | Hong Kong optimized 🇭🇰

Installation
Run the following command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio michaelyuwh-mcp-speech-to-text \
  --env SPEECH_ENGINE=auto \
  --env MCP_SERVER_PORT=8000 \
  --env VOSK_MODEL_PATH=/app/models \
  -- docker compose up -d

SPEECH_ENGINE accepts auto, vosk, or google (default: auto).

How to use

This MCP server provides a fully local, offline speech-to-text capability optimized for production use on x86_64 Linux with Docker. The primary engine is Vosk, for fast, private offline transcription, with a SpeechRecognition-based fallback for development or for environments where Vosk is unavailable. The server exposes a set of MCP tools for transcription, audio processing, and environment introspection: you can transcribe pre-recorded audio files, record live audio from a microphone, or query supported engines and formats. The available tools are:

  • transcribe_audio_offline
  • transcribe_audio_file
  • record_and_transcribe
  • get_supported_engines
  • convert_audio_format
  • test_microphone
  • get_supported_formats

For production, you’ll typically run it via Docker Compose to deploy a multi-container setup that handles model loading, audio management, and API endpoints. Development and testing can be performed locally using the native Python entry point or the uv-based development server on macOS.
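As a sketch of how a client invokes one of these tools, the MCP protocol carries a JSON-RPC 2.0 `tools/call` request over the stdio transport. The argument names below (`file_path`, `language`) are illustrative assumptions, not confirmed parameter names from this server's tool schemas:

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 message for an MCP tools/call request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical call to the transcribe_audio_file tool with a Cantonese hint
    msg = build_tool_call(
        "transcribe_audio_file",
        {"file_path": "/audio/sample.wav", "language": "zh-HK"},
    )
    print(msg)
```

In practice an MCP client (such as Claude Code) builds and sends these messages for you; the sketch only shows the wire format the server consumes.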

How to install

Prerequisites:

  • Docker and Docker Compose installed on your system
  • Git to clone the repository
  • Optional: uv (macOS) for development workflows
  1. Install the prerequisites listed above
  2. Clone the repository
git clone https://github.com/michaelyuwh/mcp-speech-to-text.git
cd mcp-speech-to-text
  3. Run in production (Docker Compose)
# Start all services in detached mode
docker compose up -d

# View logs for troubleshooting
docker compose logs -f
  4. Run in development (native Python, if you prefer)
# macOS (recommended for development): with uv installed
uv sync
uv run python -m mcp_speech_to_text
  5. Optional: build and run with Docker directly
./scripts/build-x86_64.sh
docker run -d --name mcp-speech mcp-speech-to-text:x86_64-latest
  6. Quick health checks
# Production health via script or endpoint health check
./scripts/test-deployment.sh

# Or run a quick inline check using provided server module
docker run --rm mcp-speech-to-text:latest python -c "from src.mcp_speech_to_text.server import OfflineSpeechToTextServer; server = OfflineSpeechToTextServer(); print('✅ Server healthy')" 
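Beyond the provided scripts, a minimal standalone check can verify that the server port is reachable. This sketch assumes the default MCP_SERVER_PORT of 8000 from the configuration above:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "healthy" if port_is_open("localhost", 8000) else "unreachable"
    print(f"MCP server port 8000: {status}")
```

This only confirms the port is bound, not that transcription works end to end; use `./scripts/test-deployment.sh` for a fuller check.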

Prerequisites recap:

  • A machine capable of running Docker
  • Basic knowledge of Docker Compose commands
  • Optionally Python tooling for development (uv, pip) if you’re testing locally without Docker

Additional notes

Tips and common issues:

  • Environment variables: SPEECH_ENGINE, VOSK_MODEL_PATH, and MCP_SERVER_PORT control which engine is used, where models are stored, and which port the MCP server listens on. Adjust via docker-compose.override.yml or your deployment environment as needed.
  • Vosk model availability: Vosk-based transcription is the preferred production path on x86_64 Linux. If Vosk is unavailable on a host (e.g., macOS ARM), the server will fall back to SpeechRecognition in dev mode.
  • Docker platform compatibility: For Apple Silicon (ARM), you may need to build for linux/amd64 explicitly using docker buildx or run with appropriate platform flags.
  • Audio device access in Docker: If you test live microphone transcription, ensure the container has access to audio devices (e.g., docker run --device /dev/snd).
  • Model management: VOSK_MODEL_PATH points to where models are stored. The repository auto-downloads models as needed; you can pre-populate /app/models to avoid download delays on first run.
  • Logs and debugging: Use docker compose logs or docker logs <container> to diagnose issues related to model loading, port binding, or missing dependencies.
  • Security: The setup is designed to run as non-root inside containers and to avoid external API calls, aligning with MCP’s offline/privacy goals.
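For example, the environment variables above can be pinned in a `docker-compose.override.yml`. The service name `mcp-speech-to-text` below is an assumption; match it to the service name in the repository's `docker-compose.yml`:

```yaml
services:
  mcp-speech-to-text:
    environment:
      SPEECH_ENGINE: vosk          # force the offline Vosk engine
      MCP_SERVER_PORT: "8000"
      VOSK_MODEL_PATH: /app/models
    volumes:
      - ./models:/app/models       # pre-populated models avoid first-run download delays
```

Docker Compose merges this file with the base `docker-compose.yml` automatically, so `docker compose up -d` picks up the overrides without extra flags.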
