mcp-speech-to-text

🎙️ MCP Speech-to-Text Server with Enhanced Cantonese Support | Offline Vosk + Online Google Cloud | Auto-detection for zh-HK | n8n workflows | Hong Kong optimized 🇭🇰

Installation

Run this command in your terminal to add the MCP server to Claude Code.

Command

# SPEECH_ENGINE accepts auto, vosk, or google (default: auto)
claude mcp add --transport stdio michaelyuwh-mcp-speech-to-text \
  --env SPEECH_ENGINE=auto \
  --env MCP_SERVER_PORT=8000 \
  --env VOSK_MODEL_PATH=/app/models \
  -- docker compose up -d

How to use

This MCP server provides a fully local, offline speech-to-text capability optimized for production use on x86_64 Linux with Docker. Its primary engine is Vosk, which gives fast, private transcription. A SpeechRecognition-based fallback is available for development or for environments where Vosk is unavailable.

The server exposes MCP tools for transcription, audio processing, and environment introspection. You can transcribe pre-recorded audio files, record live audio from a microphone, or query the supported engines. The available tools are transcribe_audio_offline, transcribe_audio_file, record_and_transcribe, get_supported_engines, convert_audio_format, test_microphone, and get_supported_formats.

For production, you’ll typically run it via Docker Compose, which deploys a multi-container setup that handles model loading, audio management, and API endpoints. For development and testing, you can run it natively with Python, or through uv on macOS.
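As a rough illustration of how a client invokes one of these tools, the sketch below builds the JSON-RPC 2.0 message that the MCP protocol uses for tool calls (method tools/call). The tool name comes from the list above; the argument name file_path and the sample path are hypothetical, since the tool's input schema is not documented here. In practice the real argument names would come from the server's tools/list response.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Return an MCP 'tools/call' request serialized as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "file_path" is a hypothetical argument name used only for illustration.
message = build_tool_call(
    1,
    "transcribe_audio_file",
    {"file_path": "/app/audio/sample.wav"},
)
print(message)
```

An MCP client such as Claude Code builds and sends these messages for you, so this is only useful for understanding what travels over the stdio transport.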

How to install

Prerequisites:

Docker and Docker Compose installed on your system
Git to clone the repository
Optional: uv (macOS) for development workflows

Install prerequisites

Docker: follow https://docs.docker.com/get-started/
Docker Compose is included with modern Docker installations

Clone the repository

git clone https://github.com/michaelyuwh/mcp-speech-to-text.git
cd mcp-speech-to-text

Run in production (Docker Compose)

# Start all services in detached mode
docker compose up -d

# View logs for troubleshooting
docker compose logs -f

Run in development (native Python if you prefer)

# macOS (recommended for development): with uv installed, sync dependencies and start the server
uv sync
uv run python -m mcp_speech_to_text

Optional build and run (direct Docker)

./scripts/build-x86_64.sh
docker run -d --name mcp-speech mcp-speech-to-text:x86_64-latest

Quick health checks

# Check the production deployment with the bundled test script
./scripts/test-deployment.sh

# Or run a quick inline check using the provided server module
docker run --rm mcp-speech-to-text:latest python -c "from src.mcp_speech_to_text.server import OfflineSpeechToTextServer; server = OfflineSpeechToTextServer(); print('✅ Server healthy')"
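If you only want to confirm that something is listening on the configured port, a plain TCP probe may be enough. The sketch below assumes the server accepts TCP connections on MCP_SERVER_PORT (8000 in the command above) and that Docker Compose publishes that port on localhost; neither is confirmed by the repository documentation quoted here. The helper name port_is_open is illustrative.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed host and port; adjust to match your deployment.
    status = "open" if port_is_open("127.0.0.1", 8000) else "closed"
    print(f"Port 8000 is {status}")
```

An open port only shows that a process is listening. It does not show that models have loaded, so the test script above remains the more thorough check.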

Prerequisites recap:

A machine capable of running Docker
Basic knowledge of Docker Compose commands
Optionally Python tooling for development (uv, pip) if you’re testing locally without Docker

Additional notes

Tips and common issues:

Environment variables: SPEECH_ENGINE, VOSK_MODEL_PATH, and MCP_SERVER_PORT control which engine is used, where models are stored, and which port the MCP server listens on. Adjust via docker-compose.override.yml or your deployment environment as needed.
Vosk model availability: Vosk-based transcription is the preferred production path on x86_64 Linux. If Vosk is unavailable on a host (e.g., macOS ARM), the server will fall back to SpeechRecognition in dev mode.
Docker platform compatibility: For Apple Silicon (ARM), you may need to build for linux/amd64 explicitly using docker buildx or run with appropriate platform flags.
Audio device access in Docker: If you test live microphone transcription, ensure the container has access to audio devices (e.g., docker run --device /dev/snd).
Model management: VOSK_MODEL_PATH points to where models are stored. The repository auto-downloads models as needed; you can pre-populate /app/models to avoid download delays on first run.
Logs and debugging: Use docker compose logs or docker logs <container> to diagnose issues related to model loading, port binding, or missing dependencies.
Security: The setup is designed to run as non-root inside containers and to avoid external API calls, in line with the project’s offline and privacy goals.
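To make the environment variables from the first tip concrete, here is a minimal sketch of how a process could read and validate them. The defaults (auto, /app/models, 8000) are taken from the install command above. The ServerConfig class, the load_config function, and the validation rules are hypothetical and are not the project's actual implementation.

```python
import os
from dataclasses import dataclass

# Engine names as listed in the install command; treating any other value
# as an error is an assumption of this sketch.
VALID_ENGINES = {"auto", "vosk", "google"}


@dataclass
class ServerConfig:
    speech_engine: str
    vosk_model_path: str
    port: int


def load_config(env=None):
    """Build a ServerConfig from environment variables, applying defaults."""
    env = os.environ if env is None else env

    engine = env.get("SPEECH_ENGINE", "auto").strip().lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"Unsupported SPEECH_ENGINE: {engine!r}")

    port = int(env.get("MCP_SERVER_PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"MCP_SERVER_PORT out of range: {port}")

    model_path = env.get("VOSK_MODEL_PATH", "/app/models")
    return ServerConfig(engine, model_path, port)


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of os.environ, for example load_config({"SPEECH_ENGINE": "vosk"}), makes it easy to try different settings without touching the shell environment.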
