local-stt
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
claude mcp add --transport stdio --env HF_TOKEN="your_token_here" smartlittleapps-local-stt-mcp -- node path/to/local-stt-mcp/mcp-server/dist/index.js
How to use
local-stt is a local, privacy-first speech-to-text MCP server that runs entirely on your machine and is optimized for Apple Silicon. It uses whisper.cpp for fast transcription, with optional speaker diarization and automatic audio format conversion via ffmpeg. The server exposes MCP tools such as transcribe, transcribe_long, transcribe_with_speakers, list_models, health_check, and version, and returns transcripts in multiple formats (txt, json, vtt, srt, csv). To begin, run the server with Node.js and point your MCP client at the server's index.js entry, as shown in the installation steps. To use speaker diarization, supply a HuggingFace token (HF_TOKEN) so the server can access the diarization models.
Once the server is running, you can call tools like transcribe for standard transcription with automatic format conversion, transcribe_with_speakers for diarized transcripts with speaker labels, and transcribe_long for splitting long audio into manageable chunks. You can also query available models with list_models to see which whisper models are installed and supported, perform health checks with health_check to confirm system readiness, and fetch version information with version for debugging or compatibility checks.
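Under the hood, each of these tool invocations is a standard MCP tools/call JSON-RPC request. The sketch below shows the shape of such a request for the transcribe tool; the argument names (audio_path, output_format) are illustrative assumptions, so check the server's actual schema via tools/list before relying on them.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// NOTE: the argument names below (audio_path, output_format) are assumed for
// illustration -- query the server's tools/list for the real parameter names.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildTranscribeRequest(
  id: number,
  audioPath: string,
  outputFormat: "txt" | "json" | "vtt" | "srt" | "csv" = "txt"
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "transcribe",
      arguments: { audio_path: audioPath, output_format: outputFormat },
    },
  };
}

// Example: request an SRT transcript of a local recording.
const req = buildTranscribeRequest(1, "/tmp/meeting.m4a", "srt");
console.log(JSON.stringify(req));
```

In practice your MCP client library builds this envelope for you; the sketch only shows what travels over the stdio transport.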
How to install
Prerequisites
- Node.js 18+ installed on your system
- whisper.cpp installed (brew install whisper-cpp on macOS with Homebrew)
- ffmpeg installed for audio format conversions (brew install ffmpeg)
- Optional: HuggingFace token for speaker diarization (free token at huggingface.co)
Installation steps
- Clone the repository and navigate to the MCP server folder
git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
- Install dependencies and build
npm install
npm run build
- Download whisper models (if the project provides a script)
npm run setup:models
- Set up HuggingFace token if you plan to use speaker diarization
export HF_TOKEN="your_token_here" # Get a free token from huggingface.co
- Run the MCP server (as described in the configuration and client usage)
npm run start
Note: If you’re integrating with an MCP client, configure the client to point to the Node entry file, for example:
{
  "mcpServers": {
    "local-stt": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"]
    }
  }
}
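If you plan to use speaker diarization, the token can be passed through the client config's env field rather than a shell export (the env key follows the common MCP client configuration convention; adjust the path and token for your setup):

```json
{
  "mcpServers": {
    "local-stt": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
      "env": { "HF_TOKEN": "your_token_here" }
    }
  }
}
```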
Additional notes
Tips and caveats:
- For best Apple Silicon performance, ensure you use the optimized whisper.cpp setup and keep ffmpeg up to date.
- If diarization is not needed, you can omit HF_TOKEN and related diarization models to save resources.
- The server supports multiple output formats; specify the desired format in the MCP client requests via the tool parameters.
- If you encounter memory issues, make sure you have sufficient free RAM or switch to a smaller model; whisper.cpp's memory use scales with model size.
- Ensure your environment paths are correctly resolved if running via custom paths; the dist/index.js file is the entry point after npm run build.