
mcp-server-whisper

An MCP Server for audio transcription using OpenAI

Installation
Run this command in your terminal to add the MCP server to Claude Code:
claude mcp add --transport stdio arcaputo3-mcp-server-whisper \
  --env OPENAI_API_KEY="<your_openai_api_key_here>" \
  --env AUDIO_FILES_PATH="<path_to_audio_files_here>" \
  -- uv run mcp-server-whisper

How to use

MCP Server Whisper provides a comprehensive audio transcription and processing suite built around the MCP (Model Context Protocol). It enables you to search, convert, compress, transcribe, and analyze audio files using OpenAI’s Whisper and GPT-4o models, with support for multi-model transcription, interactive audio chat, and advanced prompts. Tools exposed by the server cover the full lifecycle from locating audio assets to generating enriched transcripts and TTS outputs, all while maintaining type-safe responses via Pydantic models. You can leverage parallel processing to run multiple tools concurrently, and you can customize prompts, formats, and timing metadata to suit your use case.

Key capabilities include: listing and filtering audio files by regex, size, duration, and format; converting and compressing audio to supported formats; transcription with whisper-1 or GPT-4o-based models; interactive audio analysis with GPT-4o audio models; enhanced transcription templates for detailed, storytelling, professional, or analytical output; and text-to-speech generation with multiple voices. The exposed tools return structured results with metadata such as duration, file size, format, and timestamps, enabling downstream automation and reporting.

To use the tools, run the MCP server via the provided MCP configuration (for local Claude workflows you typically load environment variables and launch through the MCP runner). The project includes an example configuration (.mcp.json) that wires the local environment to the Whisper server, and a set of commands you can invoke through the MCP interface to perform tasks like list_audio_files, get_latest_audio, transcribe_audio, transcribe_with_enhancement, create_audio, and more. The system is designed for parallelism, so you can request multiple transcriptions or analyses simultaneously and receive typed, structured results.
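As a sketch, a local .mcp.json that wires the server through uv might look like the following; the server name and placeholder values are illustrative, so check the project's README for the exact shape:

```json
{
  "mcpServers": {
    "mcp-server-whisper": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "mcp-server-whisper"],
      "env": {
        "OPENAI_API_KEY": "<your_openai_api_key_here>",
        "AUDIO_FILES_PATH": "<path_to_audio_files_here>"
      }
    }
  }
}
```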

How to install

Prerequisites:

  • Python 3.10 or newer
  • Git
  • Node.js/npm are not required for the server itself (it is Python/uv-based); uv is used as the runtime in the MCP setup
  • Access to OpenAI API (OPENAI_API_KEY) with appropriate quotas for Whisper and GPT-4o

Installation steps:

  1. Clone the repository:

     git clone https://github.com/arcaputo3/mcp-server-whisper.git
     cd mcp-server-whisper

  2. Install uv (the Python package and project manager) if not already installed globally. The recommended workflow in this project uses uv to run the server:

    • Ensure you have Python installed
    • Install uv if needed, typically via your package manager or the standalone installer (project-specific guidance may vary), e.g.: curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Set up environment variables:

    • Copy the example env file: cp .env.example .env
    • Edit .env to include your keys and paths. Required variables (examples):

        OPENAI_API_KEY=your_openai_api_key
        AUDIO_FILES_PATH=/path/to/your/audio/files
  4. Install pre-commit hooks (optional but recommended): uv run pre-commit install

  5. Install dependencies (local development): uv sync, then start the server with uv run mcp-server-whisper

  6. If you’re using the provided local Claude workflow, ensure the .mcp.json file is configured to point at the Whisper server (as shown in the README) and that your environment variables are loaded at runtime (e.g., via dotenv-cli or your shell startup).

Notes:

  • The exact commands to install and run may vary slightly based on your environment and uv version. Refer to the project’s CI workflow and docs for any environment-specific steps.

Additional notes

Tips and frequently asked details:

  • Environment variables must be available at runtime. For local development with Claude, consider loading them with a tool like dotenv-cli when starting Claude.
  • The server supports automatic file compression for oversized inputs to comply with API limits; ensure your AUDIO_FILES_PATH points to your audio assets.
  • The example local configuration uses uv to run mcp-server-whisper; you can adapt the mcp.json for different environments (npx, docker, node) if needed.
  • For large-scale use, take advantage of MCP’s native parallel processing to run multiple transcriptions or analyses concurrently.
  • If you encounter API rate limits, consider staggering requests or increasing your OpenAI API quota. Ensure OPENAI_API_KEY has the necessary permissions for Whisper and GPT-4o services.
  • Valid formats for transcription outputs can be tuned via the transcription tool options, including timestamped outputs and JSON payloads.
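Because missing environment variables are a common failure mode, a tiny pre-flight check before launching can save a round trip. This is a generic sketch, not part of the project; only the variable names come from the setup above:

```shell
# Warn about any required variable that is unset or empty before launching.
for var in OPENAI_API_KEY AUDIO_FILES_PATH; do
  eval "val=\${$var:-}"
  if [ -z "$val" ]; then
    echo "warning: $var is not set"
  fi
done
```

Run this in the same shell you use to start the server (or to start Claude), so the check sees the same environment.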
