
Voice Recognition

Verified

@gykdly

npx machina-cli add skill @gykdly/voice-recognition --openclaw

Voice Recognition (Whisper)

Local speech-to-text with OpenAI Whisper CLI.

Features

  • Local processing - No API key needed, free
  • Multi-language - Chinese, English, 100+ languages
  • Translation - Translate to English
  • Summarization - Generate a quick summary of the transcript

Usage

Basic

# Chinese recognition
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a

# Force Chinese
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --zh

# English recognition  
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --en

# Translate to English
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --translate

# With summary
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --summarize

Quick Command (add to ~/.zshrc)

alias voice="python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py"

Then use:

voice ~/Downloads/audio.m4a --zh

Requirements

  • OpenAI Whisper CLI: brew install openai-whisper
  • Python 3.10+
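
Both requirements can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the function name check_requirements is hypothetical, and it assumes the Whisper CLI is installed under the executable name whisper, which is what the openai-whisper package normally provides.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10), cli_name="whisper"):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: `cli_name` assumes the Whisper CLI is on PATH
    under the name `whisper`.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"Python {wanted}+ is required, found {sys.version.split()[0]}")
    if shutil.which(cli_name) is None:
        problems.append(f"'{cli_name}' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("Missing requirement:", issue)
    if not issues:
        print("All requirements satisfied.")
```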

Files

  • scripts/voice识别_升级版.py - Main script
  • scripts/voice_tool_README.md - Documentation

Supported Formats

  • MP3, M4A, WAV, OGG, FLAC, WebM
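
A file can be checked against this list before it is handed to the script. This is a small sketch under the assumption that the format can be judged from the file extension alone; the names SUPPORTED_EXTENSIONS and is_supported are illustrative and do not come from the skill.

```python
from pathlib import Path

# Extensions taken from the Supported Formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".webm"}


def is_supported(path):
    """Return True if the file extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("audio.m4a", "Lecture.MP3", "notes.txt"):
        print(name, "supported" if is_supported(name) else "not supported")
```

The check looks only at the file name; it does not inspect the audio data itself.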

Language Support

100+ languages including:

  • Chinese (zh)
  • English (en)
  • Japanese (ja)
  • Korean (ko)
  • And more...

Notes

  • Default model: medium (balance of speed and accuracy)
  • First run downloads model to ~/.cache/whisper
  • Processing time varies by audio length and model size
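
To see whether the first-run download has already happened, the cache directory can be inspected. The sketch below rests on two assumptions that are not stated in this document: that Whisper stores each model as <model>.pt inside the cache directory, and that it honors the XDG_CACHE_HOME environment variable when set. The function names are hypothetical.

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Return the directory where Whisper is assumed to cache models.

    Assumption: XDG_CACHE_HOME is honored, falling back to ~/.cache.
    """
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default)) / "whisper"


def is_model_cached(model="medium"):
    """Return True if a file named <model>.pt exists in the cache directory.

    Assumption: the cached file is named after the model, e.g. medium.pt.
    """
    return (whisper_cache_dir() / f"{model}.pt").is_file()


if __name__ == "__main__":
    state = "already cached" if is_model_cached() else "will be downloaded on first run"
    print(f"medium model: {state} ({whisper_cache_dir()})")
```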

Source

git clone https://clawhub.ai/gykdly/voice-recognition

Overview

Voice Recognition uses the OpenAI Whisper CLI to convert speech to text locally, with no API key required. It supports 100+ languages, can translate output to English, and can generate a quick summary of the transcription. Because processing happens on your own machine, it suits private, offline transcription workflows; the only network access the skill documents is the one-time model download on first run.

How This Skill Works

The tool runs OpenAI Whisper locally, downloading the default medium model to ~/.cache/whisper on first use. You can force the recognition language with --zh or --en, translate the result to English with --translate, and generate a summary with --summarize. It accepts common audio formats such as MP3, M4A, WAV, OGG, FLAC, and WebM.
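
The script's source is not shown here, so how it maps its own flags onto the Whisper CLI is unknown. As a rough sketch of one plausible mapping, the function below builds a whisper command line using the CLI's --model, --language, and --task options. The function name and its parameters are hypothetical, and --summarize is left out because Whisper itself has no summarization option.

```python
def build_whisper_command(audio_path, language=None, translate=False, model="medium"):
    """Build an argument list for the Whisper CLI.

    Assumed mapping (not confirmed by the skill's source):
      --zh / --en   -> --language zh / --language en
      --translate   -> --task translate
    """
    cmd = ["whisper", str(audio_path), "--model", model]
    if language:
        cmd += ["--language", language]
    if translate:
        cmd += ["--task", "translate"]
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so no model download is triggered.
    print(" ".join(build_whisper_command("audio.m4a", language="zh", translate=True)))
```

The resulting list can be passed to subprocess.run to perform the actual transcription.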

When to Use It

  • Transcribe private interviews or meeting recordings offline without sending data to the cloud
  • Transcribe (and optionally translate) multilingual content to English for wider sharing
  • Create concise summaries of long lectures or seminars
  • Process offline voice notes when internet access is restricted
  • Build a privacy-preserving, local transcription workflow for teams

Quick Start

  1. Install the Whisper CLI (e.g., brew install openai-whisper) and make sure Python 3.10+ is available
  2. Transcribe an audio file using the main script, e.g. python3 path/to/scripts/voice识别_升级版.py audio.m4a
  3. Add features as needed, e.g. python3 path/to/scripts/voice识别_升级版.py audio.m4a --translate --summarize

Best Practices

  • When you know the spoken language, pass the matching flag (--zh for Chinese, --en for English) to improve accuracy
  • Start with the default medium model; switch to larger models only if hardware permits and accuracy is insufficient
  • Ensure input audio is in a supported format (MP3, M4A, WAV, OGG, FLAC, WebM)
  • To save processing time, enable --translate and --summarize only when you need them
  • Note that the first run downloads the Whisper model to ~/.cache/whisper

Example Use Cases

  • Transcribing a Chinese podcast offline for local archiving using the script
  • Transcribing a multilingual interview and translating output to English with --translate
  • Generating a quick summary of a 90-minute lecture with --summarize
  • Transcribing a private Zoom meeting locally to keep data on-device
  • Setting up a private transcription workflow for a research team
