Voice Recognition
Verified@gykdly
npx machina-cli add skill @gykdly/voice-recognition --openclaw
Voice Recognition (Whisper)
Local speech-to-text with OpenAI Whisper CLI.
Features
- Local processing - runs entirely on-device; no API key, no cost
- Multi-language - Chinese, English, and 100+ other languages
- Translation - translate output to English
- Summarization - generate a quick summary of the transcription
Usage
Basic
# Chinese recognition
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a
# Force Chinese
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --zh
# English recognition
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --en
# Translate to English
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --translate
# With summary
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --summarize
Quick Command (add to ~/.zshrc)
alias voice="python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py"
Then use:
voice ~/Downloads/audio.m4a --zh
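As a rough illustration of what a wrapper like this does under the hood, the sketch below maps the script's flags onto a Whisper CLI invocation. The `build_whisper_cmd` function is hypothetical, not the actual internals of voice识别_升级版.py; the `whisper` command and its `--model`, `--language`, and `--task` options are the real openai-whisper CLI flags.

```python
# Hypothetical sketch: how a wrapper's flags might map onto the
# openai-whisper CLI. build_whisper_cmd is illustrative, not the real script.

def build_whisper_cmd(audio_path, language=None, translate=False, model="medium"):
    """Return the argv list for a local Whisper run."""
    cmd = ["whisper", audio_path, "--model", model]
    if language:
        cmd += ["--language", language]   # "--zh" / "--en" become "zh" / "en"
    if translate:
        cmd += ["--task", "translate"]    # Whisper's built-in to-English mode
    return cmd

print(build_whisper_cmd("audio.m4a", language="zh"))
# → ['whisper', 'audio.m4a', '--model', 'medium', '--language', 'zh']
```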
Requirements
- OpenAI Whisper CLI: brew install openai-whisper
- Python 3.10+
Files
- scripts/voice识别_升级版.py - Main script
- scripts/voice_tool_README.md - Documentation
Supported Formats
- MP3, M4A, WAV, OGG, FLAC, WebM
Language Support
100+ languages including:
- Chinese (zh)
- English (en)
- Japanese (ja)
- Korean (ko)
- And more...
Notes
- Default model: medium (balance of speed and accuracy)
- First run downloads the model to ~/.cache/whisper
- Processing time varies with audio length and model size
Overview
Voice Recognition uses the OpenAI Whisper CLI to convert speech to text locally, with no API key required. It supports 100+ languages, can translate output to English, and can generate a quick summary of the transcription, keeping the whole workflow fast, offline, and private.
How This Skill Works
The tool runs OpenAI Whisper locally, downloading the default medium model to ~/.cache/whisper on first use. You can force language output with --zh or --en, enable translation with --translate, and generate a summary with --summarize. It accepts common audio formats such as MP3, M4A, WAV, OGG, FLAC, and WebM.
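If you would rather call Whisper from Python than through the bundled script, the openai-whisper package exposes the same models programmatically. The `transcribe_locally` wrapper and `whisper_task` helper below are illustrative names of my own; `whisper.load_model` and `model.transcribe` (with its `language` and `task` parameters) are the package's documented API.

```python
def whisper_task(translate):
    # Whisper supports two tasks: "transcribe" (keep the source language)
    # and "translate" (always produce English output).
    return "translate" if translate else "transcribe"

def transcribe_locally(audio_path, language=None, translate=False, model_name="medium"):
    """Sketch of a local transcription call via the openai-whisper Python API."""
    import whisper  # imported lazily so the sketch loads without the package installed
    model = whisper.load_model(model_name)  # first call downloads to ~/.cache/whisper
    result = model.transcribe(audio_path, language=language, task=whisper_task(translate))
    return result["text"]
```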
When to Use It
- Transcribe private interviews or meeting recordings offline without sending data to the cloud
- Transcribe (and optionally translate) multilingual content to English for wider sharing
- Create concise summaries of long lectures or seminars
- Process offline voice notes when internet access is restricted
- Build a privacy-preserving, local transcription workflow for teams
Quick Start
- Step 1: Install Whisper CLI and Python 3.10+ (e.g., brew install openai-whisper; ensure Python 3.10+ is available)
- Step 2: Transcribe an audio file using the main script, e.g. python3 path/to/scripts/voice识别_升级版.py audio.m4a
- Step 3: Add features as needed, e.g. python3 path/to/scripts/voice识别_升级版.py audio.m4a --translate --summarize
Best Practices
- Use the language flags (--zh for Chinese, --en for English) to improve accuracy when known
- Start with the default medium model; switch to larger models only if hardware permits and accuracy is insufficient
- Ensure input audio is in a supported format (MP3, M4A, WAV, OGG, FLAC, WebM)
- Enable --translate and --summarize only when you need them; each adds processing time
- Note that the first run downloads the Whisper model to ~/.cache/whisper
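The skill does not document how --summarize produces its summary. As a point of comparison only, here is a naive extractive baseline (keep the first few sentences) that handles both Chinese and English sentence punctuation; it is an assumption for illustration, not the script's actual summarization logic.

```python
import re

def quick_summary(text, max_sentences=3):
    # Naive extractive summary: keep the first few sentences.
    # Splits after Chinese (。！？) and English (.!?) sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[。！？.!?])\s*", text) if s.strip()]
    return " ".join(sentences[:max_sentences])
```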
Example Use Cases
- Transcribing a Chinese podcast offline for local archiving using the script
- Transcribing a multilingual interview and translating output to English with --translate
- Generating a quick summary of a 90-minute lecture with --summarize
- Transcribing a private Zoom meeting locally to keep data on-device
- Setting up a private transcription workflow for a research team