Voice Recognition
Verified@gykdly
npx machina-cli add skill @gykdly/voice-recognition --openclaw
Voice Recognition (Whisper)
Local speech-to-text with OpenAI Whisper CLI.
Features
- Local processing - runs entirely on-device; no API key, no cost
- Multi-language - Chinese, English, and 100+ other languages
- Translation - translate output to English
- Summarization - generate a quick summary of the transcription
Usage
Basic
# Chinese recognition
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a
# Force Chinese
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --zh
# English recognition
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --en
# Translate to English
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --translate
# With summary
python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py audio.m4a --summarize
Quick Command (add to ~/.zshrc)
alias voice="python3 /Users/liyi/.openclaw/workspace/scripts/voice识别_升级版.py"
Then use:
voice ~/Downloads/audio.m4a --zh
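As a rough illustration of what a wrapper like this does under the hood, the sketch below maps the script's flags onto a Whisper CLI invocation. The `build_whisper_cmd` function is hypothetical, not the actual internals of voice识别_升级版.py; the `whisper` command and its `--model`, `--language`, and `--task` options are the real openai-whisper CLI flags.

```python
# Hypothetical sketch: how a wrapper's flags might map onto the
# openai-whisper CLI. build_whisper_cmd is illustrative, not the real script.

def build_whisper_cmd(audio_path, language=None, translate=False, model="medium"):
    """Return the argv list for a local Whisper run."""
    cmd = ["whisper", audio_path, "--model", model]
    if language:
        cmd += ["--language", language]   # "--zh" / "--en" become "zh" / "en"
    if translate:
        cmd += ["--task", "translate"]    # Whisper's built-in to-English mode
    return cmd

print(build_whisper_cmd("audio.m4a", language="zh"))
# → ['whisper', 'audio.m4a', '--model', 'medium', '--language', 'zh']
```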
Requirements
- OpenAI Whisper CLI: brew install openai-whisper
- Python 3.10+
Files
- scripts/voice识别_升级版.py - Main script
- scripts/voice_tool_README.md - Documentation
Supported Formats
- MP3, M4A, WAV, OGG, FLAC, WebM
Language Support
100+ languages including:
- Chinese (zh)
- English (en)
- Japanese (ja)
- Korean (ko)
- And more...
Notes
- Default model: medium (balance of speed and accuracy)
- First run downloads the model to ~/.cache/whisper
- Processing time varies with audio length and model size
Overview
Voice Recognition uses the OpenAI Whisper CLI to convert speech to text locally, with no API key required. It supports 100+ languages, can translate output to English, and can generate a quick summary of the transcription, keeping the whole workflow fast, offline, and private.
How This Skill Works
The tool runs OpenAI Whisper locally, downloading the default medium model to ~/.cache/whisper on first use. You can force language output with --zh or --en, enable translation with --translate, and generate a summary with --summarize. It accepts common audio formats such as MP3, M4A, WAV, OGG, FLAC, and WebM.
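If you would rather call Whisper from Python than through the bundled script, the openai-whisper package exposes the same models programmatically. The `transcribe_locally` wrapper and `whisper_task` helper below are illustrative names of my own; `whisper.load_model` and `model.transcribe` (with its `language` and `task` parameters) are the package's documented API.

```python
def whisper_task(translate):
    # Whisper supports two tasks: "transcribe" (keep the source language)
    # and "translate" (always produce English output).
    return "translate" if translate else "transcribe"

def transcribe_locally(audio_path, language=None, translate=False, model_name="medium"):
    """Sketch of a local transcription call via the openai-whisper Python API."""
    import whisper  # imported lazily so the sketch loads without the package installed
    model = whisper.load_model(model_name)  # first call downloads to ~/.cache/whisper
    result = model.transcribe(audio_path, language=language, task=whisper_task(translate))
    return result["text"]
```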
When to Use It
- Transcribe private interviews or meeting recordings offline without sending data to the cloud
- Transcribe (and optionally translate) multilingual content to English for wider sharing
- Create concise summaries of long lectures or seminars
- Process offline voice notes when internet access is restricted
- Build a privacy-preserving, local transcription workflow for teams
Quick Start
- Step 1: Install Whisper CLI and Python 3.10+ (e.g., brew install openai-whisper; ensure Python 3.10+ is available)
- Step 2: Transcribe an audio file using the main script, e.g. python3 path/to/scripts/voice识别_升级版.py audio.m4a
- Step 3: Add features as needed, e.g. python3 path/to/scripts/voice识别_升级版.py audio.m4a --translate --summarize
Best Practices
- Use the language flags (--zh for Chinese, --en for English) to improve accuracy when known
- Start with the default medium model; switch to larger models only if hardware permits and accuracy is insufficient
- Ensure input audio is in a supported format (MP3, M4A, WAV, OGG, FLAC, WebM)
- Enable --translate and --summarize only when you need them; each adds processing time
- Note that the first run downloads the Whisper model to ~/.cache/whisper
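The skill does not document how --summarize produces its summary. As a point of comparison only, here is a naive extractive baseline (keep the first few sentences) that handles both Chinese and English sentence punctuation; it is an assumption for illustration, not the script's actual summarization logic.

```python
import re

def quick_summary(text, max_sentences=3):
    # Naive extractive summary: keep the first few sentences.
    # Splits after Chinese (。！？) and English (.!?) sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[。！？.!?])\s*", text) if s.strip()]
    return " ".join(sentences[:max_sentences])
```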
Example Use Cases
- Transcribing a Chinese podcast offline for local archiving using the script
- Transcribing a multilingual interview and translating output to English with --translate
- Generating a quick summary of a 90-minute lecture with --summarize
- Transcribing a private Zoom meeting locally to keep data on-device
- Setting up a private transcription workflow for a research team