
macOS Local Voice


@STRRL

npx machina-cli add skill @STRRL/macos-local-voice --openclaw


Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

  • macOS (Apple Silicon recommended, Intel works too)
  • yap CLI in PATH — install via brew install finnvoor/tools/yap
  • ffmpeg in PATH (optional, needed for ogg/opus output) — brew install ffmpeg
  • say and osascript are built into macOS

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

node {baseDir}/scripts/stt.mjs <audio_file> [locale]
  • audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
  • locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses system default.
  • Outputs transcribed text to stdout.

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

  • If the user's recent messages are in Chinese → use zh_CN
  • If in English → use en_US
  • If mixed or unclear → try without locale (system default)

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]
  • text: the text to speak
  • voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
  • output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
  • Outputs the generated audio file path to stdout.
  • If ffmpeg is available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.

Sending as voice note

After generating the audio file, send it using the message tool:

message action=send media=<path_from_tts.mjs> asVoice=true

Voice Management

List available voices, check readiness, or find the best voice for a language:

node {baseDir}/scripts/voices.mjs list [locale]     # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<name>"     # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale>       # Get the highest quality voice for a locale

Quality levels

  • 1 = compact (low quality, always available)
  • 2 = enhanced (mid quality, may need download)
  • 3 = premium (highest quality, needs download from System Settings)

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

  • The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
  • Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
  • Siri voices are not accessible via the speech synthesis API.

Source

git clone https://clawhub.ai/STRRL/macos-local-voice

Overview

macOS Local Voice provides on-device speech-to-text and text-to-speech using native Apple capabilities. STT uses yap (Apple Speech.framework) for local transcription, while TTS relies on the built-in say, with ffmpeg supporting ogg/opus output when available. No network or API keys are required, and it includes voice quality detection and smart voice selection to pick the best available voice.

How This Skill Works

STT is performed by yap on-device, transcribing audio from a file to text. TTS renders text to speech using the say command and can select a specific downloaded voice; when ffmpeg is present, the output can be ogg/opus, otherwise AIFF. Voice management scripts (voices.mjs) help detect ready voices and choose the best match for a locale.

When to Use It

  • You need private, offline transcription of recordings on macOS.
  • You require no API keys or cloud services due to privacy/compliance concerns.
  • You want locale-aware STT or high-quality TTS voices for a specific language.
  • You need to generate offline voice notes for messaging apps or accessibility tools.
  • You are building an offline macOS automation or assistant with speech capabilities.

Quick Start

  1. Install dependencies: yap (and ffmpeg if you want ogg/opus output); say and osascript ship with macOS.
  2. Transcribe an audio file: node {baseDir}/scripts/stt.mjs <audio_file> [locale].
  3. Synthesize speech: node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path].

Best Practices

  • Always run voices.mjs check to confirm a voice is downloaded before using tts.mjs with a specific voice name.
  • Use stt.mjs --locales to identify available locales, then use voices.mjs best <locale> to pick the top voice.
  • If the requested voice isn’t available, fall back to the system default and consider downloading premium voices via System Settings.
  • Ensure dependencies are installed: yap and (optionally) ffmpeg; verify they are in your PATH.
  • Test with representative audio samples in your target languages to validate locale handling and voice quality.

Example Use Cases

  • Transcribe offline meeting recordings for later review without sending data to cloud services.
  • Read long documents aloud to aid accessibility or multitasking.
  • Create offline voice notes for sharing via messaging apps or email.
  • Develop a private macOS assistant that responds with high-quality, locale-appropriate speech.
  • Support language learning by combining locale-specific STT with natural-sounding TTS.
