Sag
Verified · @steipete
npx machina-cli add skill @steipete/sag --openclaw
Use sag for ElevenLabs TTS with local playback.
API key (required)
- ELEVENLABS_API_KEY (preferred)
- SAG_API_KEY also supported by the CLI
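A minimal sketch of checking for a key before calling sag. The function name resolve_api_key is hypothetical, and the lookup order (ELEVENLABS_API_KEY first, then SAG_API_KEY) is an assumption based on the "preferred" note above; sag's own precedence may differ.

```shell
# Print whichever API key is set, or fail with a hint.
# Assumption: ELEVENLABS_API_KEY wins when both variables are set.
resolve_api_key() {
  if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    printf '%s\n' "$ELEVENLABS_API_KEY"
  elif [ -n "${SAG_API_KEY:-}" ]; then
    printf '%s\n' "$SAG_API_KEY"
  else
    echo "no API key set: export ELEVENLABS_API_KEY or SAG_API_KEY" >&2
    return 1
  fi
}
```

Running `resolve_api_key >/dev/null || exit 1` at the top of a script fails early instead of waiting for a failed synthesis call.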
Quick start
- sag "Hello there"
- sag speak -v "Roger" "Hello"
- sag voices
- sag prompting (model-specific tips)
Model notes
- Default: eleven_v3 (expressive)
- Stable: eleven_multilingual_v2
- Fast: eleven_flash_v2_5
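The notes above can be captured as a small lookup. This is a sketch: the function name model_for_preset and the preset labels (default, stable, fast) are illustrative names taken from the list, not sag options. How a model ID is passed to sag is not shown here; check sag's help output for that.

```shell
# Map a preset label from the notes above to its ElevenLabs model ID.
# The labels are hypothetical shorthand, not flags understood by sag.
model_for_preset() {
  case "$1" in
    default) echo "eleven_v3" ;;
    stable)  echo "eleven_multilingual_v2" ;;
    fast)    echo "eleven_flash_v2_5" ;;
    *)
      echo "unknown preset: $1" >&2
      return 1 ;;
  esac
}
```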
Pronunciation + delivery rules
- First fix: respell (e.g. "key-note"), add hyphens, adjust casing.
- Numbers/units/URLs: --normalize auto (or off if it harms names).
- Language bias: --lang en|de|fr|... to guide normalization.
- v3: SSML <break> not supported; use [pause], [short pause], [long pause].
- v2/v2.5: SSML <break time="1.5s" /> supported; <phoneme> not exposed in sag.
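Because the pause syntax depends on the model, a script that targets more than one model may want to pick the markup in one place. The sketch below assumes that "v3" means eleven_v3 and that "v2/v2.5" means eleven_multilingual_v2 and eleven_flash_v2_5; pause_markup is a hypothetical helper name.

```shell
# pause_markup MODEL [SECONDS]
# Print the pause markup that the rules above allow for MODEL.
# Assumption: only the three model IDs named in this document are handled.
pause_markup() {
  case "$1" in
    eleven_v3)
      # v3 ignores SSML <break>; use a bracketed pause token instead.
      printf '%s' "[pause]" ;;
    eleven_multilingual_v2|eleven_flash_v2_5)
      printf '<break time="%ss" />' "${2:-1.5}" ;;
    *)
      echo "no pause rule documented for model: $1" >&2
      return 1 ;;
  esac
}
```

For example, `text="First point. $(pause_markup eleven_v3) Second point."` builds a v3-safe string that can then be passed to sag.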
v3 audio tags (put at the start of a line)
- [whispers], [shouts], [sings]
- [laughs], [starts laughing], [sighs], [exhales]
- [sarcastic], [curious], [excited], [crying], [mischievously]
- Example: sag "[whispers] keep this quiet. [short pause] ok?"
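A small sketch for prefixing a line with one of the tags listed above. tag_line is a hypothetical helper; it only accepts the tags named in this document, which may not be the full set the v3 model understands.

```shell
# tag_line TAG TEXT
# Print TEXT with "[TAG] " placed at the start of the line.
# Only tags listed in this document are accepted.
tag_line() {
  case "$1" in
    whispers|shouts|sings|laughs|"starts laughing"|sighs|exhales|\
    sarcastic|curious|excited|crying|mischievously)
      printf '[%s] %s\n' "$1" "$2" ;;
    *)
      echo "tag not in the documented list: $1" >&2
      return 1 ;;
  esac
}
```

The result can be passed straight to sag, for example `sag "$(tag_line whispers 'keep this quiet.')"`.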
Voice defaults
ELEVENLABS_VOICE_ID or SAG_VOICE_ID
Confirm voice + speaker before long output.
Chat voice responses
When Peter asks for a "voice" reply (e.g., "crazy scientist voice", "explain in voice"), generate audio and send it:
# Generate audio file
sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"
# Then include in reply:
# MEDIA:/tmp/voice-reply.mp3
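The two steps above can be wrapped in one function. This is a sketch: voice_reply is a hypothetical name, and it uses only the sag invocation shown above (-v Clawd, -o FILE). The MEDIA: line is printed only when sag exits successfully.

```shell
# voice_reply TEXT [OUTFILE]
# Generate the audio with sag, then print the MEDIA: line for the reply.
# Assumption: sag returns a non-zero exit status when synthesis fails.
voice_reply() {
  sag -v Clawd -o "${2:-/tmp/voice-reply.mp3}" "$1" || return 1
  printf 'MEDIA:%s\n' "${2:-/tmp/voice-reply.mp3}"
}
```

Calling `voice_reply "Your message here"` writes to the default path used above and prints the line to include in the reply.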
Voice character tips:
- Crazy scientist: Use [excited] tags, dramatic pauses [short pause], vary intensity
- Calm: Use [whispers] or slower pacing
- Dramatic: Use [sings] or [shouts] sparingly
Default voice for Clawd: lj2rcrvANS3gaWWnczSX (or just -v Clawd)
Overview
sag brings ElevenLabs text-to-speech to the command line with a UX modeled on the macOS say command. It supports local playback, authentication via environment variables (ELEVENLABS_API_KEY or SAG_API_KEY), voice selection via -v or a voice ID variable (ELEVENLABS_VOICE_ID or SAG_VOICE_ID), and options for model and pronunciation control. Use sag for demos, tutorials, and accessibility-friendly narration.
How This Skill Works
Install sag, provide your API key, then call sag with the text to synthesize. Voices are chosen with -v or a voice ID variable (ELEVENLABS_VOICE_ID or SAG_VOICE_ID), and output can be played locally or written to a file with -o. The documented models are eleven_v3 (default), eleven_multilingual_v2 (stable), and eleven_flash_v2_5 (fast), and sag offers basic pronunciation controls. On v3, SSML <break> tags are not supported, so use [pause]-style tokens; v2/v2.5 accept standard SSML <break> tags.
When to Use It
- Prototype narrated product demos with different ElevenLabs voices
- Create quick audio prompts for chatbots or assistants with local playback
- Narrate tutorials or documentation for accessibility or training videos
- Compare multiple voices and models quickly from the command line
- Prepare voice replies as audio files (-o) that can be attached to a reply with a MEDIA: line
Quick Start
- Step 1: Install sag (brew formula: steipete/tap/sag)
- Step 2: Export your API key (export ELEVENLABS_API_KEY=your_key or SAG_API_KEY=...)
- Step 3: Run a test, e.g. sag "Hello there" or sag speak -v "Roger" "Hello"; optional: sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"
Best Practices
- Always confirm the chosen voice before committing to long outputs; specify SAG_VOICE_ID or ELEVENLABS_VOICE_ID
- Normalize tricky text (numbers, URLs) with --normalize auto or adjust as needed
- Choose the model best suited to your use case (default eleven_v3, stable eleven_multilingual_v2, fast eleven_flash_v2_5)
- Leverage v3 audio tags like [whispers], [excited], or [short pause] for expressiveness; note v3 does not support SSML breaks
- Use pronunciation tweaks (respellings, hyphenation, capitalization) to improve delivery
Example Use Cases
- Narrate a product tour video with a calm or enthusiastic voice using sag
- Prototype a chat assistant’s voice replies and embed the generated audio in UI
- Create a tutorial narration with deliberate pacing using [short pause] tokens
- Produce an accessibility read-aloud of documentation for visually impaired users
- Generate a podcast intro using a stable ElevenLabs voice for consistency