Do I need an API key to use sag?

Yes. Set ELEVENLABS_API_KEY (preferred) or SAG_API_KEY. sag sends your text to ElevenLabs for synthesis and plays the resulting audio locally.

How do I choose which voice sag uses?

Specify the voice with -v, or set a default with SAG_VOICE_ID / ELEVENLABS_VOICE_ID. Note that eleven_v3 is the default model, which is chosen separately from the voice.

Does sag support SSML?

v3 does not support SSML break tags; use [pause], [short pause], or [long pause] instead. v2/v2.5 do support SSML breaks such as <break time="1.5s" />.

Sag

@steipete

npx machina-cli add skill @steipete/sag --openclaw

sag

Use sag for ElevenLabs TTS with local playback.

API key (required)

ELEVENLABS_API_KEY (preferred)
SAG_API_KEY also supported by the CLI
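Before a first run it can help to confirm that a key is actually exported. The function below is a minimal sketch: the name sag_key_source is hypothetical, and it assumes ELEVENLABS_API_KEY takes precedence when both variables are set, which follows the "(preferred)" note rather than confirmed sag behavior.

```shell
# Hypothetical preflight helper (not part of sag): print the name of the
# API key variable that is set, or fail with a message if neither is.
# Assumption: ELEVENLABS_API_KEY wins when both are set ("preferred" above).
sag_key_source() {
  if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY"
  elif [ -n "${SAG_API_KEY:-}" ]; then
    echo "SAG_API_KEY"
  else
    echo "no ElevenLabs API key set" >&2
    return 1
  fi
}
```

For example, sag_key_source && sag "Hello there" only calls sag once a key is present.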

Quick start

sag "Hello there"
sag speak -v "Roger" "Hello"
sag voices
sag prompting    # model-specific tips

Model notes

Default: eleven_v3 (expressive)
Stable: eleven_multilingual_v2
Fast: eleven_flash_v2_5

Pronunciation + delivery rules

First fix: respell (e.g. "key-note"), add hyphens, adjust casing.
Numbers/units/URLs: --normalize auto (or off if it harms names).
Language bias: --lang en|de|fr|... to guide normalization.
v3: SSML <break> not supported; use [pause], [short pause], [long pause].
v2/v2.5: SSML <break time="1.5s" /> supported; <phoneme> not exposed in sag.
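Because v3 and v2/v2.5 expect different pause syntax, a small helper can emit the right form for the model in use. This is a sketch: pause_for_model is a hypothetical name, and the second thresholds that map a duration to [short pause], [pause], or [long pause] are illustrative guesses, not values documented by ElevenLabs or sag.

```shell
# Hypothetical helper (not part of sag): emit a pause in the syntax the model accepts.
#   eleven_v3               -> [short pause] / [pause] / [long pause]
#   *_v2 and *_v2_5 models  -> SSML <break time="Ns" />
# The 0.75 s and 2 s thresholds for v3 tokens are illustrative guesses.
pause_for_model() {
  model=$1
  seconds=$2
  case $model in
    eleven_v3)
      # awk handles fractional seconds; "+ 0" forces a numeric comparison.
      awk -v s="$seconds" 'BEGIN {
        t = s + 0
        if (t < 0.75) print "[short pause]"
        else if (t < 2) print "[pause]"
        else print "[long pause]"
      }'
      ;;
    *_v2|*_v2_5)
      printf '<break time="%ss" />\n' "$seconds"
      ;;
    *)
      echo "unknown model: $model" >&2
      return 1
      ;;
  esac
}
```

With the default eleven_v3 model: sag "Step one. $(pause_for_model eleven_v3 1) Step two."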

v3 audio tags (place at the start of a line)

[whispers], [shouts], [sings]
[laughs], [starts laughing], [sighs], [exhales]
[sarcastic], [curious], [excited], [crying], [mischievously]
Example: sag "[whispers] keep this quiet. [short pause] ok?"

Voice defaults

ELEVENLABS_VOICE_ID or SAG_VOICE_ID

Confirm voice + speaker before long output.
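One way to act on "confirm voice + speaker before long output" is to resolve the voice explicitly and play a short sample first. In this sketch, resolve_voice, preview_voice, and the SAG_BIN override are hypothetical; the fallback order (argument, then ELEVENLABS_VOICE_ID, then SAG_VOICE_ID) is the wrapper's own choice, and passing -v explicitly avoids relying on sag's internal precedence between the two variables.

```shell
# Hypothetical wrapper (not part of sag). SAG_BIN can point at another
# command for a dry run; by default it is the real sag binary.
SAG_BIN=${SAG_BIN:-sag}

# Pick a voice: explicit argument first, then the env vars (wrapper's own order).
resolve_voice() {
  voice=${1:-${ELEVENLABS_VOICE_ID:-${SAG_VOICE_ID:-}}}
  if [ -z "$voice" ]; then
    echo "no voice given and no voice env var set" >&2
    return 1
  fi
  echo "$voice"
}

# Play a short sample so a wrong voice costs seconds, not a long render.
preview_voice() {
  voice=$(resolve_voice "${1:-}") || return 1
  "$SAG_BIN" -v "$voice" "This is a quick voice check."
}
```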

Chat voice responses

When Peter asks for a "voice" reply (e.g., "crazy scientist voice", "explain in voice"), generate audio and send it:

# Generate audio file
sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"

# Then include in reply:
# MEDIA:/tmp/voice-reply.mp3
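The two steps above can be wrapped so that one call renders the file and prints the MEDIA: line for the reply. This is a sketch: voice_reply and SAG_BIN are hypothetical names, and only the -v and -o flags shown above are used.

```shell
# Hypothetical helper (not part of sag): render a chat reply with the Clawd
# voice, then print the MEDIA: line to include in the reply.
SAG_BIN=${SAG_BIN:-sag}

voice_reply() {
  text=$1
  out=${2:-/tmp/voice-reply.mp3}
  # Send sag's own stdout to stderr so stdout carries only the MEDIA line.
  "$SAG_BIN" -v Clawd -o "$out" "$text" >&2 || return 1
  printf 'MEDIA:%s\n' "$out"
}
```

For example: voice_reply "[excited] Eureka! [short pause] It works!"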

Voice character tips:

Crazy scientist: Use [excited] tags, dramatic pauses [short pause], vary intensity
Calm: Use [whispers] or slower pacing
Dramatic: Use [sings] or [shouts] sparingly

Default voice for Clawd: lj2rcrvANS3gaWWnczSX (or just -v Clawd)
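The character tips can be captured as a small lookup that puts v3 tags at the start of the line. The persona names and exact tag combinations below are hypothetical choices based on those tips, not presets that sag provides.

```shell
# Hypothetical lookup (not part of sag): prefix text with v3 audio tags
# chosen from the character tips. Tags always lead the line.
style_line() {
  persona=$1
  text=$2
  case $persona in
    scientist) printf '[excited] %s [short pause] [laughs]\n' "$text" ;;
    calm)      printf '[whispers] %s\n' "$text" ;;
    dramatic)  printf '[shouts] %s\n' "$text" ;;  # use sparingly
    *)         printf '%s\n' "$text" ;;           # unknown persona: no tags
  esac
}
```

For example: sag -v Clawd "$(style_line scientist "The experiment worked!")"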

Source

git clone https://clawhub.ai/steipete/sag

Overview

sag brings ElevenLabs text-to-speech to the command line with a UX modeled on the macOS 'say' command. It plays audio locally, authenticates with ELEVENLABS_API_KEY or SAG_API_KEY, selects voices with -v or the SAG_VOICE_ID / ELEVENLABS_VOICE_ID environment variables, and offers model-specific prompting tips (sag prompting) plus pronunciation controls. Use sag for demos, tutorials, and accessibility-friendly narration.

How This Skill Works

Install sag, provide your API key, then call sag with the text to synthesize. Choose a voice with -v or SAG_VOICE_ID / ELEVENLABS_VOICE_ID; audio plays locally, and -o saves it to a file. sag works with several models (default eleven_v3, stable eleven_multilingual_v2, fast eleven_flash_v2_5) and supports basic pronunciation controls. For v3, SSML break tags aren't supported, so use [pause]-style tokens; v2/v2.5 accept standard SSML <break> tags.

When to Use It

Prototype narrated product demos with different ElevenLabs voices
Create quick audio prompts for chatbots or assistants with local playback
Narrate tutorials or documentation for accessibility or training videos
Compare voices and models quickly with one command per attempt
Render voice replies to audio files (-o) that apps or chats can embed as media

Quick Start

Step 1: Install sag with Homebrew: brew install steipete/tap/sag
Step 2: Export your API key: export ELEVENLABS_API_KEY=your_key (or export SAG_API_KEY=your_key)
Step 3: Run a test, e.g. sag "Hello there" or sag speak -v "Roger" "Hello"; to save the audio to a file, use sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"
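The three steps can be combined into a first-run check that fails early with a hint when something is missing. A minimal sketch: sag_first_run and SAG_BIN are hypothetical, and the check only tests that a key variable is non-empty, not that the key is valid.

```shell
# Hypothetical first-run check (not part of sag). SAG_BIN defaults to sag.
SAG_BIN=${SAG_BIN:-sag}

sag_first_run() {
  # Step 1: is the binary on PATH?
  if ! command -v "$SAG_BIN" >/dev/null 2>&1; then
    echo "sag not found; try: brew install steipete/tap/sag" >&2
    return 1
  fi
  # Step 2: is at least one API key variable non-empty?
  if [ -z "${ELEVENLABS_API_KEY:-}${SAG_API_KEY:-}" ]; then
    echo "set ELEVENLABS_API_KEY (or SAG_API_KEY) first" >&2
    return 1
  fi
  # Step 3: speak a short test phrase.
  "$SAG_BIN" "Hello there"
}
```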

Best Practices

Always confirm the chosen voice before committing to long outputs; specify SAG_VOICE_ID or ELEVENLABS_VOICE_ID
Normalize tricky text (numbers, units, URLs) with --normalize auto, and switch to --normalize off if it mangles names
Choose the model best suited to your use case (default eleven_v3, stable eleven_multilingual_v2, fast eleven_flash_v2_5)
Leverage v3 audio tags like [whispers], [excited], or [short pause] for expressiveness; note v3 does not support SSML breaks
Use pronunciation tweaks (respellings, hyphenation, capitalization) to improve delivery

Example Use Cases

Narrate a product tour video with a calm or enthusiastic voice using sag
Prototype a chat assistant’s voice replies and embed the generated audio in UI
Create a tutorial narration with deliberate pacing using [short pause] tokens
Produce an accessibility read-aloud of documentation for visually impaired users
Generate a podcast intro with the stable eleven_multilingual_v2 model for consistent delivery
