Sag

Verified

@steipete

npx machina-cli add skill @steipete/sag --openclaw

sag

Use sag for ElevenLabs TTS with local playback.

API key (required)

  • ELEVENLABS_API_KEY (preferred)
  • SAG_API_KEY also supported by the CLI
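
A minimal sketch of a preflight check for the key variables listed above. The helper name sag_key_source is hypothetical and not part of sag; it only reports which variable is set, and checks ELEVENLABS_API_KEY first because the skill marks it as preferred.

```shell
# Hypothetical helper (not part of sag): report which API key variable is set.
# ELEVENLABS_API_KEY is checked first because the skill lists it as preferred.
sag_key_source() {
  if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY"
  elif [ -n "${SAG_API_KEY:-}" ]; then
    echo "SAG_API_KEY"
  else
    echo "no API key set: export ELEVENLABS_API_KEY first" >&2
    return 1
  fi
}
```

Running sag_key_source before a long synthesis job fails fast with a readable message instead of an API error.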

Quick start

  • sag "Hello there"
  • sag speak -v "Roger" "Hello"
  • sag voices
  • sag prompting (model-specific tips)

Model notes

  • Default: eleven_v3 (expressive)
  • Stable: eleven_multilingual_v2
  • Fast: eleven_flash_v2_5

Pronunciation + delivery rules

  • First fix: respell (e.g. "key-note"), add hyphens, adjust casing.
  • Numbers/units/URLs: --normalize auto (or off if it harms names).
  • Language bias: --lang en|de|fr|... to guide normalization.
  • v3: SSML <break> not supported; use [pause], [short pause], [long pause].
  • v2/v2.5: SSML <break time="1.5s" /> supported; <phoneme> not exposed in sag.
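
The pause rules above differ by model, so a small lookup can keep scripts from mixing the two syntaxes. This is a sketch only: pause_for_model is a hypothetical helper, not a sag command, and it covers just the three model IDs named in this document.

```shell
# Hypothetical helper (not part of sag): choose pause markup by model ID.
# eleven_v3 takes bracketed pause tags; the v2 and v2.5 models take SSML <break>.
pause_for_model() {
  case "$1" in
    eleven_v3)
      printf '%s\n' '[pause]' ;;
    eleven_multilingual_v2|eleven_flash_v2_5)
      printf '%s\n' '<break time="1.5s" />' ;;
    *)
      echo "unknown model: $1" >&2
      return 1 ;;
  esac
}
```

For example, sag "First point. $(pause_for_model eleven_v3) Second point." inserts the v3-style tag, which matches the default model.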

v3 audio tags (put at the start of a line)

  • [whispers], [shouts], [sings]
  • [laughs], [starts laughing], [sighs], [exhales]
  • [sarcastic], [curious], [excited], [crying], [mischievously]
  • Example: sag "[whispers] keep this quiet. [short pause] ok?"
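
When a script has many lines, prefixing each with its tag by hand gets error-prone. The sketch below assumes nothing about sag beyond the tag syntax shown above; tag_line is a hypothetical helper that only builds the text.

```shell
# Hypothetical helper (not part of sag): prefix a line of text with a v3 audio tag.
# Usage: tag_line <tag> <text...>
tag_line() {
  tag=$1
  shift
  printf '[%s] %s\n' "$tag" "$*"
}
```

The result can be passed straight to the CLI, for example sag "$(tag_line whispers 'keep this quiet.') [short pause] ok?".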

Voice defaults

  • ELEVENLABS_VOICE_ID or SAG_VOICE_ID

Confirm voice + speaker before long output.
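
One way to confirm the voice before a long run is to print the configured default first. This is a sketch under an assumption: it checks ELEVENLABS_VOICE_ID before SAG_VOICE_ID, but the precedence sag itself uses is not stated here. The helper name sag_default_voice is hypothetical.

```shell
# Hypothetical helper (not part of sag): show which default voice ID is configured.
# Assumption: ELEVENLABS_VOICE_ID is checked first; sag's real precedence may differ.
sag_default_voice() {
  voice="${ELEVENLABS_VOICE_ID:-${SAG_VOICE_ID:-}}"
  if [ -z "$voice" ]; then
    echo "no default voice set; pass -v explicitly" >&2
    return 1
  fi
  printf '%s\n' "$voice"
}
```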

Chat voice responses

When Peter asks for a "voice" reply (e.g., "crazy scientist voice", "explain in voice"), generate audio and send it:

# Generate audio file
sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"

# Then include in reply:
# MEDIA:/tmp/voice-reply.mp3
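
The two steps above can be wrapped in one function so the MEDIA line is only printed when synthesis succeeds. This is a sketch, not part of sag: voice_reply is a hypothetical wrapper that uses only the -v and -o flags shown above and defaults to the same output path.

```shell
# Hypothetical wrapper (not part of sag): synthesize a reply, then print the MEDIA line.
# Usage: voice_reply <text> [output-path]
voice_reply() {
  text=$1
  out=${2:-/tmp/voice-reply.mp3}
  # Stop without printing a MEDIA line if sag fails.
  sag -v Clawd -o "$out" "$text" || return 1
  printf 'MEDIA:%s\n' "$out"
}
```

Calling voice_reply "[excited] It works!" prints the MEDIA line to include in the reply.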

Voice character tips:

  • Crazy scientist: Use [excited] tags, dramatic pauses [short pause], vary intensity
  • Calm: Use [whispers] or slower pacing
  • Dramatic: Use [sings] or [shouts] sparingly

Default voice for Clawd: lj2rcrvANS3gaWWnczSX (or just -v Clawd)

Source

git clone https://clawhub.ai/steipete/sag

Overview

sag brings ElevenLabs text-to-speech to the command line with a macOS-like 'say' UX. It provides local playback, API key configuration via environment variables (ELEVENLABS_API_KEY or SAG_API_KEY), voice selection with -v or a default voice ID (ELEVENLABS_VOICE_ID or SAG_VOICE_ID), and options for model and pronunciation control. Use sag for demos, tutorials, and accessibility-friendly narration.

How This Skill Works

Install sag, provide your API key, then call sag with the text to synthesize. Choose a voice with -v, or set a default with ELEVENLABS_VOICE_ID or SAG_VOICE_ID. Audio plays locally, or can be written to a file with -o. sag supports three model presets (default eleven_v3, stable eleven_multilingual_v2, fast eleven_flash_v2_5) and basic pronunciation controls. v3 does not support SSML break tags, so use [pause]-style tags instead; v2/v2.5 support SSML <break> tags.

When to Use It

  • Prototype narrated product demos with different ElevenLabs voices
  • Create quick audio prompts for chatbots or assistants with local playback
  • Narrate tutorials or documentation for accessibility or training videos
  • Compare multiple voices and models quickly from the command line
  • Prepare voice responses for apps that require on-device audio with media embedding

Quick Start

  1. Install sag (brew formula: steipete/tap/sag)
  2. Export your API key (export ELEVENLABS_API_KEY=your_key or SAG_API_KEY=...)
  3. Run a test, e.g. sag "Hello there" or sag speak -v "Roger" "Hello"; optionally write to a file: sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"

Best Practices

  • Always confirm the chosen voice before committing to long outputs; specify SAG_VOICE_ID or ELEVENLABS_VOICE_ID
  • Normalize tricky text (numbers, URLs) with --normalize auto or adjust as needed
  • Choose the model best suited to your use case (default eleven_v3, stable eleven_multilingual_v2, fast eleven_flash_v2_5)
  • Leverage v3 audio tags like [whispers], [excited], or [short pause] for expressiveness; note v3 does not support SSML breaks
  • Use pronunciation tweaks (respellings, hyphenation, capitalization) to improve delivery

Example Use Cases

  • Narrate a product tour video with a calm or enthusiastic voice using sag
  • Prototype a chat assistant’s voice replies and embed the generated audio in UI
  • Create a tutorial narration with deliberate pacing using [short pause] tokens
  • Produce an accessibility read-aloud of documentation for visually impaired users
  • Generate a podcast intro using a stable ElevenLabs voice for consistency
