
Zvukogram

@erview

npx machina-cli add skill @erview/zvukogram --openclaw
Files (1)
SKILL.md
3.0 KB

Zvukogram TTS

Speech generation via Zvukogram API with SSML markup support.

Requirements

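To use this skill, you need a Zvukogram API token and the email address of your Zvukogram account (see Setup). Merging audio fragments additionally requires ffmpeg.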

Setup

Create the config directory:

mkdir -p ~/.config/zvukogram

Then create the file ~/.config/zvukogram/config.json with this content:

{
  "token": "your_api_token_here",
  "email": "your_email@example.com"
}

Or use environment variables:

export ZVUKOGRAM_TOKEN=your_api_token_here
export ZVUKOGRAM_EMAIL=your_email@example.com
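
For readers writing their own tooling around this skill, the two configuration sources above can be read as in the following minimal sketch. The function name load_credentials is hypothetical, and the rule that environment variables take precedence over the file is an assumption. The source does not say how scripts/tts.py resolves the two.

```python
import json
import os
from pathlib import Path

# Path and key names come from the Setup section; everything else is illustrative.
DEFAULT_CONFIG = Path.home() / ".config" / "zvukogram" / "config.json"


def load_credentials(config_path=DEFAULT_CONFIG, env=None):
    """Return (token, email). Environment values win over the file (an assumption)."""
    env = os.environ if env is None else env
    file_values = {}
    config_path = Path(config_path)
    if config_path.is_file():
        file_values = json.loads(config_path.read_text(encoding="utf-8"))

    token = env.get("ZVUKOGRAM_TOKEN") or file_values.get("token")
    email = env.get("ZVUKOGRAM_EMAIL") or file_values.get("email")
    if not token or not email:
        raise RuntimeError("Zvukogram token and email are not configured")
    return token, email
```

Calling load_credentials() with no arguments reads the real environment and the default path. Passing env={} is convenient for testing the file-only case.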

Quick Start

# Simple TTS
python3 scripts/tts.py --text "Hello, world!" --voice Алена --output hello.mp3

# With +20% speed
python3 scripts/tts.py --text "Fast text" --voice Алена --speed 1.2 --output fast.mp3

# Check balance
python3 scripts/balance.py
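
The same commands can be driven from Python when several files need to be generated. This is a rough sketch that uses only the flags shown above (--text, --voice, --speed, --output). The helper names are hypothetical, and it assumes that scripts/tts.py exits with a non-zero status on failure, which the source does not confirm.

```python
import subprocess
import sys


def build_tts_command(text, voice, output, speed=None):
    """Build the argv list for scripts/tts.py using only the documented flags."""
    cmd = [sys.executable, "scripts/tts.py", "--text", text, "--voice", voice]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    cmd += ["--output", output]
    return cmd


def run_tts(text, voice, output, speed=None):
    """Run the script; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_tts_command(text, voice, output, speed), check=True)
```

For example, run_tts("Fast text", "Алена", "fast.mp3", speed=1.2) corresponds to the second command above. Passing the arguments as a list avoids shell quoting problems with Cyrillic voice names and spaces in the text.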

Features

  • TTS generation — text to speech
  • SSML support — stress marks, pauses, speed
  • Audio merging — combine fragments via ffmpeg
  • Transcription — proper pronunciation of English words

SSML Markup

Stress Marks

Place + immediately before the stressed vowel:

з+амок: stress on "а" (за́мок, "castle")
зам+ок: stress on "о" (замо́к, "lock")
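
If stress positions are known in advance, the + marker can be inserted programmatically. The helper below is a hypothetical sketch, not part of the skill. It only counts Russian vowels and does not know which syllable should be stressed.

```python
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def mark_stress(word, vowel_number):
    """Insert "+" before the vowel_number-th vowel (1-based) of a Russian word."""
    if vowel_number < 1:
        raise ValueError("vowel_number is 1-based")
    seen = 0
    for index, char in enumerate(word):
        if char in RUSSIAN_VOWELS:
            seen += 1
            if seen == vowel_number:
                return word[:index] + "+" + word[index:]
    raise ValueError(f"{word!r} has fewer than {vowel_number} vowels")
```

With this helper, mark_stress("замок", 1) gives "з+амок" and mark_stress("замок", 2) gives "зам+ок".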

Aliases (Transcription)

<sub alias="Оупен Эй Ай">OpenAI</sub>
<sub alias="Самсунг">Samsung</sub>
<sub alias="Ал+ьтман">Альтман</sub>
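
When the same terms recur across many texts, the alias tags can be generated from a dictionary. The following is a minimal sketch under stated assumptions: apply_aliases is a hypothetical helper, matching is case-sensitive and limited to whole words, and the input is assumed to be plain text that does not already contain SSML tags.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text, aliases):
    """Wrap each known term in <sub alias="...">; all other text is left untouched."""
    if not aliases:
        return text
    # Try longer terms first so a longer key is preferred over a shorter prefix.
    terms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    def wrap(match):
        term = match.group(1)
        return f"<sub alias={quoteattr(aliases[term])}>{escape(term)}</sub>"

    # A single pass avoids re-matching text inside alias values inserted earlier.
    return pattern.sub(wrap, text)
```

For example, apply_aliases("OpenAI and Samsung", {"OpenAI": "Оупен Эй Ай", "Samsung": "Самсунг"}) wraps both names and leaves " and " unchanged.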

Speed

<prosody rate="1.2">20% faster</prosody>
<prosody rate="fast">Fast text</prosody>

Pauses

<break time="500ms"/>
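
The speed and pause tags are simple enough to build with small string helpers. The sketch below is illustrative only; the function names are hypothetical, and the text passed in is inserted as-is so that it may already contain stress marks or alias tags.

```python
def prosody(text, rate):
    """Wrap text in <prosody rate="...">; rate is a number (1.2) or a keyword ("fast")."""
    return f'<prosody rate="{rate}">{text}</prosody>'


def pause(milliseconds):
    """Return a <break> tag for a pause of the given length in milliseconds."""
    if milliseconds < 0:
        raise ValueError("pause length must not be negative")
    return f'<break time="{int(milliseconds)}ms"/>'


def join_with_pauses(sentences, milliseconds=500):
    """Join sentences, inserting the same pause between each pair."""
    return pause(milliseconds).join(sentences)
```

For example, prosody("Fast text", "fast") reproduces the second speed example above, and join_with_pauses(["First.", "Second."]) places a 500 ms break between the two sentences.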

Available Voices

  • Алена — female, neutral (recommended)
  • Андрей — male, neutral (recommended)
  • Александра — female, soft
  • Антон — male, business

Full list: see references/VOICES.md

Examples

See references/EXAMPLES.md for:

  • Dialogs and podcasts
  • News voiceover
  • Voice notifications
  • Long texts

Transcription

See references/TRANSCRIPTION.md for proper pronunciation:

  • OpenAI → Оупен Эй Ай
  • GPT → Джи Пи Ти
  • Samsung → Самсунг
  • Altman → Ал+ьтман

SSML Reference

See references/SSML_CHEATSHEET.md for quick tag lookup.

Troubleshooting

See references/TROUBLESHOOTING.md for:

  • API errors
  • Audio issues
  • Diagnostics

API Limitations

  • Max 1000 characters per request (/text)
  • Up to 1M characters via /longtext
  • SSML with <voice> not supported via API (web only)
  • For multi-voice — merge fragments
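
The limits above suggest a simple planning step before any request is sent. The sketch below picks an endpoint by length and turns a multi-voice dialog into one job per fragment for later merging. The helper names, the job dictionary layout, and the fragment file names are all hypothetical. It also assumes the limits count every character sent, markup included, which the source does not state.

```python
TEXT_LIMIT = 1_000          # /text limit from the list above
LONGTEXT_LIMIT = 1_000_000  # /longtext limit from the list above


def choose_endpoint(text):
    """Return "/text" or "/longtext" depending on the length of the text."""
    length = len(text)
    if length == 0:
        raise ValueError("text is empty")
    if length <= TEXT_LIMIT:
        return "/text"
    if length <= LONGTEXT_LIMIT:
        return "/longtext"
    raise ValueError(f"text has {length} characters, above the /longtext limit")


def plan_dialog(lines):
    """Turn (voice, text) pairs into one job per fragment, in speaking order."""
    jobs = []
    for number, (voice, text) in enumerate(lines, start=1):
        jobs.append({
            "voice": voice,
            "text": text,
            "endpoint": choose_endpoint(text),
            "output": f"fragment_{number:03d}.mp3",  # illustrative naming scheme
        })
    return jobs
```

Each job uses a single voice, so no <voice> tag is needed. The resulting files can then be merged in order.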

Source

git clone https://clawhub.ai/erview/zvukogram

Overview

Zvukogram TTS generates speech from text via the Zvukogram API, with SSML markup, speed control, stress marks, and English transcription. It supports audio fragment merging for longer content, making it ideal for podcasts, voice notifications, and other audio projects.

How This Skill Works

Send text and SSML to the Zvukogram API using your token and account email; the API returns synthesized audio. For longer content, merge fragments with ffmpeg. SSML enables stress marks, pauses, and speed adjustments to fine-tune pronunciation and prosody.
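
One common way to merge fragments with ffmpeg is its concat demuxer, sketched below. This is not necessarily how the skill's own scripts do it. The helper names are hypothetical, and the -c copy option assumes all fragments share the same codec and audio parameters; if they do not, re-encoding would be needed instead.

```python
from pathlib import Path


def write_concat_list(fragments, list_path):
    """Write an ffmpeg concat-demuxer list file naming each fragment in order."""
    lines = []
    for fragment in fragments:
        # Inside single quotes, a literal ' must be written as '\'' for ffmpeg.
        escaped = str(fragment).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_merge_command(list_path, output):
    """Build an ffmpeg command that concatenates the listed files without re-encoding."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-f", "concat", "-safe", "0",
        "-i", str(list_path),
        "-c", "copy",
        str(output),
    ]
```

The returned list can be passed to subprocess.run(..., check=True). Relative paths in the list file are resolved against the directory of the list file, so it is simplest to keep it next to the fragments.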

When to Use It

  • Create podcasts or voiceover content from scripts
  • Send voice notifications or alerts with precise timing
  • Produce multilingual or accented content requiring transcription and pronunciation control
  • Assemble long-form audio by merging multiple fragments
  • Prototype and test different voices and prosody before final production

Quick Start

  1. Create ~/.config/zvukogram/config.json with your token and email, or export ZVUKOGRAM_TOKEN and ZVUKOGRAM_EMAIL
  2. Run a simple TTS: python3 scripts/tts.py --text "Hello, world!" --voice Алена --output hello.mp3
  3. Optionally, check your balance: python3 scripts/balance.py

Best Practices

  • Use SSML stress marks and pauses to improve naturalness
  • Leverage transcription aliases for tricky English words
  • Experiment with <prosody> rate values to match desired pacing
  • Split long scripts into fragments and merge them with ffmpeg
  • Validate character limits: <=1000 chars per /text, up to 1M via /longtext
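
Splitting a long script so that each fragment stays within the /text limit can be sketched as follows. The function split_text is a hypothetical helper for plain text: it prefers sentence boundaries, hard-splits only sentences that exceed the limit on their own, and is not SSML-aware, so it could cut through a tag if the input already contains markup.

```python
import re


def split_text(text, limit=1000):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself over the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting files merged in order with ffmpeg.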

Example Use Cases

  • Dialogs and podcasts with scripted narration
  • News-like voiceover segments for quick updates
  • Timed voice notifications for apps and devices
  • Long-form audio projects created by merging fragments
  • Pronunciation tweaks for English terms using transcription aliases
