Frequently Asked Questions

What credentials are required to use Zvukogram?

You need a Zvukogram API token and account email, provided via a config file (~/.config/zvukogram/config.json) or environment variables ZVUKOGRAM_TOKEN and ZVUKOGRAM_EMAIL.

Are there API limits or SSML restrictions?

Max 1000 characters per /text request, up to 1M characters via /longtext. SSML is supported for stress marks, pauses, and speed, but the <voice> tag is not supported by the API (web interface only).

How do I build longer audio from multiple fragments?

Create individual audio fragments and merge them with ffmpeg to form a continuous track for longer scripts.
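A minimal Python sketch of that merge step. It assumes all fragments share the same codec and parameters (for example, MP3 files produced with the same settings), so ffmpeg's concat demuxer can join them with `-c copy`. The helper names `build_concat_list`, `build_merge_command` and `merge_fragments` are hypothetical and not part of the skill's scripts.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def build_concat_list(fragments):
    """Return the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for frag in fragments:
        # Each entry is `file '<path>'`; a single quote inside the path is
        # written as '\'' (close quote, escaped quote, reopen quote).
        escaped = str(frag).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_merge_command(list_path, output):
    """ffmpeg command that concatenates the listed files without re-encoding."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-f", "concat",        # use the concat demuxer
        "-safe", "0",          # allow absolute paths in the list file
        "-i", str(list_path),
        "-c", "copy",          # stream copy: inputs must share codec settings
        str(output),
    ]


def merge_fragments(fragments, output):
    """Concatenate audio fragments in order into `output` (needs ffmpeg on PATH)."""
    # Relative paths in a list file are resolved against the list file's
    # directory, so make them absolute before writing it to a temp location.
    absolute = [Path(f).resolve() for f in fragments]
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as list_file:
        list_file.write(build_concat_list(absolute))
        list_path = list_file.name
    try:
        subprocess.run(build_merge_command(list_path, output), check=True)
    finally:
        os.remove(list_path)
```

Usage: `merge_fragments(["part_001.mp3", "part_002.mp3"], "episode.mp3")`. If the fragments differ in format or bitrate, drop `-c copy` so ffmpeg re-encodes instead of copying streams.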

Zvukogram

@erview

npx machina-cli add skill @erview/zvukogram --openclaw

Zvukogram TTS

Speech generation via the Zvukogram API with SSML markup support.

Requirements

To use this skill, you need:

Zvukogram API token — get it at https://zvukogram.com/
Zvukogram account email

Setup

Create the configuration directory and the file ~/.config/zvukogram/config.json:

mkdir -p ~/.config/zvukogram

{
  "token": "your_api_token_here",
  "email": "your_email@example.com"
}

Or use environment variables:

export ZVUKOGRAM_TOKEN=your_api_token_here
export ZVUKOGRAM_EMAIL=your_email@example.com
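A minimal Python sketch of reading these credentials. The function name `load_credentials` is hypothetical, and the precedence it uses (environment variables override the config file) is an assumption; check how the skill's own scripts resolve them.

```python
import json
import os
from pathlib import Path

# Default location used in the Setup section above.
DEFAULT_CONFIG = Path.home() / ".config" / "zvukogram" / "config.json"


def load_credentials(config_path=DEFAULT_CONFIG, environ=None):
    """Return (token, email) for the Zvukogram API.

    ASSUMPTION: ZVUKOGRAM_TOKEN / ZVUKOGRAM_EMAIL take precedence over the
    "token" / "email" keys in the JSON config file.
    """
    env = os.environ if environ is None else environ
    token = env.get("ZVUKOGRAM_TOKEN")
    email = env.get("ZVUKOGRAM_EMAIL")
    path = Path(config_path)
    if not (token and email) and path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        # Fill in only the values the environment did not provide.
        token = token or data.get("token")
        email = email or data.get("email")
    if not (token and email):
        raise RuntimeError(
            "Zvukogram token/email not found in environment or " + str(path)
        )
    return token, email
```

Usage: `token, email = load_credentials()` reads the real environment and the default config path.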

Quick Start

# Simple TTS
python3 scripts/tts.py --text "Hello, world!" --voice Алена --output hello.mp3

# With +20% speed
python3 scripts/tts.py --text "Fast text" --voice Алена --speed 1.2 --output fast.mp3

# Check balance
python3 scripts/balance.py

Features

TTS generation — text to speech
SSML support — stress marks, pauses, speed
Audio merging — combine fragments via ffmpeg
Transcription — proper pronunciation of English words

SSML Markup

Stress Marks

Place + before the stressed vowel:

З+амок (castle): stress on "a"
зам+ок (lock): stress on "o"
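When stress positions come from a dictionary or lookup table, a small helper can place the + marker programmatically. This is a hypothetical sketch: `mark_stress` and the vowel set are illustrative, and it assumes the marker goes immediately before the stressed Russian vowel, as in the examples above.

```python
# Russian vowel letters in both cases, including ё.
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def mark_stress(word, vowel_index):
    """Insert '+' before the vowel_index-th vowel (0-based) of `word`."""
    seen = 0
    for i, ch in enumerate(word):
        if ch in RUSSIAN_VOWELS:
            if seen == vowel_index:
                return word[:i] + "+" + word[i:]
            seen += 1
    raise ValueError(f"{word!r} has only {seen} vowel(s)")
```

For example, `mark_stress("Замок", 0)` returns `"З+амок"` and `mark_stress("замок", 1)` returns `"зам+ок"`.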

Aliases (Transcription)

<sub alias="Оупен Эй Ай">OpenAI</sub>
<sub alias="Самсунг">Samsung</sub>
<sub alias="+Альтман">Альтман</sub>
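A sketch of applying such aliases automatically with Python's standard library. `apply_aliases` and the `ALIASES` table are hypothetical; the alias spellings come from this page, and the surrounding text is XML-escaped on the assumption that the API parses the input as SSML.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Alias spellings taken from the examples on this page.
ALIASES = {
    "OpenAI": "Оупен Эй Ай",
    "GPT": "Джи Пи Ти",
    "Samsung": "Самсунг",
}


def apply_aliases(text, aliases):
    """Wrap whole-word occurrences of each key in <sub alias="...">key</sub>."""
    if not aliases:
        return escape(text)
    # Try longer keys first so a longer term wins over a prefix of it.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))  # plain text between matches
        word = match.group(0)
        parts.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)
```

Only whole words are replaced, so a term embedded in a longer word (such as "OpenAIs") is left as-is.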

Speed

<prosody rate="1.2">20% faster</prosody>
<prosody rate="fast">Fast text</prosody>

Pauses

<break time="500ms"/>
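The speed and pause tags can be combined when assembling a fragment. A hypothetical Python sketch (`pause` and `build_fragment` are illustrative names) that joins sentences with `<break>` and optionally wraps the result in `<prosody>`; it assumes the API accepts these tags inline without a surrounding `<speak>` element, as in the snippets above.

```python
from xml.sax.saxutils import escape


def pause(ms):
    """SSML pause of `ms` milliseconds."""
    if ms < 0:
        raise ValueError("pause must be non-negative")
    return f'<break time="{int(ms)}ms"/>'


def build_fragment(sentences, pause_ms=500, rate=None):
    """Join sentences with pauses; optionally wrap everything in <prosody rate=...>."""
    body = pause(pause_ms).join(escape(s) for s in sentences)
    if rate is None:
        return body
    # rate can be a multiplier such as 1.2 or a keyword such as "fast".
    return f'<prosody rate="{rate}">{body}</prosody>'
```

For example, `build_fragment(["A", "B"], rate=1.2)` returns `<prosody rate="1.2">A<break time="500ms"/>B</prosody>`.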

Available Voices

Алена — female, neutral (recommended)
Андрей — male, neutral (recommended)
Александра — female, soft
Антон — male, business

Full list: see references/VOICES.md

Examples

See references/EXAMPLES.md for:

Dialogs and podcasts
News voiceover
Voice notifications
Long texts

Transcription

See references/TRANSCRIPTION.md for proper pronunciation:

OpenAI → Оупен Эй Ай
GPT → Джи Пи Ти
Samsung → Самсунг
Altman → +Альтман

SSML Reference

See references/SSML_CHEATSHEET.md for quick tag lookup.

Troubleshooting

See references/TROUBLESHOOTING.md for:

API errors
Audio issues
Diagnostics

API Limitations

Max 1000 characters per request (/text)
Up to 1M characters via /longtext
The SSML <voice> tag is not supported via the API (web interface only)
For multi-voice audio, generate one fragment per voice and merge them
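One way to plan a multi-voice script is to turn each speaker turn into its own fragment job, run `scripts/tts.py` once per job, and then merge the outputs with ffmpeg. A hypothetical Python sketch: `FragmentJob`, `plan_dialog` and `tts_command` are illustrative names, and only the `--text`, `--voice` and `--output` flags come from the Quick Start above. Consecutive turns by the same voice are merged into one job, which can push a job past the 1000-character /text limit, so long turns may still need splitting.

```python
from dataclasses import dataclass


@dataclass
class FragmentJob:
    voice: str
    text: str
    output: str = ""


def plan_dialog(turns, prefix="part"):
    """Turn (voice, text) pairs into numbered fragment jobs, one per speaker turn."""
    jobs = []
    for voice, text in turns:
        if jobs and jobs[-1].voice == voice:
            # Same speaker continues: append to the current fragment.
            jobs[-1].text += " " + text
        else:
            jobs.append(FragmentJob(voice, text))
    for number, job in enumerate(jobs, start=1):
        job.output = f"{prefix}_{number:03d}.mp3"
    return jobs


def tts_command(job):
    """Command line for the skill's TTS script (flags as in the Quick Start)."""
    return ["python3", "scripts/tts.py", "--text", job.text,
            "--voice", job.voice, "--output", job.output]
```

Each command can be run with `subprocess.run(tts_command(job), check=True)`, and the resulting files merged in job order.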

Overview

Zvukogram TTS generates speech from text via the Zvukogram API, with SSML markup, speed control, stress marks, and transcription of English terms. It supports merging audio fragments for longer content, which makes it suitable for podcasts, voice notifications, and other audio projects.

How This Skill Works

Send text and SSML to the Zvukogram API using your token and account email; the API returns synthesized audio. For longer content, merge fragments with ffmpeg. SSML enables stress marks, pauses, and speed adjustments to fine-tune pronunciation and prosody.
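A rough Python sketch of what such a request could look like, using only `urllib` from the standard library. The caller passes the endpoint URL. The form field names (`token`, `email`, `voice`, `text`, `speed`) are assumptions modeled on the config keys and script flags on this page, not on the documented Zvukogram API, so check the official API reference before relying on them. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tts_request(endpoint_url, token, email, text, voice, speed=None):
    """Build (but do not send) a form-encoded POST request for speech synthesis.

    ASSUMPTION: the field names below are illustrative, not taken from the
    official Zvukogram API reference.
    """
    fields = {"token": token, "email": email, "voice": voice, "text": text}
    if speed is not None:
        fields["speed"] = str(speed)
    body = urlencode(fields).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

`urllib.request.urlopen(req)` would send it. How the audio comes back (raw bytes, or a response pointing to a file) depends on the API and is not covered by this sketch.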

When to Use It

Create podcasts or voiceover content from scripts
Send voice notifications or alerts with precise timing
Produce multilingual or accented content requiring transcription and pronunciation control
Assemble long-form audio by merging multiple fragments
Prototype and test different voices and prosody before final production

Quick Start

Step 1: Create ~/.config/zvukogram/config.json with your token and email or export ZVUKOGRAM_TOKEN and ZVUKOGRAM_EMAIL
Step 2: Run a simple TTS: python3 scripts/tts.py --text "Hello, world!" --voice Алена --output hello.mp3
Step 3: Check balance (optional): python3 scripts/balance.py

Best Practices

Use SSML stress marks and pauses to improve naturalness
Leverage transcription aliases for tricky English words
Experiment with <prosody> rate values to match desired pacing
Split long scripts into fragments and merge them with ffmpeg
Check character limits: at most 1000 characters per /text request, up to 1M via /longtext
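A hypothetical Python sketch of splitting a long script into pieces that fit the 1000-character /text limit, preferring sentence boundaries. It assumes the limit counts characters of the text as sent, so split before adding SSML markup, since tags may count toward the limit as well. A sentence longer than the limit is cut hard, which may break a word.

```python
import re


def split_text(text, limit=1000):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after ., !, ? or an ellipsis character when followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the limit is cut into hard pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting fragments merged with ffmpeg in order.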

Example Use Cases

Dialogs and podcasts with scripted narration
News-like voiceover segments for quick updates
Timed voice notifications for apps and devices
Long-form audio projects created by merging fragments
Pronunciation tweaks for English terms using transcription aliases
