Get the FREE Ultimate OpenClaw Setup Guide →

ElevenLabs Automation

Scanned
npx machina-cli add skill ComposioHQ/awesome-claude-skills/elevenlabs-automation --openclaw
Files (1)
SKILL.md
5.1 KB

ElevenLabs Automation

Automate your ElevenLabs text-to-speech workflows -- convert text to natural speech, browse the voice library, inspect voice details, check subscription credits, select TTS models, stream audio for low-latency delivery, and retrieve previously generated audio from history.

Toolkit docs: composio.dev/toolkits/elevenlabs


Setup

  1. Add the Composio MCP server to your client: https://rube.app/mcp
  2. Connect your ElevenLabs account when prompted (API key authentication)
  3. Start using the workflows below

Core Workflows

1. Generate Speech from Text

Use ELEVENLABS_TEXT_TO_SPEECH to convert text into a downloadable audio file.

Tool: ELEVENLABS_TEXT_TO_SPEECH
Inputs:
  - voice_id: string (required) -- obtain from ELEVENLABS_GET_VOICES
  - text: string (required) -- max ~10,000 chars (most models), 30,000 (Flash/Turbo v2), 40,000 (v2.5)
  - model_id: string (default "eleven_monolingual_v1") -- e.g., "eleven_multilingual_v2"
  - output_format: string (default "mp3_44100_128") -- see formats below
  - optimize_streaming_latency: integer (0-4; NOT supported with eleven_v3)
  - seed: integer (optional, for reproducibility -- not guaranteed)
  - pronunciation_dictionary_locators: array (optional, up to 3 dictionaries)

Output formats:

  • MP3: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192 (Creator+)
  • PCM: pcm_16000, pcm_22050, pcm_24000, pcm_44100 (Pro+)
  • uLaw: ulaw_8000 (for Twilio)

Important: Output is a file object with a presigned download link at data.file.s3url (expires in ~1 hour). Download promptly.

2. Browse Available Voices

Use ELEVENLABS_GET_VOICES to list all voices with their attributes and settings.

Tool: ELEVENLABS_GET_VOICES
Inputs: (none)

Returns an array at data.voices[] with voice_id, name, labels (gender, accent, use_case), and settings.

3. Inspect a Specific Voice

Use ELEVENLABS_GET_VOICE to get detailed metadata for a candidate voice before synthesis.

Tool: ELEVENLABS_GET_VOICE
Inputs:
  - voice_id: string (required) -- e.g., "21m00Tcm4TlvDq8ikWAM"
  - with_settings: boolean (default false) -- include detailed voice settings

4. Check Subscription and Credits

Use ELEVENLABS_GET_USER_SUBSCRIPTION_INFO to verify plan limits and remaining credits before bulk generation.

Tool: ELEVENLABS_GET_USER_SUBSCRIPTION_INFO
Inputs: (none)

5. List Available TTS Models

Use ELEVENLABS_GET_MODELS to discover compatible models and filter by can_do_text_to_speech: true.

Tool: ELEVENLABS_GET_MODELS
Inputs: (none)

6. Stream Audio and Retrieve History

Use ELEVENLABS_TEXT_TO_SPEECH_STREAM for low-latency streamed delivery, and ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM to re-download previously generated audio.

Tool: ELEVENLABS_TEXT_TO_SPEECH_STREAM
  - Same core inputs as TEXT_TO_SPEECH but returns a stream for low-latency playback

Tool: ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM
  - history_item_id: string (required) -- ID from a previous generation

Known Pitfalls

PitfallDetail
Text length limitsMost models cap at ~10,000-20,000 chars per request. Oversized input returns HTTP 400. Split long text into chunks (~5000 chars) and generate per chunk.
Output is a presigned URLELEVENLABS_TEXT_TO_SPEECH returns data.file.s3url with a ~1 hour expiry (X-Amz-Expires=3600). Download the audio file promptly.
Quota and credit errorsHTTP 401 with quota_exceeded or HTTP 402 payment_required means insufficient credits or tier restrictions. Check with ELEVENLABS_GET_USER_SUBSCRIPTION_INFO before bulk jobs.
Voice permissionsHTTP 401 with missing_permissions means the API key lacks voices_read scope. Verify key permissions.
Model compatibilityNot all models support TTS. Use ELEVENLABS_GET_MODELS and filter by can_do_text_to_speech: true. The optimize_streaming_latency parameter is NOT supported with eleven_v3.
Large voice list truncationELEVENLABS_GET_VOICES may return a large list. Select from the full data.voices[] payload -- previews may appear truncated.

Quick Reference

Tool SlugDescription
ELEVENLABS_TEXT_TO_SPEECHConvert text to speech, returns downloadable audio file
ELEVENLABS_GET_VOICESList all available voices with attributes
ELEVENLABS_GET_VOICEGet detailed info for a specific voice
ELEVENLABS_GET_USER_SUBSCRIPTION_INFOCheck subscription plan and remaining credits
ELEVENLABS_GET_MODELSList available TTS models and capabilities
ELEVENLABS_TEXT_TO_SPEECH_STREAMStream audio for low-latency delivery
ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEMRe-download audio from generation history

Powered by Composio

Source

git clone https://github.com/ComposioHQ/awesome-claude-skills/blob/master/composio-skills/elevenlabs-automation/SKILL.mdView on GitHub

Overview

Automate ElevenLabs text-to-speech workflows end-to-end, including generating speech from text, browsing and inspecting voices, and monitoring credits. It also lets you list compatible models, stream audio for low-latency playback, and retrieve previously generated audio from history via the Composio MCP integration.

How This Skill Works

The skill uses MCP-based tools to orchestrate ElevenLabs TTS tasks: fetch voices with ELEVENLABS_GET_VOICES, inspect a voice with ELEVENLABS_GET_VOICE, verify subscription info with ELEVENLABS_GET_USER_SUBSCRIPTION_INFO, and discover models with ELEVENLABS_GET_MODELS. It then generates or streams speech via ELEVENLABS_TEXT_TO_SPEECH or ELEVENLABS_TEXT_TO_SPEECH_STREAM, returning a presigned audio file URL at data.file.s3url (expires ~1 hour).

When to Use It

  • You need to generate speech from text using a specific voice and model.
  • You want to browse or inspect voice metadata before synthesis.
  • You need to verify remaining credits or plan limits before bulk generation.
  • You want to discover compatible TTS models for your scripts.
  • You require low-latency streaming or re-download from history when needed.

Quick Start

  1. Step 1: Set up the MCP server and connect your ElevenLabs account (API key authentication) as described in the Setup.
  2. Step 2: Run ELEVENLABS_GET_VOICES to fetch available voices and select a voice_id.
  3. Step 3: Use ELEVENLABS_TEXT_TO_SPEECH with sample text, the chosen voice_id, and a model_id to generate audio; inspect data.file.s3url for download.

Best Practices

  • Always fetch voices first with ELEVENLABS_GET_VOICES to identify candidate voice_id values.
  • Use ELEVENLABS_GET_MODELS to filter by can_do_text_to_speech: true before generating.
  • Chunk long texts (~5000 chars) to stay within per-request limits for most models.
  • Prefer ELEVENLABS_TEXT_TO_SPEECH_STREAM for low-latency delivery when real-time playback matters.
  • Download audio promptly due to the presigned URL expiry (~1 hour) at data.file.s3url.

Example Use Cases

  • Automate daily announcements by generating speech in a chosen voice and storing the audio file for distribution.
  • Prototype multilingual content by comparing voices across multiple languages using ELEVENLABS_GET_VOICES.
  • Monitor credits and plan limits before running large batches of voice generations with GET_USER_SUBSCRIPTION_INFO.
  • Discover the best-fit TTS models for your script library using GET_MODELS and model filtering.
  • Stream narration live in a kiosk or guide and re-download from history if needed using HISTORY and STREAM tools.

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers