What is ElevenLabs Automation?

A skill that automates ElevenLabs TTS tasks via the Composio MCP integration, including generating speech, browsing voices, checking subscriptions, listing models, streaming audio, and retrieving history.

Which tools does it expose?

ELEVENLABS_TEXT_TO_SPEECH, ELEVENLABS_GET_VOICES, ELEVENLABS_GET_VOICE, ELEVENLABS_GET_USER_SUBSCRIPTION_INFO, ELEVENLABS_GET_MODELS, ELEVENLABS_TEXT_TO_SPEECH_STREAM, and ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM.

What should I know about the output URL?

The generated audio is provided as a presigned URL at data.file.s3url that expires in about one hour, so download promptly.

ElevenLabs Automation

npx machina-cli add skill ComposioHQ/awesome-claude-skills/elevenlabs-automation --openclaw

Automate your ElevenLabs text-to-speech workflows -- convert text to natural speech, browse the voice library, inspect voice details, check subscription credits, select TTS models, stream audio for low-latency delivery, and retrieve previously generated audio from history.

Toolkit docs: composio.dev/toolkits/elevenlabs

Setup

Add the Composio MCP server to your client: https://rube.app/mcp
Connect your ElevenLabs account when prompted (API key authentication)
Start using the workflows below

Core Workflows

1. Generate Speech from Text

Use ELEVENLABS_TEXT_TO_SPEECH to convert text into a downloadable audio file.

Tool: ELEVENLABS_TEXT_TO_SPEECH
Inputs:
  - voice_id: string (required) -- obtain from ELEVENLABS_GET_VOICES
  - text: string (required) -- max ~10,000 chars (most models), 30,000 (Flash/Turbo v2), 40,000 (Flash/Turbo v2.5)
  - model_id: string (default "eleven_monolingual_v1") -- e.g., "eleven_multilingual_v2"
  - output_format: string (default "mp3_44100_128") -- see formats below
  - optimize_streaming_latency: integer (0-4; NOT supported with eleven_v3)
  - seed: integer (optional, for reproducibility -- not guaranteed)
  - pronunciation_dictionary_locators: array (optional, up to 3 dictionaries)

Output formats:

MP3: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192 (mp3_44100_192 requires the Creator tier or above)
PCM: pcm_16000, pcm_22050, pcm_24000, pcm_44100 (pcm_44100 requires the Pro tier or above)
uLaw: ulaw_8000 (for Twilio)

Important: Output is a file object with a presigned download link at data.file.s3url (expires in ~1 hour). Download promptly.
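
The inputs and output described above can be sketched as a pair of small helpers. This is an illustrative sketch, not part of the toolkit: `build_tts_arguments` and `extract_audio_url` are hypothetical names, and the code only assembles and checks the arguments an MCP client would pass to ELEVENLABS_TEXT_TO_SPEECH, then reads the link from data.file.s3url. It does not call the tool itself.

```python
from typing import Any, Optional

# Output formats listed in this document.
OUTPUT_FORMATS = {
    "mp3_22050_32", "mp3_44100_32", "mp3_44100_64", "mp3_44100_96",
    "mp3_44100_128", "mp3_44100_192",
    "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
    "ulaw_8000",
}


def build_tts_arguments(
    voice_id: str,
    text: str,
    model_id: str = "eleven_monolingual_v1",
    output_format: str = "mp3_44100_128",
    optimize_streaming_latency: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble and sanity-check arguments for ELEVENLABS_TEXT_TO_SPEECH (hypothetical helper)."""
    if not voice_id or not text:
        raise ValueError("voice_id and text are required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format}")

    args: dict[str, Any] = {
        "voice_id": voice_id,
        "text": text,
        "model_id": model_id,
        "output_format": output_format,
    }
    if optimize_streaming_latency is not None:
        # The document states this parameter is not supported with eleven_v3.
        if model_id == "eleven_v3":
            raise ValueError("optimize_streaming_latency is not supported with eleven_v3")
        if not 0 <= optimize_streaming_latency <= 4:
            raise ValueError("optimize_streaming_latency must be between 0 and 4")
        args["optimize_streaming_latency"] = optimize_streaming_latency
    return args


def extract_audio_url(response: dict) -> str:
    """Read the presigned download link from data.file.s3url."""
    try:
        return response["data"]["file"]["s3url"]
    except (KeyError, TypeError) as exc:
        raise ValueError("response has no data.file.s3url") from exc
```

Validating locally before the call avoids spending a request on arguments the document already says will be rejected. Because the link expires in about an hour, download it right away rather than storing it for later.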

2. Browse Available Voices

Use ELEVENLABS_GET_VOICES to list all voices with their attributes and settings.

Tool: ELEVENLABS_GET_VOICES
Inputs: (none)

Returns an array at data.voices[] with voice_id, name, labels (gender, accent, use_case), and settings.
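
Picking a voice from that array might look like the sketch below. `find_voices` is a hypothetical helper, and the sample response is invented to mirror the shape described above (data.voices[] with voice_id, name, and labels); a real payload may carry more fields or different label values.

```python
def find_voices(response: dict, **wanted_labels: str) -> list[dict]:
    """Return voices whose labels match every requested key/value (case-insensitive)."""
    voices = response.get("data", {}).get("voices", [])
    matches = []
    for voice in voices:
        labels = voice.get("labels") or {}
        if all(
            str(labels.get(key, "")).lower() == value.lower()
            for key, value in wanted_labels.items()
        ):
            matches.append(voice)
    return matches


# Invented sample data shaped like the ELEVENLABS_GET_VOICES response described above.
sample_response = {
    "data": {
        "voices": [
            {"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Sample A",
             "labels": {"gender": "female", "accent": "american", "use_case": "narration"}},
            {"voice_id": "hypothetical-voice-id", "name": "Sample B",
             "labels": {"gender": "male", "accent": "british", "use_case": "news"}},
        ]
    }
}

narrators = find_voices(sample_response, gender="female", use_case="narration")
print([v["voice_id"] for v in narrators])
```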

3. Inspect a Specific Voice

Use ELEVENLABS_GET_VOICE to get detailed metadata for a candidate voice before synthesis.

Tool: ELEVENLABS_GET_VOICE
Inputs:
  - voice_id: string (required) -- e.g., "21m00Tcm4TlvDq8ikWAM"
  - with_settings: boolean (default false) -- include detailed voice settings

4. Check Subscription and Credits

Use ELEVENLABS_GET_USER_SUBSCRIPTION_INFO to verify plan limits and remaining credits before bulk generation.

Tool: ELEVENLABS_GET_USER_SUBSCRIPTION_INFO
Inputs: (none)
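
A pre-flight check before a bulk job could be sketched as follows. The field names `character_count` and `character_limit`, and their position directly under `data`, are assumptions about the response shape, so confirm them against an actual ELEVENLABS_GET_USER_SUBSCRIPTION_INFO response. The sketch also assumes one credit per character, which may not hold for every model.

```python
def remaining_characters(response: dict) -> int:
    """Characters left in the current period (assumed field names, see note above)."""
    data = response.get("data", {})
    used = int(data["character_count"])
    limit = int(data["character_limit"])
    return max(0, limit - used)


def can_afford(response: dict, texts: list[str]) -> bool:
    """True if the combined length of all texts fits in the remaining allowance."""
    needed = sum(len(text) for text in texts)
    return needed <= remaining_characters(response)


# Invented sample values for illustration only.
subscription = {"data": {"character_count": 9_500, "character_limit": 10_000}}
print(remaining_characters(subscription))
print(can_afford(subscription, ["x" * 400, "y" * 200]))
```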

5. List Available TTS Models

Use ELEVENLABS_GET_MODELS to discover compatible models and filter by can_do_text_to_speech: true.

Tool: ELEVENLABS_GET_MODELS
Inputs: (none)
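
The filter mentioned above is a one-liner once the model list is in hand. In this sketch, `tts_model_ids` is a hypothetical helper that takes the list of model objects directly; where that list sits inside the ELEVENLABS_GET_MODELS response is not specified here, so extract it according to the payload you actually receive. The sample entries are invented.

```python
def tts_model_ids(models: list[dict]) -> list[str]:
    """Keep only models that explicitly report can_do_text_to_speech: true."""
    return [
        model["model_id"]
        for model in models
        if model.get("can_do_text_to_speech") is True
    ]


# Invented sample entries; the second model is a made-up non-TTS example.
sample_models = [
    {"model_id": "eleven_multilingual_v2", "can_do_text_to_speech": True},
    {"model_id": "example_non_tts_model", "can_do_text_to_speech": False},
    {"model_id": "example_missing_flag"},
]

print(tts_model_ids(sample_models))
```

Treating a missing flag as "not supported" is the conservative choice here: a model is only selected when the response says outright that it can do TTS.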

6. Stream Audio and Retrieve History

Use ELEVENLABS_TEXT_TO_SPEECH_STREAM for low-latency streamed delivery, and ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM to re-download previously generated audio.

Tool: ELEVENLABS_TEXT_TO_SPEECH_STREAM
  - Same core inputs as ELEVENLABS_TEXT_TO_SPEECH, but returns a stream for low-latency playback

Tool: ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM
  - history_item_id: string (required) -- ID from a previous generation

Known Pitfalls

Pitfall	Detail
Text length limits	Per-request limits vary by model, from roughly 10,000 chars for most models up to 40,000 for some Flash/Turbo models. Oversized input returns HTTP 400. Split long text into chunks (~5,000 chars) and generate per chunk.
Output is a presigned URL	`ELEVENLABS_TEXT_TO_SPEECH` returns `data.file.s3url` with a ~1 hour expiry (X-Amz-Expires=3600). Download the audio file promptly.
Quota and credit errors	HTTP 401 with `quota_exceeded` or HTTP 402 `payment_required` means insufficient credits or tier restrictions. Check with `ELEVENLABS_GET_USER_SUBSCRIPTION_INFO` before bulk jobs.
Voice permissions	HTTP 401 with `missing_permissions` means the API key lacks `voices_read` scope. Verify key permissions.
Model compatibility	Not all models support TTS. Use `ELEVENLABS_GET_MODELS` and filter by `can_do_text_to_speech: true`. The `optimize_streaming_latency` parameter is NOT supported with `eleven_v3`.
Large voice list truncation	`ELEVENLABS_GET_VOICES` may return a large list. Select from the full `data.voices[]` payload -- previews may appear truncated.
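
The chunking advice in the first pitfall can be sketched as below. `chunk_text` is a hypothetical helper, and splitting on sentence-ending punctuation is one possible strategy, not something the toolkit prescribes. It is chosen here so chunk boundaries do not fall mid-sentence, which would likely produce audible breaks.

```python
import re


def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))
```

Each chunk would then be sent as its own ELEVENLABS_TEXT_TO_SPEECH request, and the resulting audio files joined in order.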

Quick Reference

Tool Slug	Description
`ELEVENLABS_TEXT_TO_SPEECH`	Convert text to speech, returns downloadable audio file
`ELEVENLABS_GET_VOICES`	List all available voices with attributes
`ELEVENLABS_GET_VOICE`	Get detailed info for a specific voice
`ELEVENLABS_GET_USER_SUBSCRIPTION_INFO`	Check subscription plan and remaining credits
`ELEVENLABS_GET_MODELS`	List available TTS models and capabilities
`ELEVENLABS_TEXT_TO_SPEECH_STREAM`	Stream audio for low-latency delivery
`ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM`	Re-download audio from generation history

Source

https://github.com/ComposioHQ/awesome-claude-skills/blob/master/composio-skills/elevenlabs-automation/SKILL.md

Overview

Automate ElevenLabs text-to-speech workflows end-to-end, including generating speech from text, browsing and inspecting voices, and monitoring credits. It also lets you list compatible models, stream audio for low-latency playback, and retrieve previously generated audio from history via the Composio MCP integration.

How This Skill Works

The skill uses MCP-based tools to orchestrate ElevenLabs TTS tasks: fetch voices with ELEVENLABS_GET_VOICES, inspect a voice with ELEVENLABS_GET_VOICE, verify subscription info with ELEVENLABS_GET_USER_SUBSCRIPTION_INFO, and discover models with ELEVENLABS_GET_MODELS. It then generates speech with ELEVENLABS_TEXT_TO_SPEECH, which returns a presigned audio file URL at data.file.s3url (expires in ~1 hour), or streams it with ELEVENLABS_TEXT_TO_SPEECH_STREAM for low-latency playback.

When to Use It

You need to generate speech from text using a specific voice and model.
You want to browse or inspect voice metadata before synthesis.
You need to verify remaining credits or plan limits before bulk generation.
You want to discover compatible TTS models for your scripts.
You need low-latency streaming, or want to re-download previously generated audio from history.

Quick Start

Step 1: Set up the MCP server and connect your ElevenLabs account (API key authentication) as described in the Setup.
Step 2: Run ELEVENLABS_GET_VOICES to fetch available voices and select a voice_id.
Step 3: Use ELEVENLABS_TEXT_TO_SPEECH with sample text, the chosen voice_id, and a model_id to generate audio; inspect data.file.s3url for download.
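
Steps 2 and 3 can be sketched end to end as follows. How a tool is actually invoked depends on your MCP client, so the sketch takes a `call_tool(slug, arguments)` callable as a parameter; that callable, the `quick_start` name, and the fake client used for the demonstration are all hypothetical. Only the tool slugs and the data.voices[] and data.file.s3url paths come from this document.

```python
from typing import Callable

# Hypothetical signature: takes a tool slug and an arguments dict, returns the tool's response.
ToolCaller = Callable[[str, dict], dict]


def quick_start(call_tool: ToolCaller, text: str,
                model_id: str = "eleven_multilingual_v2") -> str:
    """Pick the first available voice, generate speech, and return the presigned URL."""
    voices = call_tool("ELEVENLABS_GET_VOICES", {})["data"]["voices"]
    if not voices:
        raise RuntimeError("no voices available on this account")
    voice_id = voices[0]["voice_id"]

    result = call_tool(
        "ELEVENLABS_TEXT_TO_SPEECH",
        {"voice_id": voice_id, "text": text, "model_id": model_id},
    )
    return result["data"]["file"]["s3url"]


def fake_call_tool(slug: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client, returning invented responses."""
    if slug == "ELEVENLABS_GET_VOICES":
        return {"data": {"voices": [{"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Sample"}]}}
    if slug == "ELEVENLABS_TEXT_TO_SPEECH":
        return {"data": {"file": {"s3url": f"https://example.com/{arguments['voice_id']}.mp3"}}}
    raise ValueError(f"unexpected tool: {slug}")


print(quick_start(fake_call_tool, "Hello from the quick start."))
```

Passing the caller in keeps the workflow logic testable without a live connection. In practice you would choose a voice deliberately instead of taking the first entry.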

Best Practices

Always fetch voices first with ELEVENLABS_GET_VOICES to identify candidate voice_id values.
Use ELEVENLABS_GET_MODELS to filter by can_do_text_to_speech: true before generating.
Chunk long texts (~5000 chars) to stay within per-request limits for most models.
Prefer ELEVENLABS_TEXT_TO_SPEECH_STREAM for low-latency delivery when real-time playback matters.
Download audio promptly due to the presigned URL expiry (~1 hour) at data.file.s3url.

Example Use Cases

Automate daily announcements by generating speech in a chosen voice and storing the audio file for distribution.
Prototype multilingual content by comparing voices across multiple languages using ELEVENLABS_GET_VOICES.
Monitor credits and plan limits before running large batches of voice generations with ELEVENLABS_GET_USER_SUBSCRIPTION_INFO.
Discover the best-fit TTS models for your script library using ELEVENLABS_GET_MODELS and model filtering.
Stream narration live in a kiosk or guide with ELEVENLABS_TEXT_TO_SPEECH_STREAM, and re-download it from history if needed with ELEVENLABS_GET_AUDIO_FROM_HISTORY_ITEM.
