Alicloud Ai Audio Tts Voice Design
Scanned@cinience
npx machina-cli add skill @cinience/alicloud-ai-audio-tts-voice-design --openclaw
Category: provider
Model Studio Qwen TTS Voice Design
Use voice design models to create controllable synthetic voices from natural language descriptions.
Critical model names
Use one of these exact model strings:
- qwen3-tts-vd-2026-01-26
- qwen3-tts-vd-realtime-2025-12-16
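Because the model strings must match exactly, a small guard can catch typos before a request is sent. The following is a minimal sketch; the names `VOICE_DESIGN_MODELS` and `check_model` are hypothetical helpers, not part of the DashScope SDK.

```python
# The two model strings listed above, copied verbatim.
VOICE_DESIGN_MODELS = (
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vd-realtime-2025-12-16",
)


def check_model(name: str) -> str:
    """Return the model name unchanged, or raise if it is not an exact match."""
    if name not in VOICE_DESIGN_MODELS:
        raise ValueError(f"unknown voice design model: {name!r}")
    return name
```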
Prerequisites
- Install SDK in a virtual environment:
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
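The lookup order described above (environment variable first, credentials file second) could be sketched as follows. This is an illustration only: `resolve_api_key` is a hypothetical helper, and the sketch assumes the credentials file is INI-style with section headers, which the source does not confirm.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, env=None):
    """Return the API key from the environment, else from the credentials file, else None."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key

    # Assumption: the file is INI-formatted, e.g. "[default]" followed by
    # "dashscope_api_key = ...". Adjust the parsing if your file differs.
    if credentials_path is None:
        credentials_path = Path.home() / ".alibabacloud" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file is silently skipped
    for section in parser.sections():
        value = parser[section].get("dashscope_api_key")
        if value:
            return value
    return None
```

Returning `None` instead of raising lets the caller decide how to report a missing key.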
Normalized interface (tts.voice_design)
Request
- voice_prompt (string, required): target voice description
- text (string, required)
- stream (bool, optional)
Response
- audio_url (string) or streaming PCM chunks
- voice_id (string)
- request_id (string)
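The request and response shapes above can be modeled with plain Python types. This is a minimal sketch, not the skill's actual implementation: `VoiceDesignRequest` and `validate_response` are hypothetical names, and the sketch assumes that voice_id and request_id are always present and that audio_url is present whenever the call is not streamed.

```python
from dataclasses import asdict, dataclass


@dataclass
class VoiceDesignRequest:
    voice_prompt: str     # required: target voice description
    text: str             # required: text to synthesize
    stream: bool = False  # optional: request streaming PCM chunks

    def to_payload(self) -> dict:
        """Return the request as a dict, rejecting empty required fields."""
        if not self.voice_prompt.strip():
            raise ValueError("voice_prompt is required")
        if not self.text.strip():
            raise ValueError("text is required")
        return asdict(self)


def validate_response(resp: dict, streamed: bool = False) -> dict:
    """Check that the expected string fields are present in a response dict."""
    required = ["voice_id", "request_id"]
    if not streamed:
        # Assumption: non-streamed calls return audio_url rather than PCM chunks.
        required.append("audio_url")
    bad = [k for k in required if not isinstance(resp.get(k), str)]
    if bad:
        raise ValueError(f"missing or non-string fields: {bad}")
    return resp
```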
Operational guidance
- Write voice prompts with tone, pace, emotion, and timbre constraints.
- Build a reusable voice prompt library for product consistency.
- Validate generated voice in short utterances before long scripts.
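One way to keep prompts reusable is to store tone, pace, emotion, and timbre as separate fields and render the prompt string from them. The sketch below is illustrative; `VoiceProfile`, `PROMPT_LIBRARY`, and the sample entry are hypothetical, and the rendered wording is only one possible phrasing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    timbre: str
    tone: str
    pace: str
    emotion: str

    def to_prompt(self) -> str:
        """Render the four constraints as a single natural-language prompt."""
        return (
            f"{self.timbre}, {self.tone} tone, "
            f"{self.pace} pace, {self.emotion} emotion"
        )


# A shared library keyed by persona name keeps prompts consistent across scripts.
PROMPT_LIBRARY = {
    "host": VoiceProfile(
        timbre="A warm female host voice",
        tone="clear and friendly",
        pace="medium",
        emotion="calm",
    ),
}
```

Calling `PROMPT_LIBRARY["host"].to_prompt()` yields a string that can be passed as voice_prompt, first with a short test utterance and later with the full script.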
Local helper script
Prepare a normalized request JSON and validate response schema:
.venv/bin/python skills/ai/audio/alicloud-ai-audio-tts-voice-design/scripts/prepare_voice_design_request.py \
--voice-prompt "A warm female host voice, clear articulation, medium pace" \
--text "这是音色设计演示"
Output location
- Default output: output/ai-audio-tts-voice-design/audio/
- Override base dir with OUTPUT_DIR.
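The output location rule could be resolved in code roughly as follows. This is a sketch under an assumption: that OUTPUT_DIR replaces only the leading output/ base and the ai-audio-tts-voice-design/audio/ subpath is kept. The helper name `resolve_output_dir` is hypothetical.

```python
import os
from pathlib import Path

# Subpath taken from the default output location above.
DEFAULT_SUBDIR = Path("ai-audio-tts-voice-design") / "audio"


def resolve_output_dir(env=None) -> Path:
    """Return the audio output directory, honoring OUTPUT_DIR when it is set."""
    env = os.environ if env is None else env
    # Assumption: OUTPUT_DIR overrides the "output" base only.
    base = Path(env.get("OUTPUT_DIR") or "output")
    return base / DEFAULT_SUBDIR
```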
References
references/sources.md
Overview
This skill lets you design controllable synthetic voices with the Alibaba Cloud Model Studio Qwen TTS voice design (VD) models. Choose between the two VD models to generate branded speech for apps, media, and customer interfaces. The tts.voice_design interface returns an audio URL or a PCM stream, along with a voice_id and request_id for integration.
How This Skill Works
Call the normalized interface tts.voice_design with a voice_prompt describing the target voice and the text to synthesize. Optionally set stream to receive PCM chunks. The response provides audio_url (or a PCM stream), voice_id, and request_id. Validate the result on short utterances first, and maintain a library of prompts for consistent branding.
When to Use It
- Create branded voices for marketing videos, onboarding, or product demos described by natural language prompts.
- Prototype multiple voice personas for chatbots, guides, or virtual assistants to compare tone and pace.
- Localize voices for multilingual apps while keeping a consistent timbre and cadence.
- Iterate quickly on short test utterances before committing to longer scripts.
- Ensure repeatable, brand-consistent speech for customer support IVR or help centers.
Quick Start
- Step 1: Create and activate a virtual environment and install the DashScope SDK: python3 -m venv .venv; . .venv/bin/activate; python -m pip install dashscope
- Step 2: Prepare a voice design request with the helper script, for example: .venv/bin/python skills/ai/audio/alicloud-ai-audio-tts-voice-design/scripts/prepare_voice_design_request.py --voice-prompt "A warm female host voice, clear articulation, medium pace" --text "这是音色设计演示"
- Step 3: Use the response's audio_url (or PCM stream) and the associated voice_id and request_id to integrate the voice into your app.
Best Practices
- Write voice prompts that clearly specify tone, pace, emotion, and timbre constraints.
- Build a reusable prompt library to ensure product-wide consistency across scripts.
- Validate generated voices using short utterances before deploying longer scripts.
- Use the helper script to prepare requests and confirm that the response matches the expected schema.
- Verify that you are using one of the two exact model strings and have completed the prerequisite setup.
Example Use Cases
- Onboarding video narration with a warm, friendly voice that matches brand guidelines.
- IVR assistant with calm, clear diction and steady cadence for easy comprehension.
- E-learning module narrator with engaging pace and professional timbre.
- Product demo voice with enthusiastic energy and consistent pronunciation.
- Localized announcements in Chinese or English with a formal, reliable timbre.