Which models are supported for voice design?

Use one of the exact model strings: qwen3-tts-vd-2026-01-26 or qwen3-tts-vd-realtime-2025-12-16.

Where is the generated audio saved by default?

Default output path is output/ai-audio-tts-voice-design/audio/. You can override the base directory with OUTPUT_DIR.

What credentials are required to use the service?

Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials, to authenticate requests.

Alicloud Ai Audio Tts Voice Design

@cinience

npx machina-cli add skill @cinience/alicloud-ai-audio-tts-voice-design --openclaw

Category: provider

Model Studio Qwen TTS Voice Design

Use voice design models to create controllable synthetic voices from natural language descriptions.

Critical model names

Use one of these exact model strings:

qwen3-tts-vd-2026-01-26
qwen3-tts-vd-realtime-2025-12-16
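
Because the model strings must match exactly, a small guard can catch typos before a request is sent. The following is a minimal sketch; the `check_model` helper and the `VOICE_DESIGN_MODELS` constant are hypothetical names for illustration and are not part of the DashScope SDK.

```python
# Exact model strings listed above; anything else is rejected.
VOICE_DESIGN_MODELS = frozenset({
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vd-realtime-2025-12-16",
})


def check_model(model: str) -> str:
    """Return the model string unchanged, or raise if it is not an exact match."""
    if model not in VOICE_DESIGN_MODELS:
        allowed = ", ".join(sorted(VOICE_DESIGN_MODELS))
        raise ValueError(f"unsupported model {model!r}; expected one of: {allowed}")
    return model
```

Calling `check_model("qwen3-tts-vd")` raises `ValueError`, since a shortened or aliased name is not an exact match.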

Prerequisites

Install SDK in a virtual environment:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope

Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
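
The lookup order described above (environment first, then the credentials file) could be sketched as follows. This is an illustration under assumptions: the source does not specify the format of ~/.alibabacloud/credentials, so the sketch assumes an INI file, and `resolve_api_key` is a hypothetical helper rather than an SDK function.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(credentials_path: Optional[Path] = None) -> Optional[str]:
    """Return the API key from the environment, else from the credentials file."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key

    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    if not path.is_file():
        return None

    # Assumption: the file is INI-formatted; the source does not say so.
    # Interpolation is disabled so a '%' inside a key is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None

    # Check DEFAULT and every named section for dashscope_api_key.
    for section in [parser.default_section, *parser.sections()]:
        value = parser.get(section, "dashscope_api_key", fallback=None)
        if value:
            return value.strip()
    return None
```

Returning `None` instead of raising lets the caller decide how to report a missing key.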

Normalized interface (tts.voice_design)

Request

voice_prompt (string, required) target voice description
text (string, required) text to synthesize
stream (bool, optional) enable streaming to receive PCM chunks

Response

audio_url (string) or streaming PCM chunks
voice_id (string)
request_id (string)
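
The request and response fields above can be modeled with a small amount of validation. This is a sketch, not the skill's actual schema code: the class and function names are hypothetical, the `False` default for `stream` is an assumption, and so is the premise that `voice_id` and `request_id` are present in both streamed and non-streamed responses.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class VoiceDesignRequest:
    voice_prompt: str      # required: target voice description
    text: str              # required: text to synthesize
    stream: bool = False   # optional; the default value is an assumption

    def __post_init__(self) -> None:
        # Both required fields must be non-empty strings.
        for name in ("voice_prompt", "text"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")

    def to_dict(self) -> Dict[str, Any]:
        return {
            "voice_prompt": self.voice_prompt,
            "text": self.text,
            "stream": self.stream,
        }


def validate_response(payload: Dict[str, Any], streamed: bool = False) -> Dict[str, Any]:
    """Check a response dict for the fields listed above."""
    required = ["voice_id", "request_id"]
    if not streamed:
        # Streamed responses deliver PCM chunks instead of audio_url.
        required.append("audio_url")
    missing = [
        key for key in required
        if not isinstance(payload.get(key), str) or not payload[key]
    ]
    if missing:
        raise ValueError(f"response missing or invalid fields: {', '.join(missing)}")
    return payload
```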

Operational guidance

Write voice prompts with tone, pace, emotion, and timbre constraints.
Build a reusable voice prompt library for product consistency.
Validate generated voice in short utterances before long scripts.
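
One way to keep a reusable prompt library is to store the four constraints as structured fields and render them into a voice prompt on demand. The sketch below is illustrative only; `VoiceSpec`, `PROMPT_LIBRARY`, the rendered prompt layout, and the sample entry are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSpec:
    persona: str
    tone: str
    pace: str
    emotion: str
    timbre: str

    def to_prompt(self) -> str:
        """Render the spec as a single natural-language voice prompt."""
        return (
            f"{self.persona}; tone: {self.tone}; pace: {self.pace}; "
            f"emotion: {self.emotion}; timbre: {self.timbre}"
        )


# Hypothetical library entry, keyed by a product-facing name.
PROMPT_LIBRARY = {
    "host": VoiceSpec("A female host voice", "warm", "medium", "calm", "clear"),
}

print(PROMPT_LIBRARY["host"].to_prompt())
```

Keeping the fields separate makes it easier to vary one constraint at a time when comparing short test utterances.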

Local helper script

Prepare a normalized request JSON and validate response schema:

.venv/bin/python skills/ai/audio/alicloud-ai-audio-tts-voice-design/scripts/prepare_voice_design_request.py \
  --voice-prompt "A warm female host voice, clear articulation, medium pace" \
  --text "这是音色设计演示"

Output location

Default output: output/ai-audio-tts-voice-design/audio/
Override base dir with OUTPUT_DIR.
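
The override can be sketched as below. It assumes that OUTPUT_DIR replaces only the leading `output` component and that the `ai-audio-tts-voice-design/audio/` suffix is kept; the source does not spell this out, and `resolve_output_dir` is a hypothetical helper.

```python
import os
from pathlib import Path

DEFAULT_BASE_DIR = "output"
SKILL_SUBDIR = Path("ai-audio-tts-voice-design") / "audio"


def resolve_output_dir(create: bool = False) -> Path:
    """Return the audio output directory, honoring OUTPUT_DIR if it is set."""
    # Assumption: OUTPUT_DIR swaps the base directory, not the full path.
    base = Path(os.environ.get("OUTPUT_DIR") or DEFAULT_BASE_DIR)
    out = base / SKILL_SUBDIR
    if create:
        out.mkdir(parents=True, exist_ok=True)
    return out
```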

References

references/sources.md

Source

git clone https://clawhub.ai/cinience/alicloud-ai-audio-tts-voice-design

Overview

This skill lets you design controllable synthetic voices using Alibaba Cloud Model Studio Qwen TTS voice design (VD) models. Choose between the two VD models to generate branded speech for apps, media, and customer interfaces. The tts.voice_design interface returns an audio URL or PCM stream along with voice_id and request_id for integration.

How This Skill Works

You call the normalized interface tts.voice_design with a voice_prompt describing the target voice and the text to synthesize. Optionally enable streaming to receive PCM chunks. The response provides audio_url (or a PCM stream), voice_id, and request_id. Validate the result with short utterances first, and maintain a library of prompts for consistent branding.
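
When streaming is enabled, the PCM chunks need a container before most players can open them. The sketch below collects chunks into a WAV file with the standard-library `wave` module. The audio format is an assumption (24 kHz, mono, 16-bit samples); the source does not state it, so check the actual format before relying on these defaults. `write_pcm_chunks_to_wav` is a hypothetical helper.

```python
import wave
from typing import Iterable


def write_pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    wav_path: str,
    sample_rate: int = 24000,  # assumption: not specified in the source
    channels: int = 1,         # assumption: mono
    sample_width: int = 2,     # assumption: 16-bit samples
) -> int:
    """Write raw PCM chunks to a WAV file and return the number of bytes written."""
    total = 0
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            total += len(chunk)
    return total
```

Any iterable of `bytes` works as input, so the same function can be fed from a list in a test or from a streaming response in an application.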

When to Use It

Create branded voices for marketing videos, onboarding, or product demos described by natural language prompts.
Prototype multiple voice personas for chatbots, guides, or virtual assistants to compare tone and pace.
Localize voices for multilingual apps while keeping a consistent timbre and cadence.
Iterate quickly on short test utterances before committing to longer scripts.
Ensure repeatable, brand-consistent speech for customer support IVR or help centers.

Quick Start

Step 1: Create and activate a virtual environment, then install the DashScope SDK: python3 -m venv .venv; . .venv/bin/activate; python -m pip install dashscope
Step 2: Prepare a voice design request with the helper script, for example: .venv/bin/python skills/ai/audio/alicloud-ai-audio-tts-voice-design/scripts/prepare_voice_design_request.py --voice-prompt "A warm female host voice, clear articulation, medium pace" --text "这是音色设计演示"
Step 3: Use the response's audio_url (or PCM stream) and the associated voice_id and request_id to integrate the voice into your app.

Best Practices

Write voice prompts that clearly specify tone, pace, emotion, and timbre constraints.
Build a reusable prompt library to ensure product-wide consistency across scripts.
Validate generated voices using short utterances before deploying longer scripts.
Use the helper script to prepare requests and confirm that the response schema matches the normalized interface.
Verify you are using one of the exact model strings and follow prerequisite setup.

Example Use Cases

Onboarding video narration with a warm, friendly voice that matches brand guidelines.
IVR assistant with calm, clear diction and steady cadence for easy comprehension.
E-learning module narrator with engaging pace and professional timbre.
Product demo voice with enthusiastic energy and consistent pronunciation.
Localized announcements in Chinese or English with a formal, reliable timbre.
