Frequently Asked Questions

What credentials do I need?

You need three credentials: ALIYUN_APP_KEY, ALIYUN_ACCESS_KEY_ID, and ALIYUN_ACCESS_KEY_SECRET. Provide them as environment variables, or set them with the clawdbot CLI or in the ~/.clawdbot/clawdbot.json config file.

Which voices are available?

Common voices include siyue, xiaoxuan, and xiaoyun; see Alibaba Cloud docs for the full list.

How do I control the output format and voice?

Use -f to set the audio format (e.g., mp3), -v to choose the voice, and -r to set the sample rate. The defaults are mp3, siyue, and 16000 Hz, respectively.

Aliyun TTS

@guang384

npx machina-cli add skill @guang384/aliyun-tts --openclaw

aliyun-tts

Alibaba Cloud Text-to-Speech synthesis service.

Configuration

Set the following environment variables:

ALIYUN_APP_KEY - Application Key
ALIYUN_ACCESS_KEY_ID - Access Key ID
ALIYUN_ACCESS_KEY_SECRET - Access Key Secret (sensitive)
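If you run the CLI directly in a shell rather than through clawdbot, you can export the three variables yourself. The bash sketch below adds a small pre-flight check; check_aliyun_env is a hypothetical helper, not part of the skill, and it prints only variable names, never their values.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill): report which of the
# three credentials this skill needs are unset or empty.
check_aliyun_env() {
  local name missing=0
  for name in ALIYUN_APP_KEY ALIYUN_ACCESS_KEY_ID ALIYUN_ACCESS_KEY_SECRET; do
    # ${!name} is bash indirect expansion: the value of the variable whose
    # name is stored in $name. Only the name is printed, never the value.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after export ALIYUN_APP_KEY="your-app-key" (and likewise for the other two), check_aliyun_env returns 0; otherwise it returns 1 and names each missing variable on stderr.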

Option 1: CLI configuration (recommended)

# Configure App Key
clawdbot skills config aliyun-tts ALIYUN_APP_KEY "your-app-key"

# Configure Access Key ID
clawdbot skills config aliyun-tts ALIYUN_ACCESS_KEY_ID "your-access-key-id"

# Configure Access Key Secret (sensitive)
clawdbot skills config aliyun-tts ALIYUN_ACCESS_KEY_SECRET "your-access-key-secret"
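Typing the secret directly into the command above leaves it in your shell history. One possible workaround, sketched in bash under the assumption that clawdbot accepts the value exactly as in the commands above, is to read the secret without echoing it first. configure_aliyun_secret is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prompt for the Access Key Secret without echoing it,
# so the value is not typed onto a command line that shell history records,
# then pass it to the same clawdbot command shown above.
configure_aliyun_secret() {
  local secret
  # -r: keep backslashes literally; -s: do not echo typed characters;
  # -p: show a prompt (bash prints it only when reading from a terminal)
  IFS= read -rs -p "Access Key Secret: " secret
  echo >&2  # end the prompt line after the hidden input
  if [ -z "$secret" ]; then
    echo "no secret entered" >&2
    return 1
  fi
  clawdbot skills config aliyun-tts ALIYUN_ACCESS_KEY_SECRET "$secret"
}
```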

Option 2: Manual configuration

Edit ~/.clawdbot/clawdbot.json:

{
  skills: {
    entries: {
      "aliyun-tts": {
        env: {
          ALIYUN_APP_KEY: "your-app-key",
          ALIYUN_ACCESS_KEY_ID: "your-access-key-id",
          ALIYUN_ACCESS_KEY_SECRET: "your-access-key-secret"
        }
      }
    }
  }
}

Usage

# Basic usage
{baseDir}/bin/aliyun-tts "Hello, this is Aliyun TTS"

# Specify output file
{baseDir}/bin/aliyun-tts -o /tmp/voice.mp3 "Hello"

# Specify voice
{baseDir}/bin/aliyun-tts -v siyue "Use siyue voice"

# Specify format and sample rate
{baseDir}/bin/aliyun-tts -f mp3 -r 16000 "Audio parameters"

Options

| Flag | Description | Default |
|------|-------------|---------|
| `-o, --output` | Output file path | tts.mp3 |
| `-v, --voice` | Voice name | siyue |
| `-f, --format` | Audio format | mp3 |
| `-r, --sample-rate` | Sample rate (Hz) | 16000 |
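To make the defaults above explicit, here is a bash sketch of a wrapper that always passes all four flags and names the output file after the voice, sample rate, and format, so -f and -o cannot drift apart. synth, TTS_BIN, and the /tmp naming scheme are assumptions for illustration; set TTS_BIN to {baseDir}/bin/aliyun-tts before use.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper. TTS_BIN must hold the path to the skill's
# bin/aliyun-tts; the fallback values mirror the Options table.
synth() {
  local text=$1 voice=${2:-siyue} format=${3:-mp3} rate=${4:-16000}
  # Derive the extension from the format, e.g. /tmp/tts-siyue-16000.mp3
  local out="/tmp/tts-${voice}-${rate}.${format}"
  "$TTS_BIN" -o "$out" -v "$voice" -f "$format" -r "$rate" "$text" || return 1
  # Print the path so callers can capture it with $(...)
  echo "$out"
}
```

For example, file=$(synth "Audio parameters" xiaoyun) writes /tmp/tts-xiaoyun-16000.mp3 and stores that path in file.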

Available Voices

Common voices: siyue, xiaoxuan, xiaoyun, etc. See Alibaba Cloud documentation for the full list.
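To audition voices before settling on one, a short bash loop can render the same sentence once per voice. This is a sketch: compare_voices and TTS_BIN (the path to {baseDir}/bin/aliyun-tts) are hypothetical names, and each voice must be one your Alibaba Cloud account supports.

```bash
#!/usr/bin/env bash
# Hypothetical helper: synthesize one sentence with each voice given,
# writing one file per voice, e.g. /tmp/voice-siyue.mp3.
compare_voices() {
  local text=$1; shift
  local voice
  for voice in "$@"; do
    "$TTS_BIN" -v "$voice" -o "/tmp/voice-$voice.mp3" "$text" || return 1
  done
}
```

For example: compare_voices "Use this voice" siyue xiaoxuan xiaoyun, then listen to the three files.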

Chat Voice Replies

When a user requests a voice reply:

# Generate audio
{baseDir}/bin/aliyun-tts -o /tmp/voice-reply.mp3 "Your reply content"

# Include in your response:
# MEDIA:/tmp/voice-reply.mp3
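Reusing the fixed path /tmp/voice-reply.mp3 means a new reply can overwrite one that has not been delivered yet. A bash sketch that picks a fresh file name per reply and prints the MEDIA line to include in the response follows; voice_reply, TTS_BIN, and the timestamp-plus-random naming are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper. TTS_BIN must hold the path to the skill's
# bin/aliyun-tts. Prints "MEDIA:<path>" only if synthesis succeeded.
voice_reply() {
  local text=$1
  # Timestamp plus $RANDOM makes collisions between replies unlikely
  local out="/tmp/voice-reply-$(date +%s)-$RANDOM.mp3"
  "$TTS_BIN" -o "$out" "$text" || return 1
  printf 'MEDIA:%s\n' "$out"
}
```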

Source

git clone https://clawhub.ai/guang384/aliyun-tts

Overview

Aliyun TTS is Alibaba Cloud's Text-to-Speech service, which converts text into natural-sounding audio. It offers multiple voices, such as siyue, xiaoxuan, and xiaoyun, and outputs common audio formats at configurable sample rates. This skill wraps a command-line tool that reads its credentials from environment variables, so you can generate audio from scripts or from an agent.

How This Skill Works

The skill reads its credentials from the environment variables ALIYUN_APP_KEY, ALIYUN_ACCESS_KEY_ID, and ALIYUN_ACCESS_KEY_SECRET. Set them with the recommended CLI method (clawdbot skills config aliyun-tts ...) or manually in ~/.clawdbot/clawdbot.json. Once they are configured, run the aliyun-tts CLI with the output path, voice, format, sample rate, and text to synthesize, for example: {baseDir}/bin/aliyun-tts -o /tmp/voice.mp3 -v siyue -f mp3 -r 16000 "Text to synthesize"

When to Use It

Generate audio responses for a chat bot or customer support assistant.
Create IVR prompts or automated phone system greetings.
Provide voice onboarding or tutorial narration in a mobile app.
Add accessibility features by reading articles or content aloud.
Produce audio for chat voice replies and reference it with MEDIA:/path.

Quick Start

Step 1: Configure credentials using the recommended CLI or manual config file.
Step 2: Generate audio with the CLI, specifying the output path, voice, format, and sample rate, plus the text to synthesize.
Step 3: Use the resulting audio file in your app or response (for example via MEDIA:/path).

Best Practices

Store and protect ALIYUN_APP_KEY, ALIYUN_ACCESS_KEY_ID, and ALIYUN_ACCESS_KEY_SECRET securely; avoid logging them.
Test multiple voices (for example siyue, xiaoxuan, xiaoyun) to pick the most natural fit.
Choose a format and sample rate that suit the playback target; the defaults are mp3 and 16000 Hz.
Set an explicit output path with -o for each job; without it, audio goes to the default tts.mp3, which a later run may overwrite.
Cache or reuse generated audio where possible to reduce repeated TTS requests and costs.
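The caching advice can be sketched in bash as follows: hash the voice, format, sample rate, and text into a cache key, and call the service only when no file exists for that key yet. cached_tts, CACHE_DIR, and TTS_BIN are hypothetical names, and sha256sum assumes GNU coreutils (on macOS, shasum -a 256 is the usual substitute).

```bash
#!/usr/bin/env bash
# Hypothetical caching wrapper. TTS_BIN must hold the path to the skill's
# bin/aliyun-tts. Identical inputs map to the same file, so repeated
# requests reuse it instead of calling the service again.
cached_tts() {
  local text=$1 voice=${2:-siyue} format=${3:-mp3} rate=${4:-16000}
  local dir=${CACHE_DIR:-/tmp/aliyun-tts-cache}
  local key
  # sha256sum prints "<hash>  -"; keep only the hash
  key=$(printf '%s\n%s\n%s\n%s' "$voice" "$format" "$rate" "$text" | sha256sum | cut -d' ' -f1)
  local out="$dir/$key.$format"
  # -s: file exists and is non-empty, so a previous run succeeded
  if [ ! -s "$out" ]; then
    mkdir -p "$dir" || return 1
    "$TTS_BIN" -o "$out" -v "$voice" -f "$format" -r "$rate" "$text" || return 1
  fi
  echo "$out"
}
```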

Example Use Cases

Generate a chat response audio file at /tmp/chat-reply.mp3 using the siyue voice.
Create an IVR greeting at /tmp/ivr-hello.mp3 with the siyue voice for a phone system.
Produce onboarding narration for a mobile app in mp3 format with a 16000 Hz sample rate.
Enable accessibility by reading a long article aloud and saving as /tmp/article.mp3.
Prepare a chat voice reply and reference it with MEDIA:/tmp/voice-reply.mp3 in the response.
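Several of these files can be produced in one pass. The bash sketch below reads tab-separated lines of output path, voice, and text and calls the CLI once per line; batch_tts, TTS_BIN, and the input format are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical batch helper. Each input line is
# "output-path<TAB>voice<TAB>text"; TTS_BIN must hold the path to the
# skill's bin/aliyun-tts. Format and sample rate use the documented defaults.
batch_tts() {
  local out voice text
  # "|| [ -n "$out" ]" also processes a final line with no trailing newline
  while IFS=$'\t' read -r out voice text || [ -n "$out" ]; do
    [ -n "$out" ] || continue  # skip blank lines
    # </dev/null keeps the CLI from consuming the remaining input lines
    "$TTS_BIN" -o "$out" -v "$voice" -f mp3 -r 16000 "$text" </dev/null || return 1
  done
}
```

For example: printf '/tmp/ivr-hello.mp3\tsiyue\tWelcome\n' | batch_tts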
