
add-voice-transcription

npx machina-cli add skill qwibitai/nanoclaw/add-voice-transcription --openclaw
Files (1)
SKILL.md
4.2 KB

Add Voice Transcription

This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].

Phase 1: Pre-flight

Check if already applied

Read .nanoclaw/state.yaml. If voice-transcription is in applied_skills, skip to Phase 3 (Configure). The code changes are already in place.

Ask the user

Use AskUserQuestion to collect information:

AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?

If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.

Phase 2: Apply Code Changes

Run the skills engine to apply this skill's code package.

Initialize skills system (if needed)

If .nanoclaw/ directory doesn't exist yet:

npx tsx scripts/apply-skill.ts --init

Apply the skill

npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription

This deterministically:

  • Adds src/transcription.ts (voice transcription module using OpenAI Whisper)
  • Three-way merges voice handling into src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
  • Three-way merges transcription tests into src/channels/whatsapp.test.ts (mock + 3 test cases)
  • Installs the openai npm dependency
  • Updates .env.example with OPENAI_API_KEY
  • Records the application in .nanoclaw/state.yaml

If the apply reports merge conflicts, read the intent files:

  • modify/src/channels/whatsapp.ts.intent.md — what changed and invariants for whatsapp.ts
  • modify/src/channels/whatsapp.test.ts.intent.md — what changed for whatsapp.test.ts
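To make the merged behavior concrete, here is a minimal sketch of the flow the merge adds to whatsapp.ts. The function names isVoiceMessage and transcribeAudioMessage come from the bullet list above, but the message shape and signatures are simplified assumptions, not NanoClaw's real types.

```typescript
// Hedged sketch of the voice-handling flow merged into whatsapp.ts.
// IncomingMessage is a simplified stand-in for the channel's message type.
interface IncomingMessage {
  audio?: { mimetype: string; download(): Promise<Buffer> };
  text?: string;
}

// Detect whether an incoming message carries a voice payload.
export function isVoiceMessage(msg: IncomingMessage): boolean {
  return msg.audio !== undefined && msg.audio.mimetype.startsWith("audio/");
}

// Turn a message into agent-visible text: plain text passes through,
// voice notes are downloaded, transcribed, and wrapped as [Voice: ...].
export async function toAgentText(
  msg: IncomingMessage,
  transcribeAudioMessage: (audio: Buffer) => Promise<string>,
): Promise<string> {
  if (!isVoiceMessage(msg)) return msg.text ?? "";
  const transcript = await transcribeAudioMessage(await msg.audio!.download());
  return `[Voice: ${transcript}]`;
}
```

The transcriber is passed in as a function here purely so the flow is easy to test in isolation; the real integration calls into src/transcription.ts directly.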

Validate code changes

npm test
npm run build

All tests must pass (including the 3 new voice transcription tests) and build must be clean before proceeding.

Phase 3: Configure

Get OpenAI API key (if needed)

If the user doesn't have an API key:

I need you to create an OpenAI API key:

  1. Go to https://platform.openai.com/api-keys
  2. Click "Create new secret key"
  3. Give it a name (e.g., "NanoClaw Transcription")
  4. Copy the key (starts with sk-)

Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)

Wait for the user to provide the key.

Add to environment

Add to .env:

OPENAI_API_KEY=<their-key>

Sync to container environment:

mkdir -p data/env && cp .env data/env/env

The container reads environment from data/env/env, not .env directly.

Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw  # macOS
# Linux: systemctl --user restart nanoclaw

Phase 4: Verify

Test with a voice note

Tell the user:

Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.

Check logs if needed

tail -f logs/nanoclaw.log | grep -i voice

Look for:

  • Transcribed voice message — successful transcription with character count
  • OPENAI_API_KEY not set — key missing from .env
  • OpenAI transcription failed — API error (check key validity, billing)
  • Failed to download audio message — media download issue

Troubleshooting

Voice notes show "[Voice Message - transcription unavailable]"

  1. Check OPENAI_API_KEY is set in .env AND synced to data/env/env
  2. Verify key works: curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
  3. Check OpenAI billing — Whisper requires a funded account

Voice notes show "[Voice Message - transcription failed]"

Check logs for the specific error. The common causes are the ones listed under "Check logs if needed": an OpenAI API error (invalid key or exhausted billing) or a failed media download.

Agent doesn't respond to voice notes

Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.

Source

git clone https://github.com/qwibitai/nanoclaw

The skill lives at .claude/skills/add-voice-transcription/SKILL.md in the repository.


How This Skill Works

The skill introduces a transcription module powered by OpenAI Whisper and integrates it into the WhatsApp channel: it detects voice messages, transcribes them, and delivers the transcript to the agent in the [Voice: <transcript>] format. The code changes update src/channels/whatsapp.ts and its tests, add the openai dependency, and introduce the OPENAI_API_KEY environment variable.
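As a concrete illustration, a minimal transcription helper could look like the sketch below. It calls OpenAI's REST transcription endpoint directly via Node's built-in fetch; the actual src/transcription.ts uses the openai npm package, and the function names here are assumptions.

```typescript
// Hedged sketch of a Whisper transcription call (not NanoClaw's actual module).
// Sends the audio as multipart form data to OpenAI's transcription endpoint.
export async function transcribeAudio(audio: Buffer, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("model", "whisper-1");
  form.append("file", new Blob([new Uint8Array(audio)], { type: "audio/ogg" }), "voice.ogg");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`OpenAI transcription failed: ${res.status}`);
  const { text } = (await res.json()) as { text: string };
  return text;
}

// Wrap a transcript in the agent-visible format used throughout this skill.
export function formatVoiceMessage(transcript: string): string {
  return `[Voice: ${transcript}]`;
}
```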

When to Use It

  • You need readable transcripts of WhatsApp voice notes to craft timely responses.
  • You have an OpenAI Whisper API key and want automated transcription enabled.
  • You are configuring or updating NanoClaw with this skill and need guidance on application steps.
  • You want to validate transcription with tests and ensure the WhatsApp integration handles voice messages.
  • You need troubleshooting guidance for missing transcripts or API key issues.

Quick Start

  1. If needed, initialize the skills system: npx tsx scripts/apply-skill.ts --init
  2. Apply the voice transcription skill: npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription
  3. Add OPENAI_API_KEY=<your-key> to .env, sync it with mkdir -p data/env && cp .env data/env/env, then run npm run build and restart NanoClaw

Best Practices

  • Check .nanoclaw/state.yaml before applying to avoid re-applying an already-applied voice-transcription skill.
  • Ensure the agent sees transcripts formatted as [Voice: <transcript>] for quick readability.
  • Follow the three-way merge guidance in modify/...intent.md when conflicts arise during whatsapp.ts and whatsapp.test.ts updates.
  • Install and pin the OpenAI dependency and store OPENAI_API_KEY in .env, syncing to data/env/env.
  • Run npm test and npm run build after applying to confirm all tests (including new voice tests) pass.

Example Use Cases

  • A client sends a 30-second voice note; the agent receives the [Voice: ...] transcript and drafts a reply without needing to listen to audio.
  • An agent uses the transcript to respond accurately when the original voice note was in a noisy environment.
  • An OpenAI API key is added to the environment and validated, enabling Whisper transcription.
  • During a skill update, the developer merges changes into src/channels/whatsapp.ts to handle isVoiceMessage and transcribeAudioMessage.
  • Tests for the new transcription feature mock audio input and cover three scenarios to ensure reliability.

