WhatsApp Voice Talk

npx machina-cli add skill @syedateebulislam/whatsapp-voice-chat-integration-open-source --openclaw
Turn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: voice → transcription → intent detection → response generation → text-to-speech.
Perfect for:
- Voice assistants on WhatsApp
- Hands-free command interfaces
- Multi-lingual chatbots
- IoT voice control (drones, smart home, etc.)
Quick Start
1. Install Dependencies
pip install openai-whisper soundfile numpy
2. Process a Voice Message
const { processVoiceNote } = require('./scripts/voice-processor');
const fs = require('fs');
// Read a voice message (OGG, WAV, MP3, etc.)
const buffer = fs.readFileSync('voice-message.ogg');
// Process it
const result = await processVoiceNote(buffer);
console.log(result);
// {
// status: 'success',
// response: "Current weather in Delhi is 19°C, haze. Humidity is 56%.",
// transcript: "What's the weather today?",
// intent: 'weather',
// language: 'en',
// timestamp: 1769860205186
// }
3. Run Auto-Listener
For automatic processing of incoming WhatsApp voice messages:
node scripts/voice-listener-daemon.js
This watches ~/.clawdbot/media/inbound/ every 5 seconds and processes new voice files.
How It Works
Incoming Voice Message
↓
Transcribe (Whisper API)
↓
"What's the weather?"
↓
Detect Language & Intent
↓
Match against INTENTS
↓
Execute Handler
↓
Generate Response
↓
Convert to TTS
↓
Send back via WhatsApp
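The stages above compose into a single async function. The sketch below uses stand-in stage implementations (not the skill's actual exports) just to show the flow of data from audio buffer to response:

```javascript
// Hypothetical end-to-end composition of the pipeline diagram above.
// Every stage here is a stand-in; the real implementations live in scripts/.
async function transcribe(audioBuffer) {
  return "What's the weather?"; // stand-in for the Whisper call
}

function detectLanguage(transcript) {
  // Devanagari characters indicate Hindi; default to English.
  return /[\u0900-\u097F]/.test(transcript) ? 'hi' : 'en';
}

function detectIntent(transcript) {
  // Stand-in for matching against the INTENTS keyword map.
  return /weather/i.test(transcript) ? 'weather' : 'unknown';
}

const handlers = {
  async weather(language) {
    return { response: 'Current weather in Delhi is 19°C, haze.' };
  },
  async unknown(language) {
    return { response: "Sorry, I didn't catch that." };
  }
};

async function handleVoiceMessage(audioBuffer) {
  const transcript = await transcribe(audioBuffer); // Whisper
  const language = detectLanguage(transcript);
  const intent = detectIntent(transcript);          // match against INTENTS
  const { response } = await handlers[intent](language);
  return { transcript, language, intent, response }; // then TTS + send
}
```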
Key Features
✅ Zero Setup Complexity - No FFmpeg, no complex dependencies. Uses soundfile + Whisper.
✅ Multi-Language - Automatic English/Hindi detection. Extend easily.
✅ Intent-Driven - Define custom intents with keywords and handlers.
✅ Real-Time Processing - 5-10 seconds per message (after first model load).
✅ Customizable - Add weather, status, commands, or anything else.
✅ Production Ready - Built from real usage in Clawdbot.
Common Use Cases
Weather Bot
// User says: "What's the weather in Delhi?"
// Response: "Current weather in Delhi is 19°C..."
// (Built-in intent, just enable it)
Smart Home Control
// User says: "Turn on the lights"
// Handler: Sends signal to smart home API
// Response: "Lights turned on"
Task Manager
// User says: "Add milk to shopping list"
// Handler: Adds to database
// Response: "Added milk to your list"
Status Checker
// User says: "Is the system running?"
// Handler: Checks system status
// Response: "All systems online"
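A handler for the smart-home case might look like the sketch below. The handler name and the API endpoint in the comment are assumptions for illustration, in the same handlers style the Customization section describes:

```javascript
// Hypothetical smart-home handler in the style used by voice-processor.js.
const handlers = {
  async handleLights(language = 'en') {
    // A real handler would call your smart home API here, e.g.:
    // await fetch('http://lights.local/api/on', { method: 'POST' });
    return {
      status: 'success',
      response: language === 'en' ? 'Lights turned on' : 'लाइट चालू कर दी गई'
    };
  }
};
```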
Customization
Add a Custom Intent
Edit voice-processor.js:
- Add to INTENTS map:
const INTENTS = {
'shopping': {
keywords: ['shopping', 'list', 'buy', 'खरीद'],
handler: 'handleShopping'
}
};
- Add handler:
const handlers = {
async handleShopping(language = 'en') {
return {
status: 'success',
response: language === 'en'
? "What would you like to add to your shopping list?"
: "आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?"
};
}
};
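With the intent and handler defined, routing comes down to matching keywords against the transcript. A minimal sketch, assuming a hypothetical detectIntent helper (the real matching logic lives in voice-processor.js):

```javascript
// Hypothetical keyword matcher in the style of the INTENTS map above.
const INTENTS = {
  shopping: {
    keywords: ['shopping', 'list', 'buy', 'खरीद'],
    handler: 'handleShopping'
  },
  weather: {
    keywords: ['weather', 'temperature', 'मौसम'],
    handler: 'handleWeather'
  }
};

// Return the first intent whose keywords appear in the transcript,
// or 'unknown' when nothing matches.
function detectIntent(transcript) {
  const text = transcript.toLowerCase();
  for (const [name, { keywords }] of Object.entries(INTENTS)) {
    if (keywords.some((kw) => text.includes(kw.toLowerCase()))) return name;
  }
  return 'unknown';
}
```

Substring matching is the simplest approach; for production you may want word-boundary matching to avoid false positives.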
Support More Languages
- Update detectLanguage() with your language's Unicode range:
const urduChars = /[\u0600-\u06FF]/g; // Add this
- Add the language code to handler responses:
return language === 'ur' ? 'Urdu response' : 'English response';
- Set the language in transcribe.py:
result = model.transcribe(data, language="ur")
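Putting the Unicode-range idea together, a detection function extended with the Urdu range from the step above could look like this sketch (the real function is in voice-processor.js):

```javascript
// Sketch of Unicode-range language detection, with the Urdu range added.
function detectLanguage(text) {
  const urduChars = text.match(/[\u0600-\u06FF]/g);  // Arabic script (Urdu)
  const hindiChars = text.match(/[\u0900-\u097F]/g); // Devanagari (Hindi)
  if (urduChars && urduChars.length > 0) return 'ur';
  if (hindiChars && hindiChars.length > 0) return 'hi';
  return 'en'; // default when no script-specific characters are found
}
```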
Change Transcription Model
In transcribe.py:
model = whisper.load_model("tiny") # Fastest, ~75MB
model = whisper.load_model("base") # Default, 140MB
model = whisper.load_model("small") # Better, 466MB
model = whisper.load_model("medium") # Good, 1.5GB
Architecture
Scripts:
- transcribe.py - Whisper transcription (Python)
- voice-processor.js - Core logic (intent parsing, handlers)
- voice-listener-daemon.js - Auto-listener watching for new messages
References:
- SETUP.md - Installation and configuration
- API.md - Detailed function documentation
Integration with Clawdbot
If running as a Clawdbot skill, hook into message events:
// In your Clawdbot handler
const { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');
message.on('voice', async (audioBuffer) => {
const result = await processVoiceNote(audioBuffer, message.from);
// Send response back
await message.reply(result.response);
// Or send as voice (requires TTS)
await sendVoiceMessage(result.response);
});
Performance
- First run: ~30 seconds (downloads Whisper model, ~140MB)
- Typical: 5-10 seconds per message
- Memory: ~1.5GB (base model)
- Languages: English, Hindi (easily extended)
Supported Audio Formats
OGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile.
WhatsApp uses Opus-coded OGG by default — works out of the box.
Troubleshooting
"No module named 'whisper'"
pip install openai-whisper
"No module named 'soundfile'"
pip install soundfile
Voice messages not processing?
- Check clawdbot status (is it running?)
- Check ~/.clawdbot/media/inbound/ (are files arriving?)
- Run the daemon manually: node scripts/voice-listener-daemon.js (and watch the logs)
Slow transcription?
Use smaller model: whisper.load_model("base") or "tiny"
Further Reading
- Setup Guide: See references/SETUP.md for detailed installation and configuration
- API Reference: See references/API.md for function signatures and examples
- Examples: Check scripts/ for working code
License
MIT - Use freely, customize, contribute back!
Built for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.
Source
git clone https://clawhub.ai/syedateebulislam/whatsapp-voice-chat-integration-open-source
Overview
Transcribe WhatsApp voice notes with Whisper, detect language and intent, run predefined handlers, and respond with synthesized speech. This end-to-end pipeline enables real-time, voice-driven conversations on WhatsApp, with English and Hindi support and customizable intents like weather, status, and commands.
How This Skill Works
Incoming voice messages are transcribed via Whisper, then language and intent are detected to choose a matching handler. The handler returns a text response which is converted to speech with TTS and sent back over WhatsApp, delivering a near real-time experience (roughly 5-10 seconds after the first model load).
When to Use It
- Build a hands-free WhatsApp voice interface for weather, status, or command queries.
- Create multilingual WhatsApp chatbots that automatically detect English or Hindi.
- Enable IoT voice control (smart home, drones, etc.) through WhatsApp messages.
- Deploy real-time customer support that handles voice notes from users.
- Extend with custom intents and handlers to fit specific business workflows.
Quick Start
- Step 1: Install Dependencies - pip install openai-whisper soundfile numpy
- Step 2: Process a Voice Message - read a voice file into a buffer and call processVoiceNote(buffer) to obtain transcript, intent, and response
- Step 3: Run Auto-Listener - start the daemon to auto-process incoming voice messages
Best Practices
- Define clear INTENTS with keywords and corresponding handlers for predictable routing.
- Enable automatic language detection and plan to extend support by updating Unicode rules.
- Keep responses concise and TTS-friendly to improve clarity and delivery.
- Monitor latency and optimize the auto-listener for real-time delivery (target 5-10s post-load).
- Test with multiple audio formats (OGG, WAV, MP3) and real user voice samples.
Example Use Cases
- Weather Bot: User asks for current weather; system returns a concise forecast using the built-in weather intent.
- Smart Home Control: User says 'Turn on the lights'; the handler triggers a smart home API and replies with confirmation.
- Task Manager: User says 'Add milk to shopping list'; the handler updates the database and confirms the addition.
- Status Checker: User asks 'Is the system running?'; the bot checks health and responds with status.
- IoT Voice Control: Users issue commands to control drones or other IoT devices via WhatsApp voice messages.