
Walkie-Talkie Mode

@rubenfb23

npx machina-cli add skill @rubenfb23/walkie-talkie --openclaw
Files (1)
SKILL.md
1.1 KB

Walkie-Talkie Mode

This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.

Workflow

  1. Incoming Audio: When a user sends an audio file (Ogg/Opus, MIME type audio/ogg):

    • Use tools/transcribe_voice.sh to get the text.
    • Process the text as a normal user prompt.
  2. Outgoing Response:

    • In addition to the text reply, generate speech from it using bin/sherpa-onnx-tts.
    • Send the resulting .ogg file back to the user as a voice note.
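The two workflow steps can be sketched as one shell function. This is a hedged illustration, not the skill's code. It assumes three things. First, tools/transcribe_voice.sh takes an audio path and prints the transcript. Second, a hypothetical REPLY_CMD hook stands in for the agent. Third, bin/sherpa-onnx-tts takes the same argument order as in the manual example below. The TRANSCRIBE_BIN, REPLY_CMD, and TTS_BIN overrides are also hypothetical; they exist so stubs can be swapped in.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two workflow steps. Assumptions, not from the skill:
#   - tools/transcribe_voice.sh takes an audio path and prints the transcript
#   - REPLY_CMD stands in for the agent: prompt on stdin, text reply on stdout
#   - bin/sherpa-onnx-tts takes <output.ogg> <text>, as in the manual example
handle_voice_note() {
  local audio="$1"
  local transcribe="${TRANSCRIBE_BIN:-tools/transcribe_voice.sh}"
  local respond="${REPLY_CMD:-cat}"   # placeholder: echoes the prompt back
  local tts="${TTS_BIN:-bin/sherpa-onnx-tts}"
  local transcript reply out_dir

  # 1. Incoming audio: transcribe locally, then treat the text as a normal prompt.
  transcript=$("$transcribe" "$audio") || return 1
  reply=$(printf '%s' "$transcript" | "$respond") || return 1

  # 2. Outgoing response: synthesize the reply into a fresh directory.
  out_dir=$(mktemp -d) || return 1
  "$tts" "$out_dir/reply.ogg" "$reply" || return 1

  # Print the text reply, then the voice-note path; the caller sends both.
  printf '%s\n%s\n' "$reply" "$out_dir/reply.ogg"
}
```

The function prints the text reply first and the voice-note path second, so the caller can send both. Temporary directories are not cleaned up here; a real handler would remove them after sending.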

Triggers

  • User sends an audio message.
  • User says "activa modo walkie-talkie" ("activate walkie-talkie mode") or "hablemos por voz" ("let's talk by voice").
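Trigger detection can be a plain, case-insensitive substring match. This is a minimal sketch that assumes loose matching inside a longer message is acceptable; the skill does not say how strictly the phrases are matched, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when a message contains either
# trigger phrase, ignoring case and any surrounding text.
is_walkie_trigger() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"activa modo walkie-talkie"*|*"hablemos por voz"*) return 0 ;;
    *) return 1 ;;
  esac
}
```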

Constraints

  • Use local tools only (ffmpeg, whisper-cpp, sherpa-onnx-tts).
  • Keep processing fast: a real-time factor (RTF, processing time divided by audio duration) below 0.5.
  • Always reply with BOTH text (for clarity) and audio.
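The RTF target is a ratio, not a time. RTF < 0.5 means, for example, that a 12 s voice note should be handled in under 6 s. This small sketch uses hypothetical helper names. How the two durations are measured is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the RTF constraint:
#   RTF = processing_seconds / audio_seconds
rtf() {
  # Print the ratio with three decimals; fail on a non-positive audio duration.
  awk -v p="$1" -v a="$2" 'BEGIN { if (a <= 0) exit 1; printf "%.3f\n", p / a }'
}

meets_rtf_target() {
  # Exit 0 when processing_seconds / audio_seconds < limit (default 0.5).
  local limit="${3:-0.5}"
  awk -v p="$1" -v a="$2" -v l="$limit" 'BEGIN { exit !(a > 0 && p / a < l) }'
}
```

For example, a 12 s note processed end to end in 4.8 s gives `rtf 4.8 12` = 0.400, which meets the target.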

Manual Execution (Internal)

To respond with voice manually:

bin/sherpa-onnx-tts /tmp/reply.ogg "Your message here"

Then send /tmp/reply.ogg as a voice note with the message tool, passing the path as filePath.
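The manual step can be wrapped to fix two problems. Concurrent replies could overwrite a shared /tmp/reply.ogg, and a broken TTS output could be sent without anyone noticing. Both fixes are assumptions, not part of the skill. Each reply gets its own mktemp directory. The file is then checked for the Ogg capture pattern OggS and the OpusHead identification header; Ogg/Opus is the format WhatsApp voice notes generally use. TTS_BIN and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the manual TTS step.
is_ogg_opus() {
  local header
  # Read the first 64 bytes; drop NUL bytes so the shell can hold them in a variable.
  header=$(head -c 64 "$1" 2>/dev/null | tr -d '\000') || return 1
  # Ogg pages begin with "OggS"; an Opus stream's first packet is "OpusHead".
  case "$header" in
    OggS*OpusHead*) return 0 ;;
    *) return 1 ;;
  esac
}

speak_reply() {
  local text="$1" tts="${TTS_BIN:-bin/sherpa-onnx-tts}" out_dir
  out_dir=$(mktemp -d) || return 1
  # Same argument order as the manual example: output path, then text.
  "$tts" "$out_dir/reply.ogg" "$text" || return 1
  is_ogg_opus "$out_dir/reply.ogg" || { echo "not an Ogg/Opus file" >&2; return 1; }
  printf '%s\n' "$out_dir/reply.ogg"
}
```

On success, `speak_reply "Your message here"` prints the path to pass as filePath to the message tool.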

Source

git clone https://clawhub.ai/rubenfb23/walkie-talkie

Overview

Walkie-Talkie Mode automates the voice-to-voice loop on WhatsApp by transcribing incoming audio locally and replying with synthesized speech. It enables users to talk instead of typing, using on-device processing to preserve privacy and deliver quick responses.

How This Skill Works

Incoming audio is transcribed locally with tools/transcribe_voice.sh, and the transcript is treated as a normal user prompt. The reply is then synthesized with bin/sherpa-onnx-tts and sent back as an .ogg voice note alongside the text. Because the workflow relies only on local tools (ffmpeg, whisper-cpp, sherpa-onnx-tts), transcription and synthesis need no cloud service. This keeps latency low and avoids sending audio to a third-party speech API.
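The contents of tools/transcribe_voice.sh are not shown, so the following is only a guess at a typical implementation. The whisper.cpp command-line tool expects 16-bit PCM WAV input, so the Ogg/Opus note is first converted with ffmpeg to 16 kHz mono. Several names are assumptions: the whisper-cli binary name (used by recent whisper.cpp builds), the model path, and the FFMPEG_BIN, WHISPER_BIN, and WHISPER_MODEL overrides.

```bash
#!/usr/bin/env bash
# Hypothetical transcription step: Ogg/Opus -> 16 kHz mono WAV -> whisper.cpp.
transcribe_voice() {
  local in="$1" lang="${2:-auto}"
  local ffmpeg="${FFMPEG_BIN:-ffmpeg}" whisper="${WHISPER_BIN:-whisper-cli}"
  local model="${WHISPER_MODEL:-models/ggml-base.bin}"   # hypothetical model path
  local tmp wav
  tmp=$(mktemp -d) || return 1
  wav="$tmp/voice.wav"
  # Convert to 16-bit PCM, 16 kHz, mono, as whisper.cpp expects.
  "$ffmpeg" -nostdin -loglevel error -y -i "$in" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || return 1
  # -nt drops timestamps so only the transcript text reaches stdout.
  "$whisper" -m "$model" -f "$wav" -l "$lang" -nt 2>/dev/null
  local status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Passing a language code, as in `transcribe_voice note.ogg es`, fixes the transcription language to Spanish instead of auto-detecting it.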

When to Use It

  • When you’d rather speak than type in WhatsApp.
  • When you receive voice notes and want a quick, natural reply.
  • When you need fast turnaround (RTF < 0.5) from transcription to reply.
  • When privacy matters and you prefer on-device processing (no cloud).
  • When you want a hands-free conversation flow and continuous back-and-forth.

Quick Start

  1. Trigger Walkie-Talkie by saying 'activa modo walkie-talkie' or 'hablemos por voz'.
  2. Send an audio message; the system transcribes it locally and treats it as a normal prompt.
  3. Receive the reply as an .ogg voice note, synthesized with sherpa-onnx-tts and sent back automatically.
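The Quick Start flow reduces to a small routing decision. The reply includes voice when the incoming message is audio, or when the chat has turned the mode on with a trigger phrase. This is a hypothetical sketch. The per-chat state file, STATE_DIR, and the function names are assumptions, and the MIME prefixes match only audio/ogg and audio/opus.

```bash
#!/usr/bin/env bash
# Hypothetical per-chat mode state; the location is an assumption.
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/walkie-talkie}"

enable_walkie_mode() { mkdir -p "$STATE_DIR" && : > "$STATE_DIR/$1.on"; }

is_audio_mime() {
  case "$1" in
    audio/ogg*|audio/opus*) return 0 ;;
    *) return 1 ;;
  esac
}

reply_mode() {
  # Usage: reply_mode <chat_id> <mime_type>; prints "text+voice" or "text".
  if is_audio_mime "$2" || [ -e "$STATE_DIR/$1.on" ]; then
    echo "text+voice"
  else
    echo "text"
  fi
}
```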

Best Practices

  • Keep incoming audio messages reasonably brief to improve transcription accuracy.
  • Speak clearly and enunciate to aid local transcription (whisper-cpp).
  • Use the defined triggers 'activa modo walkie-talkie' or 'hablemos por voz' to start.
  • Test with different languages and accents, and tune the TTS voice and cadence to match.
  • Always provide both a text and an audio reply to maintain context and accessibility.

Example Use Cases

  • A user sends a short voice memo in WhatsApp; Walkie-Talkie transcribes it, processes the prompt, and returns a natural-sounding voice reply.
  • A bilingual user speaks in Spanish; the system transcribes and replies in the same language using local TTS.
  • A remote worker uses Walkie-Talkie to draft quick updates while multitasking, receiving immediate audio replies.
  • A parent communicates via voice while cooking; the assistant responds with a concise voice note to keep hands free.
  • During a workout, the user asks for reminders or directions and gets a rapid audio response back.

