Walkie-Talkie Mode
@rubenfb23
npx machina-cli add skill @rubenfb23/walkie-talkie --openclaw
This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.
Workflow
- Incoming Audio: When a user sends an audio file (OGG/Opus):
  - Use tools/transcribe_voice.sh to get the text.
  - Process the text as a normal user prompt.
- Outgoing Response:
  - Instead of a text-only reply, generate speech using bin/sherpa-onnx-tts.
  - Send the resulting .ogg file back to the user as a voice note.
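The workflow above can be sketched as a small Python handler. This is a minimal sketch under stated assumptions, not the skill's actual implementation: it assumes tools/transcribe_voice.sh takes the audio path as its only argument and prints the transcript to stdout (its interface is not shown in this document), and `answer_fn` is a hypothetical callback standing in for normal prompt processing. The bin/sherpa-onnx-tts argument order (output file first, then text) follows the manual command given in this document.

```python
import subprocess
from typing import Callable, Sequence, Tuple


def run_capture(argv: Sequence[str]) -> str:
    """Run a local command and return its stdout as text."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def handle_voice_note(
    audio_path: str,
    answer_fn: Callable[[str], str],
    reply_path: str = "/tmp/reply.ogg",
    run: Callable[[Sequence[str]], str] = run_capture,
) -> Tuple[str, str]:
    """Transcribe a voice note, answer it, and synthesize the answer.

    Returns (reply_text, reply_audio_path).
    """
    # ASSUMPTION: the script accepts the audio path and prints the transcript.
    transcript = run(["tools/transcribe_voice.sh", audio_path]).strip()
    # Treat the transcript like any other user prompt (answer_fn is hypothetical).
    reply_text = answer_fn(transcript)
    # Argument order (output file, then text) mirrors the manual command.
    run(["bin/sherpa-onnx-tts", reply_path, reply_text])
    return reply_text, reply_path
```

The `run` parameter is injectable so the flow can be exercised without the real binaries installed; in normal use the default `run_capture` shells out to the local tools.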
Triggers
- User sends an audio message.
- User says "activa modo walkie-talkie" ("activate walkie-talkie mode") or "hablemos por voz" ("let's talk by voice").
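One way to check these triggers is sketched below. The document only lists the trigger phrases; the loose matching (case-insensitive, accent-insensitive, substring) and the `mime_type` parameter are assumptions made for illustration, not behavior the skill is known to have.

```python
import unicodedata

# Trigger phrases as listed in this document.
TRIGGER_PHRASES = ("activa modo walkie-talkie", "hablemos por voz")
AUDIO_MIME_PREFIX = "audio/"


def _normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def should_activate(text: str = "", mime_type: str = "") -> bool:
    """Return True for an audio message or a message containing a trigger phrase."""
    # ASSUMPTION: the messaging layer exposes a MIME type for attachments.
    if mime_type.lower().startswith(AUDIO_MIME_PREFIX):
        return True
    normalized = _normalize(text)
    return any(phrase in normalized for phrase in TRIGGER_PHRASES)
```

Substring matching keeps the check tolerant of surrounding words ("activa modo walkie-talkie, por favor"), at the cost of occasionally firing on a message that merely quotes a phrase.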
Constraints
- Use local tools only (ffmpeg, whisper-cpp, sherpa-onnx-tts).
- Maintain a fast response time (real-time factor, RTF, below 0.5).
- Always reply with BOTH text (for clarity) and audio.
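The RTF constraint can be checked with a few lines of Python. Real-time factor is conventionally processing time divided by audio duration, so it is unitless, and a value below 0.5 means processing takes less than half as long as the audio it handles. The document does not say which stage the budget applies to (transcription, synthesis, or the whole loop), so the helper below is a generic sketch and its function names are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (unitless)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def within_budget(
    processing_seconds: float, audio_seconds: float, max_rtf: float = 0.5
) -> bool:
    """Return True when the measured RTF is strictly below max_rtf."""
    return real_time_factor(processing_seconds, audio_seconds) < max_rtf
```

For example, spending 4 seconds on a 10-second voice note gives an RTF of 0.4, which is within the budget; spending 5 seconds gives exactly 0.5, which is not.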
Manual Execution (Internal)
To respond with voice manually:
bin/sherpa-onnx-tts /tmp/reply.ogg "Tu mensaje aquí"
Then send /tmp/reply.ogg via the message tool, passing the file as filePath.
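Because the constraints call for both text and audio in every reply, the arguments for the message tool could be assembled as in the sketch below. Only the `filePath` field name comes from this document; the `text` field name and the function itself are hypothetical, and the real message tool may expect a different shape.

```python
def build_voice_reply(text: str, audio_path: str) -> dict:
    """Build message-tool arguments carrying both the text and the voice note."""
    if not text.strip():
        raise ValueError("reply text must not be empty")
    if not audio_path.endswith(".ogg"):
        raise ValueError("voice notes are expected to be .ogg files")
    # "filePath" is named in this document; "text" is a hypothetical field name.
    return {"text": text, "filePath": audio_path}
```

Validating both parts before sending makes it harder to emit an audio-only or text-only reply by accident.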
Overview
Walkie-Talkie Mode automates the voice-to-voice loop on WhatsApp by transcribing incoming audio locally and replying with synthesized speech. It enables users to talk instead of typing, using on-device processing to preserve privacy and deliver quick responses.
How This Skill Works
Incoming audio is transcribed locally with tools/transcribe_voice.sh, converting speech to text. The text is treated as a normal user prompt, and the reply is then converted to speech with bin/sherpa-onnx-tts and sent back as an .ogg voice note. The workflow relies entirely on local tools (ffmpeg, whisper-cpp, sherpa-onnx-tts) to keep latency low.
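The internals of tools/transcribe_voice.sh are not shown here, but whisper.cpp expects 16 kHz WAV input, so a conversion step with ffmpeg before transcription is plausible. The sketch below only builds such a command; the function name is hypothetical, and whether the script uses these exact flags is an assumption.

```python
from pathlib import Path
from typing import List


def ffmpeg_to_wav16k_cmd(src: str, dst: str = "") -> List[str]:
    """Build an ffmpeg command converting audio to 16 kHz mono 16-bit PCM WAV."""
    # Default output: same path as the input, with a .wav suffix.
    out = dst or str(Path(src).with_suffix(".wav"))
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", src,             # input voice note (e.g. OGG/Opus)
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        out,
    ]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.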
When to Use It
- When you’d rather speak than type in WhatsApp.
- When you receive voice notes and want a quick, natural reply.
- When you need low latency (RTF < 0.5) from transcription to reply.
- When privacy matters and you prefer on-device processing (no cloud).
- When you want a hands-free conversation flow and continuous back-and-forth.
Quick Start
- Step 1: Trigger Walkie-Talkie by saying 'activa modo walkie-talkie' or 'hablemos por voz'.
- Step 2: Send an audio message; the system will transcribe it locally and treat it as a normal prompt.
- Step 3: Receive the reply as an .ogg voice note, generated with sherpa-onnx-tts and sent back automatically.
Best Practices
- Keep incoming audio messages reasonably brief to improve transcription accuracy.
- Speak clearly and enunciate to aid local transcription (whisper-cpp).
- Use the defined triggers 'activa modo walkie-talkie' or 'hablemos por voz' to start.
- Test with different languages or accents to tune TTS voice cadence.
- Always provide both a text and an audio reply to maintain context and accessibility.
Example Use Cases
- A user sends a short voice memo in WhatsApp; Walkie-Talkie transcribes it, processes the prompt, and returns a natural-sounding voice reply.
- A bilingual user speaks in Spanish; the system transcribes and replies in the same language using local TTS.
- A remote worker uses Walkie-Talkie to draft quick updates while multitasking, receiving immediate audio replies.
- A parent communicates via voice while cooking; the assistant responds with a concise voice note to keep hands free.
- During a workout, the user asks for reminders or directions and gets a rapid audio response back.