remotion-production
npx machina-cli add skill DojoCodingLabs/remotion-superpowers/remotion-production --openclaw
Remotion Production Workflow
This skill teaches how to produce complete videos with Remotion by orchestrating multiple MCP tools together. It covers the full pipeline from concept to rendered MP4.
Available MCP Tools
You have access to these MCP servers for media production:
remotion-media (via KIE)
- generate_tts: Text-to-speech voiceovers (ElevenLabs TTS)
- generate_music: Background music (Suno V3.5–V5)
- generate_sfx: Sound effects (ElevenLabs SFX V2)
- generate_image: AI images (Nano Banana Pro)
- generate_video: AI video clips (Veo 3.1)
- generate_subtitles: Transcribe audio/video to SRT (Whisper)
- list_assets: List all generated media in the project
TwelveLabs (video understanding)
- Index and analyze video files
- Semantic search within videos ("find the part where...")
- Scene detection, object detection, speaker identification
- Video summarization
Pexels (stock footage)
- searchPhotos: Search free stock photos
- searchVideos: Search free stock videos
- getVideo / getPhoto: Get details by ID
- downloadVideo: Download video to project
ElevenLabs (optional — advanced voice)
- Voice cloning from audio samples
- Advanced TTS with custom voices
- Audio isolation and processing
- Transcription
Replicate (optional — 100+ AI models)
- replicate_run: Run a model synchronously (images)
- replicate_create_prediction: Start async prediction (video)
- replicate_get_prediction: Poll prediction status
- Image models: FLUX 1.1 Pro, Imagen 4, Ideogram v3, FLUX Kontext
- Video models: Wan 2.5 (T2V, I2V), Kling 2.6 Pro
Production Pipeline
Read individual rule files for detailed workflows:
- rules/production-pipeline.md: End-to-end workflow from concept to final render
- rules/audio-integration.md: How to integrate generated audio into Remotion compositions
- rules/voiceover-sync.md: Syncing TTS voiceovers with animations and captions
- rules/music-scoring.md: Generating and timing background music
- rules/stock-footage-workflow.md: Searching, downloading, and using stock footage in Remotion
- rules/video-analysis.md: Using TwelveLabs to analyze and select clips from existing footage
- rules/captions-workflow.md: TikTok-style animated captions using @remotion/captions and Whisper
- rules/animation-presets.md: Reusable animation patterns (fade, slide, scale, typewriter, stagger)
- rules/3d-content.md: Three.js and React Three Fiber via @remotion/three
- rules/data-visualization.md: Animated charts, dashboards, and number counters
- rules/visual-effects.md: Light leaks, Lottie, film grain, vignettes, Ken Burns
- rules/ci-rendering.md: GitHub Actions workflows for automated video rendering
- rules/replicate-models.md: Replicate MCP model catalog, usage, and decision guide
- rules/image-generation.md: AI image prompt engineering, provider selection, Remotion integration
- rules/video-generation.md: AI video clip generation, I2V pipeline, sequencing in Remotion
- rules/sound-effects.md: SFX generation, prompt engineering, timing to visual events
- rules/elevenlabs-advanced.md: Voice cloning, custom TTS parameters, multi-voice scripts
- rules/asset-management.md: File organization, naming conventions, staticFile() reference
Key Principles
- Audio drives timing: Generate the voiceover first, get its duration, then set the composition length to match.
- Assets go in public/: All generated media files (audio, video, images) must be saved to the project's public/ directory so Remotion can access them via staticFile().
- Use Remotion's audio components: Always use the <Audio> component with staticFile() for audio. Never use HTML <audio> tags.
- Frame-based timing: Remotion uses frames, not seconds. Convert with fps * seconds. At 30fps, 1 second = 30 frames.
- Progressive composition: Build the video in layers, starting with visuals, then voiceover, then music, then SFX.
- Preview frequently: Use npm run dev to preview after each major change. The Remotion player updates live.
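The "audio drives timing" and "frame-based timing" principles can be sketched as a pair of small helpers. This is a minimal illustration, not code from the skill itself: the function names `secondsToFrames` and `compositionDuration`, the padding parameter, and the choice to round up are all assumptions made for the example.

```typescript
// Convert a duration in seconds to a whole number of frames.
// Rounding up avoids cutting off the tail of an audio clip; the small
// epsilon keeps floating-point noise (e.g. 387.00000000000006) from
// adding a spurious extra frame.
function secondsToFrames(seconds: number, fps: number): number {
  return Math.ceil(seconds * fps - 1e-9);
}

// Hypothetical helper: composition length = voiceover length plus
// optional padding at the end, expressed in frames.
function compositionDuration(
  voiceoverSeconds: number,
  fps: number,
  paddingSeconds = 0,
): number {
  return secondsToFrames(voiceoverSeconds + paddingSeconds, fps);
}

// At 30fps, 1 second is 30 frames.
console.log(secondsToFrames(1, 30));
```

The result of `compositionDuration` is the kind of value you would pass as a composition's `durationInFrames` once the voiceover length is known.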
Source
git clone https://github.com/DojoCodingLabs/remotion-superpowers/blob/main/skills/remotion-production/SKILL.md
Overview
Orchestrates multiple MCP servers to produce complete Remotion videos from concept to final MP4. It explains how to combine TTS, music, SFX, stock footage, and video analysis into cohesive compositions, guided by a library of production rules.
How This Skill Works
The skill uses remotion-media to generate TTS, music, SFX, AI images, and AI video clips; TwelveLabs to index and analyze video; Pexels to source stock photos and footage; and, optionally, ElevenLabs for advanced voice work and Replicate for additional image and video models. The production flow is guided by rule files (such as production-pipeline.md, audio-integration.md, and voiceover-sync.md) that cover timing, asset sequencing, and captions. All generated media is placed in the project's public/ directory and accessed with Remotion's staticFile().
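Because staticFile() takes a path relative to public/, it can help to build those paths in one place. The sketch below assumes a hypothetical layout of public/audio, public/video, and public/images; the skill's own conventions live in rules/asset-management.md and may differ. The name `publicAssetPath` is invented for illustration.

```typescript
// Hypothetical asset categories; the real folder names are an assumption.
type AssetKind = "audio" | "video" | "images";

// Build a path relative to public/, which is the form staticFile() expects.
// Leading slashes are stripped so "/voiceover.mp3" and "voiceover.mp3"
// produce the same result.
function publicAssetPath(kind: AssetKind, fileName: string): string {
  const clean = fileName.replace(/^\/+/, "");
  return `${kind}/${clean}`;
}

console.log(publicAssetPath("audio", "voiceover.mp3"));
```

Inside a composition, the returned string would be wrapped as staticFile(publicAssetPath("audio", "voiceover.mp3")) and passed to the src prop of Remotion's <Audio> component.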
When to Use It
- You need voiceovers or TTS narration synced to animations
- You require timed background music and SFX aligned to scene actions
- You want to incorporate stock footage or AI-generated clips into Remotion projects
- You need to analyze existing footage to select clips or summarize scenes using TwelveLabs
- You're building videos with TikTok-style animated captions that must stay synchronized with the audio
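For the caption case, SRT cues carry timestamps in HH:MM:SS,mmm form, while Remotion works in frames. The following sketch shows one way to bridge the two. The function name `srtTimeToFrame` is hypothetical, and rounding to the nearest frame is an assumption; the skill's captions workflow relies on @remotion/captions, which is not reproduced here.

```typescript
// Convert an SRT timestamp such as "00:00:01,500" to a frame index.
// Accepts either a comma or a dot before the milliseconds.
function srtTimeToFrame(timestamp: string, fps: number): number {
  const match = /^(\d{2}):(\d{2}):(\d{2})[,.](\d{3})$/.exec(timestamp.trim());
  if (!match) {
    throw new Error(`Invalid SRT timestamp: ${timestamp}`);
  }
  const [, hh, mm, ss, ms] = match;
  // Work in integer milliseconds to avoid floating-point drift.
  const totalMs =
    ((Number(hh) * 60 + Number(mm)) * 60 + Number(ss)) * 1000 + Number(ms);
  return Math.round((totalMs * fps) / 1000);
}

// 1.5 seconds at 30fps is frame 45.
console.log(srtTimeToFrame("00:00:01,500", 30));
```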
Quick Start
- Step 1: Plan the concept and generate TTS to lock the video duration
- Step 2: Gather media (music, SFX, stock footage) and place all assets in public/
- Step 3: Build the Remotion composition, sync the voiceover with animations, and render the MP4
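Step 3 can be planned before any components are written by laying scenes end to end on a frame timeline. The sketch below assumes each scene has its own voiceover clip of known length; the `Scene` and `TimedScene` types and the `buildTimeline` and `totalFrames` helpers are hypothetical names for this example. The `from` and `durationInFrames` fields are named after the matching props on Remotion's <Sequence> component.

```typescript
interface Scene {
  id: string;
  voiceoverSeconds: number; // measured length of this scene's narration
}

interface TimedScene {
  id: string;
  from: number; // first frame of the scene
  durationInFrames: number;
}

// Place scenes back to back, each lasting as long as its voiceover.
function buildTimeline(scenes: Scene[], fps: number): TimedScene[] {
  let cursor = 0;
  return scenes.map((scene) => {
    // Round up so narration is never cut short; keep at least one frame.
    const durationInFrames = Math.max(
      1,
      Math.ceil(scene.voiceoverSeconds * fps - 1e-9),
    );
    const timed: TimedScene = { id: scene.id, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return timed;
  });
}

// Total composition length is the sum of all scene durations.
function totalFrames(timeline: TimedScene[]): number {
  return timeline.reduce((sum, s) => sum + s.durationInFrames, 0);
}

const timeline = buildTimeline(
  [
    { id: "intro", voiceoverSeconds: 2 },
    { id: "demo", voiceoverSeconds: 3.5 },
    { id: "outro", voiceoverSeconds: 1 },
  ],
  30,
);
console.log(timeline, totalFrames(timeline));
```

Each entry maps onto one <Sequence>, and the total gives the composition length, so the voiceover keeps control of the overall duration.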
Best Practices
- Generate the voiceover first to determine duration and set composition length
- Save all outputs to public/ so Remotion can access via staticFile()
- Leverage the rule files (production-pipeline, audio-integration, voiceover-sync, music-scoring) to structure workflows
- Use TwelveLabs to guide edits with scene/object detection and summarization
- Test locally first, then validate renders in CI (GitHub Actions) before publishing
Example Use Cases
- Marketing product launch video with TTS narration and synchronized captions
- Tutorial video with background music, SFX cues, and stock footage overlays
- Customer education clip using AI-generated voicework and consistent tone
- Tech explainer leveraging AI video clips and scene detection to pace segments
- TikTok-style short with animated captions and rapid scene changes