video-generation
npx machina-cli add skill Xiangyu-CAS/Vision-Skills/video-generation --openclaw
Video Generation with Gemini (Veo 3.1)
Use this skill when the user asks to generate or extend videos with Gemini using the Python SDK.
Default to veo-3.1-fast-generate-preview, resolution="720p", and duration_seconds=4, unless the user asks otherwise or the task requires different settings (e.g., extension, interpolation, reference images, 1080p/4k).
Workflow
- Identify the task type: text-to-video, image-to-video, reference images, first/last frames (interpolation), or video extension.
- Ensure GEMINI_API_KEY is available (env or a local .env file), then use the Python SDK.
- When using images, pass types.Image(imageBytes=..., mimeType=...) (not PIL.Image or types.Part) to avoid input type errors.
- Call client.models.generate_videos(...) with the correct inputs/config (see references).
- Poll the operation until done, then download and save the video.
- If no videos are returned, surface a clear error and suggest checking the API key, model, and config.
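The workflow above might look roughly like this for the text-to-video case. This is a minimal sketch that assumes the google-genai Python SDK is installed; the helper names first_video_or_error and generate_text_to_video, the output path, and the 10-second poll interval are illustrative choices, not part of the skill.

```python
import time


def first_video_or_error(operation):
    """Return the first generated video, or raise a clear error if none came back."""
    response = getattr(operation, "response", None)
    videos = getattr(response, "generated_videos", None) or []
    if not videos:
        raise RuntimeError(
            "No videos were returned. Check GEMINI_API_KEY, the model name, "
            "and the generation config."
        )
    return videos[0]


def generate_text_to_video(prompt, out_path="output.mp4", poll_seconds=10):
    """Text-to-video with the skill's defaults (hypothetical helper)."""
    # Imported lazily so the pure helper above works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    operation = client.models.generate_videos(
        model="veo-3.1-fast-generate-preview",
        prompt=prompt,
        config=types.GenerateVideosConfig(resolution="720p", duration_seconds=4),
    )

    # Poll the long-running operation until it reports done.
    while not operation.done:
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)

    generated = first_video_or_error(operation)
    client.files.download(file=generated.video)  # fetch the video bytes
    generated.video.save(out_path)
    return out_path
```

Keeping the "no videos" check in its own function means the error message can be exercised without calling the API.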
Use these references (by task type)
- Common setup and workflow: references/overview.md
- Parameters and constraints: references/parameters.md
- Model versions and limits: references/model-versions-and-limitations.md
- Prompting guidance: references/prompt-guide.md
Task types
- Text-to-video: examples/text-to-video.md
- Image-to-video: examples/image-to-video.md
- Reference images: examples/reference-images.md
- First/last frames (interpolation): examples/first-last-frames.md
- Video extension: examples/video-extension.md
Tuning examples
- Aspect ratio: examples/aspect-ratio.md
- Resolution (4k): examples/resolution.md
- Negative prompt: examples/negative-prompt.md
Defaults and notes
- Default model: veo-3.1-fast-generate-preview.
- Default output: 720p, 4 seconds.
- For image inputs, always provide imageBytes + mimeType via types.Image to prevent INVALID_ARGUMENT errors.
- 1080p/4k, reference images, interpolation, and video extension require duration_seconds=8.
- Video extension is limited to 720p inputs and requires a video from a previous Veo generation.
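The defaults and the 8-second rule above can be captured in a small validation step before calling the API. The sketch below is illustrative only: resolve_settings and the task labels are hypothetical names. It also treats the requested resolution as a stand-in for the "720p inputs" limit on extension, which is a simplifying assumption.

```python
DEFAULT_MODEL = "veo-3.1-fast-generate-preview"
# Task types that the notes above say require duration_seconds=8.
EIGHT_SECOND_TASKS = {"reference-images", "interpolation", "extension"}


def resolve_settings(task="text-to-video", resolution="720p", duration_seconds=None):
    """Apply the documented defaults and the 8-second rule (hypothetical helper)."""
    needs_eight = task in EIGHT_SECOND_TASKS or resolution in {"1080p", "4k"}
    if duration_seconds is None:
        # Fall back to 4 seconds unless the task or resolution forces 8.
        duration_seconds = 8 if needs_eight else 4
    if needs_eight and duration_seconds != 8:
        raise ValueError(
            f"task={task!r} at {resolution} requires duration_seconds=8, "
            f"got {duration_seconds}"
        )
    # Assumption: the requested resolution is used as a proxy for the input video.
    if task == "extension" and resolution != "720p":
        raise ValueError("video extension is limited to 720p inputs")
    return {
        "model": DEFAULT_MODEL,
        "resolution": resolution,
        "duration_seconds": duration_seconds,
    }
```

For example, resolve_settings(resolution="1080p") picks 8 seconds automatically, while resolve_settings(resolution="4k", duration_seconds=4) fails before any request is sent.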
- Video generation can take minutes; allow longer timeouts when running commands.
Source
git clone https://github.com/Xiangyu-CAS/Vision-Skills/blob/main/skills/video-generation/SKILL.md
Overview
Generates or extends videos using Gemini Veo 3.1 through the Python SDK. It supports text-to-video, image-to-video, reference images, first/last frame interpolation, and video extension, while letting you tune Veo parameters such as aspect ratio, resolution, duration, negative prompts, personGeneration, and seed.
How This Skill Works
Identify the task type (text-to-video, image-to-video, reference images, interpolation, or extension) and ensure GEMINI_API_KEY is available. Initialize the Python SDK client and call client.models.generate_videos with the correct inputs and Veo parameters. For image inputs, pass imageBytes and mimeType via types.Image to avoid input type errors. Poll the operation until it is done, then download and save the video. If no videos are returned, surface an error and check the API key, model, and config.
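For the image-input step, one possible way to build the types.Image value from a file on disk is sketched below. It assumes the google-genai SDK for the final wrapping step. The helper names load_image_payload and to_sdk_image are hypothetical, and guessing the MIME type from the file extension is a convenience, not something the skill prescribes.

```python
import mimetypes
from pathlib import Path


def load_image_payload(path):
    """Read an image file and return (raw_bytes, mime_type)."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path}")
    return Path(path).read_bytes(), mime_type


def to_sdk_image(path):
    """Wrap the payload in types.Image rather than PIL.Image or types.Part."""
    from google.genai import types  # lazy import: needs google-genai installed

    data, mime_type = load_image_payload(path)
    return types.Image(imageBytes=data, mimeType=mime_type)
```

The result of to_sdk_image("frame.png") would then be passed as the image input to client.models.generate_videos.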
When to Use It
- Text-to-video from a descriptive prompt
- Image-to-video using input images or reference frames
- Reference images to guide scene evolution
- First/last frame interpolation to create smooth transitions
- Video extension to extend or append frames from a previous Veo generation
Quick Start
- Step 1: Ensure GEMINI_API_KEY is set in your environment (e.g., export GEMINI_API_KEY=... )
- Step 2: Choose a task type (text-to-video, image-to-video, etc.) and prepare inputs; for images use types.Image(imageBytes=..., mimeType=...)
- Step 3: Call client.models.generate_videos(...) with inputs and Veo settings, then poll until done and download the video
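Step 1 can be checked up front so that a missing key fails with a readable message rather than an API error. The following is a rough sketch using only the standard library. load_api_key is a hypothetical helper, and the .env parsing handles only simple KEY=value and export KEY=value lines.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", environ=None):
    """Return GEMINI_API_KEY from the environment, falling back to a local .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line.startswith("export "):
                line = line[len("export "):]
            name, sep, value = line.partition("=")
            if sep and name.strip() == "GEMINI_API_KEY":
                # Drop surrounding whitespace and optional quotes.
                value = value.strip().strip("'\"")
                if value:
                    return value
    raise RuntimeError(
        f"GEMINI_API_KEY is not set in the environment or in {env_file}"
    )
```

If the key comes from the file, it can be passed explicitly when constructing the client, for example genai.Client(api_key=load_api_key()).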
Best Practices
- Start with defaults: veo-3.1-fast-generate-preview, 720p, duration 4 seconds
- For 1080p/4k, reference images, interpolation, or video extension, set duration_seconds to 8
- When using image inputs, always pass types.Image(imageBytes=..., mimeType=...) to avoid INVALID_ARGUMENT
- Ensure GEMINI_API_KEY is configured and the target model matches your task
- Video generation can take minutes; allow longer timeouts and handle long-running tasks
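Since generation can run for minutes, the polling loop benefits from an explicit deadline. The sketch below shows one way to do that. wait_until_done is a hypothetical helper, and the 600-second timeout and 10-second interval are assumed values to tune, not documented limits.

```python
import time


def wait_until_done(operation, refresh, timeout_seconds=600, poll_seconds=10,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll via refresh(operation) until operation.done or the deadline passes."""
    deadline = clock() + timeout_seconds
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError(
                f"video generation not finished after {timeout_seconds}s"
            )
        sleep(poll_seconds)
        operation = refresh(operation)  # fetch the latest operation state
    return operation
```

With the google-genai SDK this would be called as wait_until_done(operation, client.operations.get). The sleep and clock parameters exist only so the loop can be tested without waiting.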
Example Use Cases
- Create a 720p 4s product teaser from a text prompt
- Convert a batch of product images into a short promotional video
- Animate a scene using reference images with first/last frame interpolation
- Extend an existing Veo video with video extension to add more frames
- Produce a 1080p clip by setting resolution to 1080p and duration_seconds to 8