What is the default model and output?

The default model is veo-3.1-fast-generate-preview, producing 720p video that is 4 seconds long.
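Here is a minimal sketch that keeps these defaults in one place and merges per-request overrides into them. The helper name video_request is hypothetical. Passing config as a plain dict with snake_case keys assumes that the google-genai SDK accepts a dict wherever it expects types.GenerateVideosConfig.

```python
from typing import Optional

# Defaults stated by this skill; any config key can be overridden per request.
DEFAULT_MODEL = "veo-3.1-fast-generate-preview"
DEFAULT_CONFIG = {"resolution": "720p", "duration_seconds": 4}


def video_request(prompt: str, model: Optional[str] = None, **overrides) -> dict:
    """Build keyword arguments for client.models.generate_videos (hypothetical helper)."""
    # Assumption: the SDK accepts config as a plain dict with snake_case keys.
    config = {**DEFAULT_CONFIG, **overrides}
    return {"model": model or DEFAULT_MODEL, "prompt": prompt, "config": config}
```

The result can be unpacked directly into client.models.generate_videos(**video_request("A kite over a beach at sunset")). If you override resolution to "1080p", also pass duration_seconds=8, as described under Defaults and notes.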

What input types are supported?

Text prompts and image inputs are both supported. For images, pass imageBytes + mimeType via types.Image to avoid input type errors.
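Below is a sketch of building that image input from a file on disk. It reads the raw bytes, infers the MIME type with Python's mimetypes module, and wraps both in types.Image. The helper names are hypothetical. The code uses the SDK's snake_case field names image_bytes and mime_type, and assumes that the camelCase imageBytes/mimeType spelling used in this skill is an equivalent alias accepted by the SDK's models.

```python
import mimetypes
from pathlib import Path


def image_fields(path):
    """Read an image file and return its raw bytes plus the inferred MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return {"image_bytes": Path(path).read_bytes(), "mime_type": mime_type}


def to_sdk_image(path):
    """Wrap the file in types.Image; requires the google-genai package."""
    # Imported lazily so image_fields() can be used without the SDK installed.
    from google.genai import types
    # Snake_case field names; the skill's imageBytes/mimeType are assumed aliases.
    return types.Image(**image_fields(path))
```

The wrapped image is then passed as image=to_sdk_image("product.png") in the generate_videos call, rather than as a PIL.Image or types.Part.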

What should I do if no videos are returned?

Surface a clear error and verify the API key, model, and configuration.
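One way to surface that error is to check the finished operation returned by generate_videos. The sketch first looks at operation.error and then at operation.response.generated_videos. Reading rai_media_filtered_reasons for extra context is an assumption about the response fields, so the code accesses it defensively with getattr. The function name is hypothetical.

```python
def first_video(operation):
    """Return the first generated video, or raise an error that says what to check."""
    error = getattr(operation, "error", None)
    if error:
        raise RuntimeError(f"Video generation failed: {error}")
    response = getattr(operation, "response", None)
    # getattr on None with a default simply yields None, so a missing response is safe.
    videos = getattr(response, "generated_videos", None)
    if not videos:
        # Assumption: the response may list safety-filter reasons under this name.
        reasons = getattr(response, "rai_media_filtered_reasons", None)
        hint = f" Filter reasons: {reasons}." if reasons else ""
        raise RuntimeError(
            "No videos were returned." + hint
            + " Check GEMINI_API_KEY, the model name, and the generation config."
        )
    return videos[0]
```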

video-generation


npx machina-cli add skill Xiangyu-CAS/Vision-Skills/video-generation --openclaw


Video Generation with Gemini (Veo 3.1)

Use this skill when the user asks to generate or extend videos with Gemini using the Python SDK. Default to veo-3.1-fast-generate-preview, resolution="720p", and duration_seconds=4, unless the user asks otherwise or the task requires different settings (e.g., extension, interpolation, reference images, 1080p/4k).

Workflow

Identify the task type: text-to-video, image-to-video, reference images, first/last frames (interpolation), or video extension.
Ensure GEMINI_API_KEY is available (env or local .env), then use the Python SDK.
When using images, pass types.Image(imageBytes=..., mimeType=...) (not PIL.Image or types.Part) to avoid input type errors.
Call client.models.generate_videos(...) with the correct inputs/config (see references).
Poll the operation until done, then download and save the video.
If no videos are returned, surface a clear error and suggest checking the API key, model, and config.
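The workflow above can be written as an end-to-end text-to-video sketch. The client is passed in rather than created inside the function. Normally you would pass genai.Client() from the google-genai package, which reads GEMINI_API_KEY from the environment. The sleep function is injectable, so the polling loop can be exercised without network access. The function name, the 10-second poll interval, and the output path are assumptions, not part of the skill.

```python
import time


def generate_and_save(client, prompt, out_path="output.mp4", *,
                      model="veo-3.1-fast-generate-preview",
                      config=None, poll_seconds=10, sleep=time.sleep):
    """Start a Veo job, poll until done, then download and save the first video."""
    # Skill defaults: 720p, 4 seconds (config passed as a dict; assumed accepted).
    config = config or {"resolution": "720p", "duration_seconds": 4}
    operation = client.models.generate_videos(model=model, prompt=prompt, config=config)
    # Generation is a long-running operation; refresh it until it reports done.
    while not operation.done:
        sleep(poll_seconds)
        operation = client.operations.get(operation)
    videos = getattr(operation.response, "generated_videos", None)
    if not videos:
        raise RuntimeError("No videos returned; check GEMINI_API_KEY, model, and config.")
    video = videos[0].video
    client.files.download(file=video)  # fetch the generated file's bytes
    video.save(out_path)               # write them to disk
    return out_path
```

With the real SDK, the call would look like generate_and_save(genai.Client(), "A slow dolly shot of a lighthouse at dusk") after from google import genai.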

Use these references (by task type)

Common setup and workflow: references/overview.md
Parameters and constraints: references/parameters.md
Model versions and limits: references/model-versions-and-limitations.md
Prompting guidance: references/prompt-guide.md

Task types

Text-to-video: examples/text-to-video.md
Image-to-video: examples/image-to-video.md
Reference images: examples/reference-images.md
First/last frames (interpolation): examples/first-last-frames.md
Video extension: examples/video-extension.md

Tuning examples

Aspect ratio: examples/aspect-ratio.md
Resolution (4k): examples/resolution.md
Negative prompt: examples/negative-prompt.md

Defaults and notes

Default model: veo-3.1-fast-generate-preview.
Default output: 720p, 4 seconds.
For image inputs, always provide imageBytes + mimeType via types.Image to prevent INVALID_ARGUMENT errors.
1080p/4k, reference images, interpolation, and video extension require duration_seconds=8.
Video extension is limited to 720p inputs and requires a video from a previous Veo generation.
Video generation can take minutes; allow longer timeouts when running commands.
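The duration and input-resolution rules above can be checked before a job is submitted. This is a hypothetical pre-flight helper. Its task labels are this sketch's own strings, not SDK values. It also cannot verify that an extension input actually came from a previous Veo generation.

```python
# Task labels are this sketch's own names, not SDK identifiers.
EIGHT_SECOND_TASKS = {"reference-images", "interpolation", "extension"}
HIGH_RESOLUTIONS = {"1080p", "4k"}


def check_settings(task, resolution="720p", duration_seconds=4, input_resolution=None):
    """Return a list of human-readable problems; an empty list means the settings look valid."""
    problems = []
    # 1080p/4k output and the listed task types all require 8-second clips.
    needs_eight = task in EIGHT_SECOND_TASKS or resolution in HIGH_RESOLUTIONS
    if needs_eight and duration_seconds != 8:
        problems.append(f"{task} at {resolution} requires duration_seconds=8")
    # Extension only accepts 720p inputs (skipped when the input resolution is unknown).
    if task == "extension" and input_resolution not in (None, "720p"):
        problems.append("video extension only accepts 720p input videos")
    return problems
```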

Source

View on GitHub: https://github.com/Xiangyu-CAS/Vision-Skills/blob/main/skills/video-generation/SKILL.md

Overview

Generates or extends videos using Gemini Veo 3.1 through the Python SDK. It supports text-to-video, image-to-video, reference images, first/last frame interpolation, and video extension, while letting you tune Veo parameters such as aspect ratio, resolution, duration, negative prompts, personGeneration, and seed.

How This Skill Works

Identify the task type (text-to-video, image-to-video, reference images, interpolation, or extension) and ensure GEMINI_API_KEY is available. Initialize the Python SDK client and call client.models.generate_videos with the correct inputs and Veo parameters. For image inputs, pass imageBytes and mimeType via types.Image to avoid input type errors. Poll the operation until it is done, then download and save the video. If no videos are returned, surface an error and verify the API key, model, and config.
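Here is a sketch of the task-type step, in which each task maps to a different combination of generate_videos arguments. The mapping is based on our understanding of the google-genai types, and you should confirm it against references/parameters.md before relying on it. It assumes three things:

- interpolation sets config.last_frame;
- reference images go in config.reference_images with reference_type "asset";
- extension passes video=.

The task labels and the build_request name are hypothetical.

```python
# Inputs each task type needs (task labels are this sketch's own names).
REQUIRED_INPUTS = {
    "text-to-video": (),
    "image-to-video": ("image",),
    "interpolation": ("image", "last_frame"),
    "reference-images": ("reference_images",),
    "extension": ("video",),
}


def build_request(task, prompt, config=None, **inputs):
    """Return generate_videos keyword arguments (minus model) for a task type."""
    if task not in REQUIRED_INPUTS:
        raise ValueError(f"unknown task type: {task!r}")
    missing = [name for name in REQUIRED_INPUTS[task] if inputs.get(name) is None]
    if missing:
        raise ValueError(f"{task} requires: {', '.join(missing)}")
    config = dict(config or {})  # copy so the caller's dict is not mutated
    request = {"prompt": prompt}
    if task in ("image-to-video", "interpolation"):
        request["image"] = inputs["image"]  # starting frame (a types.Image)
    if task == "interpolation":
        config["last_frame"] = inputs["last_frame"]  # assumed config field for the end frame
    if task == "reference-images":
        # Assumed shape of VideoGenerationReferenceImage, given as dicts.
        config["reference_images"] = [
            {"image": img, "reference_type": "asset"} for img in inputs["reference_images"]
        ]
    if task == "extension":
        request["video"] = inputs["video"]  # must come from a previous Veo generation
    request["config"] = config
    return request
```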

When to Use It

Text-to-video from a descriptive prompt
Image-to-video using input images or reference frames
Reference images to guide scene evolution
First/last frame interpolation to create smooth transitions
Video extension to extend or append frames from a previous Veo generation

Quick Start

Step 1: Ensure GEMINI_API_KEY is set in your environment (e.g., export GEMINI_API_KEY=...)
Step 2: Choose a task type (text-to-video, image-to-video, etc.) and prepare inputs; for images use types.Image(imageBytes=..., mimeType=...)
Step 3: Call client.models.generate_videos(...) with inputs and Veo settings, then poll until done and download the video
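The workflow also allows the key to live in a local .env file. Below is a minimal standard-library sketch of that lookup. It assumes simple KEY=VALUE lines, optionally prefixed with export and wrapped in quotes. A package such as python-dotenv handles more cases. The resolve_api_key name is hypothetical.

```python
import os
from pathlib import Path


def resolve_api_key(env_file=".env", var="GEMINI_API_KEY"):
    """Return the key from the environment, falling back to a simple .env file."""
    if os.environ.get(var):
        return os.environ[var]
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            if key.startswith("export "):
                key = key[len("export "):].strip()
            if key == var:
                return value.strip().strip("\"'")  # drop surrounding quotes
    raise RuntimeError(f"{var} is not set in the environment or in {env_file}")
```

The result can be passed explicitly, as in genai.Client(api_key=resolve_api_key()).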

Best Practices

Start with defaults: veo-3.1-fast-generate-preview, 720p, duration 4 seconds
For 1080p/4k, reference images, interpolation, or video extension, set duration_seconds to 8
When using image inputs, always pass types.Image(imageBytes=..., mimeType=...) to avoid INVALID_ARGUMENT
Ensure GEMINI_API_KEY is configured and the target model matches your task
Video generation can take minutes; allow longer timeouts and handle long-running tasks
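For long-running jobs, a polling loop with an overall deadline avoids waiting forever. This sketch assumes the client.operations.get(operation) refresh call from the google-genai SDK. The 600-second default timeout and 10-second interval are arbitrary choices. Both sleep and clock are injectable for testing.

```python
import time


def wait_for_operation(client, operation, *, timeout_s=600, poll_s=10,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll a long-running Veo operation until it is done or timeout_s elapses."""
    deadline = clock() + timeout_s
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError(f"video generation still running after {timeout_s}s")
        sleep(poll_s)
        operation = client.operations.get(operation)  # refresh the operation status
    return operation
```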

Example Use Cases

Create a 720p 4s product teaser from a text prompt
Convert a batch of product images into a short promotional video
Animate a scene using reference images with first/last frame interpolation
Lengthen a clip from a previous Veo generation using video extension
Produce a 1080p clip by setting resolution="1080p" and duration_seconds=8

Frequently Asked Questions
