
video-generation

npx machina-cli add skill Xiangyu-CAS/Vision-Skills/video-generation --openclaw

Video Generation with Gemini (Veo 3.1)

Use this skill when the user asks to generate or extend videos with Gemini using the Python SDK. Default to veo-3.1-fast-generate-preview, resolution="720p", and duration_seconds=4, unless the user asks otherwise or the task requires different settings (e.g., extension, interpolation, reference images, 1080p/4k).

Workflow

  1. Identify the task type: text-to-video, image-to-video, reference images, first/last frames (interpolation), or video extension.
  2. Ensure GEMINI_API_KEY is available (env or local .env), then use the Python SDK.
  3. When using images, pass types.Image(imageBytes=..., mimeType=...) (not PIL.Image or types.Part) to avoid input type errors.
  4. Call client.models.generate_videos(...) with the correct inputs/config (see references).
  5. Poll the operation until done, then download and save the video.
  6. If no videos are returned, surface a clear error and suggest checking the API key, model, and config.
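The workflow above can be sketched end to end for the text-to-video case. This is a minimal sketch, not the skill's own code: it assumes the google-genai package is installed and that GenerateVideosConfig accepts resolution and duration_seconds as the stated defaults imply. The function name generate_video and the parameters out_path and poll_seconds are illustrative.

```python
import os
import time


def generate_video(prompt, out_path="output.mp4", poll_seconds=10):
    """Text-to-video with the skill's defaults; returns the saved path."""
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")

    # Imported lazily so the key check above runs even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    operation = client.models.generate_videos(
        model="veo-3.1-fast-generate-preview",
        prompt=prompt,
        # Assumed config fields, taken from the defaults named in this skill.
        config=types.GenerateVideosConfig(resolution="720p", duration_seconds=4),
    )

    # Poll the long-running operation until it reports completion.
    while not operation.done:
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)

    videos = getattr(operation.response, "generated_videos", None)
    if not videos:
        raise RuntimeError("No videos returned; check the API key, model, and config.")

    # Download the first result and write it to disk.
    video = videos[0].video
    client.files.download(file=video)
    video.save(out_path)
    return out_path
```

Calling generate_video("a paper boat drifting down a rainy street") would block for the full generation time, which can be minutes.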

Use these references (by task type)

  • Common setup and workflow: references/overview.md
  • Parameters and constraints: references/parameters.md
  • Model versions and limits: references/model-versions-and-limitations.md
  • Prompting guidance: references/prompt-guide.md

Task types

  • Text-to-video: examples/text-to-video.md
  • Image-to-video: examples/image-to-video.md
  • Reference images: examples/reference-images.md
  • First/last frames (interpolation): examples/first-last-frames.md
  • Video extension: examples/video-extension.md

Tuning examples

  • Aspect ratio: examples/aspect-ratio.md
  • Resolution (4k): examples/resolution.md
  • Negative prompt: examples/negative-prompt.md

Defaults and notes

  • Default model: veo-3.1-fast-generate-preview.
  • Default output: 720p, 4 seconds.
  • For image inputs, always provide imageBytes + mimeType via types.Image to prevent INVALID_ARGUMENT errors.
  • 1080p/4k, reference images, interpolation, and video extension require duration_seconds=8.
  • Video extension is limited to 720p inputs and requires a video from a previous Veo generation.
  • Video generation can take minutes; allow longer timeouts when running commands.
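The duration and resolution rules above can be checked before any request is sent. The following is a small sketch of such a check; the function resolve_settings and the task labels are hypothetical names chosen for illustration, and the rules encoded are only the ones listed in these notes.

```python
# Task types and resolutions that the notes above say require duration_seconds=8.
EIGHT_SECOND_TASKS = {"reference-images", "interpolation", "video-extension"}
EIGHT_SECOND_RESOLUTIONS = {"1080p", "4k"}


def resolve_settings(task="text-to-video", resolution="720p", duration_seconds=None):
    """Apply the skill's defaults and reject combinations the notes rule out."""
    needs_eight = task in EIGHT_SECOND_TASKS or resolution in EIGHT_SECOND_RESOLUTIONS
    if duration_seconds is None:
        # Default to 4 seconds unless the task or resolution forces 8.
        duration_seconds = 8 if needs_eight else 4
    if needs_eight and duration_seconds != 8:
        raise ValueError(
            f"{task} at {resolution} requires duration_seconds=8, got {duration_seconds}"
        )
    if task == "video-extension" and resolution != "720p":
        raise ValueError("video extension is limited to 720p inputs")
    return {
        "model": "veo-3.1-fast-generate-preview",
        "resolution": resolution,
        "duration_seconds": duration_seconds,
    }
```

Failing fast here avoids waiting on a request that the API would reject anyway.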

Source

https://github.com/Xiangyu-CAS/Vision-Skills/blob/main/skills/video-generation/SKILL.md

Overview

This skill generates or extends videos with Gemini (Veo 3.1) through the Python SDK. It supports text-to-video, image-to-video, reference images, first/last frame interpolation, and video extension, and lets you tune Veo parameters such as aspect ratio, resolution, duration, negative prompt, personGeneration, and seed.

How This Skill Works

First identify the task type (text-to-video, image-to-video, reference images, interpolation, or extension) and make sure GEMINI_API_KEY is available. Then initialize the Python SDK client and call client.models.generate_videos with the inputs and Veo parameters for that task. For image inputs, pass imageBytes and mimeType via types.Image to avoid input type errors. Poll the operation until it is done, then download and save the video. If no videos are returned, surface a clear error and suggest checking the API key, model, and config.
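The polling step is where long generations tend to stall, so it can help to bound it with a timeout. Below is a minimal sketch of such a loop. The helper wait_until_done and its parameters are hypothetical; the only thing it assumes about the operation object is a done attribute, and the refresh callable is supplied by the caller.

```python
import time


def wait_until_done(operation, refresh, interval=10.0, timeout=600.0, sleep=time.sleep):
    """Poll until operation.done is true, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not operation.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video generation did not finish within {timeout} seconds")
        sleep(interval)
        # `refresh` returns an updated operation object for the same job.
        operation = refresh(operation)
    return operation
```

With the Python SDK, the refresh callable would presumably be client.operations.get, as in operation = wait_until_done(operation, client.operations.get).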

When to Use It

  • Text-to-video from a descriptive prompt
  • Image-to-video using input images or reference frames
  • Reference images to guide scene evolution
  • First/last frame interpolation to create smooth transitions
  • Video extension to lengthen a video from a previous Veo generation

Quick Start

  1. Ensure GEMINI_API_KEY is set in your environment (e.g., export GEMINI_API_KEY=...).
  2. Choose a task type (text-to-video, image-to-video, etc.) and prepare inputs; for images, use types.Image(imageBytes=..., mimeType=...).
  3. Call client.models.generate_videos(...) with inputs and Veo settings, then poll until done and download the video.
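Since the key may come from the environment or from a local .env file, the first step can be sketched with the standard library alone. The helper load_api_key below is a hypothetical name, and its parser handles only simple KEY=value lines (optionally prefixed with export); a dedicated .env library would cover more cases.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", name="GEMINI_API_KEY"):
    """Return the key from the environment, falling back to a local .env file."""
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not assignments.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, raw = line.partition("=")
            key = key.strip()
            if key.startswith("export "):
                key = key[len("export "):].strip()
            if key == name:
                value = raw.strip().strip("'\"")
                if value:
                    return value

    raise RuntimeError(f"{name} not found in the environment or in {env_path}")
```

The environment variable takes precedence, so a key exported in the shell overrides whatever the file contains.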

Best Practices

  • Start with defaults: veo-3.1-fast-generate-preview, 720p, duration 4 seconds
  • For 1080p/4k, reference images, interpolation, or video extension, set duration_seconds to 8
  • When using image inputs, always pass types.Image(imageBytes=..., mimeType=...) to avoid INVALID_ARGUMENT
  • Ensure GEMINI_API_KEY is configured and the target model matches your task
  • Video generation can take minutes; allow longer timeouts and handle long-running tasks
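For the image-input practice above, the two fields can be built from a file path with the standard library. This is a sketch under assumptions: image_fields is a hypothetical helper, and the MIME type is guessed from the file extension rather than from the file's contents.

```python
import mimetypes
from pathlib import Path


def image_fields(path):
    """Return imageBytes/mimeType keyword arguments for one image file."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {str(path)!r}")
    return {"imageBytes": Path(path).read_bytes(), "mimeType": mime_type}
```

The result is meant to be unpacked into the SDK type, for example types.Image(**image_fields("first_frame.png")), and the resulting object passed to client.models.generate_videos in place of a PIL.Image or types.Part.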

Example Use Cases

  • Create a 720p 4s product teaser from a text prompt
  • Convert a batch of product images into a short promotional video
  • Animate a scene using reference images or first/last frame interpolation
  • Lengthen an existing Veo-generated video with video extension
  • Produce a 1080p clip by setting resolution="1080p" and duration_seconds=8
