image-generation
npx machina-cli add skill Xiangyu-CAS/Vision-Skills/image-generation --openclaw
Image Generation with Gemini
Use this skill when the user asks to generate or edit images with Gemini using the Python SDK. Default to gemini-3-pro-image-preview, and mention gemini-2.5-flash-image only as an optional faster/cheaper alternative.
Workflow
- Identify task type (text-to-image, edit, or multi-reference).
- Ensure GEMINI_API_KEY is available (env or stored in .env), then use the Python SDK. This will make network requests to the Gemini API.
- Choose model + output (response_modalities=["IMAGE"] if image-only) and run. Generation can take ~30 seconds; allow 30–60 seconds before retrying.
- Save returned images with part.as_image(); if none, report a clear error.
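The workflow above can be sketched end to end. This is a minimal, hedged sketch assuming the google-genai SDK (`pip install google-genai`), where response parts expose `as_image()`; the output path is a placeholder.

```python
import os

MODEL = "gemini-3-pro-image-preview"  # default; gemini-2.5-flash-image is the faster/cheaper option


def require_api_key() -> str:
    """Fail early with a clear message if GEMINI_API_KEY is missing."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set (export it or put it in .env)")
    return key


def generate(prompt: str, out_path: str = "output.png") -> str:
    """Generate an image from a text prompt and save it; sketch, not a full client."""
    from google import genai  # assumption: google-genai SDK installed

    client = genai.Client(api_key=require_api_key())
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config={"response_modalities": ["IMAGE"]},  # image-only output
    )
    for part in response.parts:
        image = part.as_image()
        if image is not None:
            image.save(out_path)
            return out_path
    raise RuntimeError("No image parts returned; check API key, model name, and response modalities")
```

The SDK call makes a network request and can take ~30 seconds, so run it with a generous timeout.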
Use these references
- references/python.md for Python SDK usage
Response handling (Python SDK)
Use part.as_image() to access image outputs and save them. If no image parts are returned, surface a clear error and suggest checking the API key, model name, and response modalities.
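This handling can be factored into a small helper. It only assumes each part object exposes `as_image()` returning `None` for non-image parts, matching the SDK pattern above; the file-naming scheme is a sketch.

```python
def save_images(parts, prefix: str = "output") -> list[str]:
    """Save every image part as <prefix>-<n>.png; raise a clear error if none exist."""
    saved = []
    for i, part in enumerate(parts):
        image = part.as_image()  # None for text or other non-image parts
        if image is not None:
            path = f"{prefix}-{i}.png"
            image.save(path)
            saved.append(path)
    if not saved:
        raise RuntimeError(
            "No image parts in response; check the API key, model name, "
            "and response_modalities setting."
        )
    return saved
```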
Timing note
Image generation may take around 30 seconds. When running commands via the shell tool, set a longer timeout (e.g., 60–120 seconds) to avoid premature timeouts.
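One way to honor this when invoking a generation script from a shell or tool wrapper is to give the subprocess an explicit timeout instead of relying on the default. The script name here is a placeholder.

```python
import subprocess
import sys


def run_generation(script: str, timeout_s: int = 120) -> int:
    """Run a generation script, allowing up to timeout_s seconds before giving up."""
    result = subprocess.run(
        [sys.executable, script],
        timeout=timeout_s,  # 60-120 s leaves headroom for ~30 s generations
        capture_output=True,
    )
    return result.returncode
```

`subprocess.run` raises `subprocess.TimeoutExpired` if the limit is hit, which is easier to diagnose than a silent tool timeout.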
Source
https://github.com/Xiangyu-CAS/Vision-Skills/blob/main/skills/image-generation/SKILL.md
Overview
This skill enables generating and editing images using Gemini through the Python SDK. It supports text-to-image, image edits, multi-reference composition, and Google Search grounding, with the default model gemini-3-pro-image-preview and an optional faster alternative gemini-2.5-flash-image.
How This Skill Works
Workflow: Identify task type (text-to-image, edit, or multi-reference) and ensure GEMINI_API_KEY is available in the environment or .env, then call the Python SDK. Choose the model and output (response_modalities=['IMAGE'] for image outputs) and run; generation typically takes ~30 seconds. Save results with part.as_image(); if no images are returned, surface a clear error.
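For the edit case, the source image is passed alongside the instruction. This is a hedged sketch assuming the google-genai SDK's `types.Part.from_bytes`; the helper that reads the file is plain stdlib, and all file names are placeholders.

```python
import mimetypes
from pathlib import Path


def load_image_part(path: str):
    """Read an image file into (bytes, mime_type) suitable for Part.from_bytes."""
    mime, _ = mimetypes.guess_type(path)
    return Path(path).read_bytes(), mime or "image/png"


def edit_image(src: str, instruction: str, out: str = "edited.png") -> str:
    """Send a source image plus an edit instruction; sketch, assumes google-genai."""
    from google import genai
    from google.genai import types

    data, mime = load_image_part(src)
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[types.Part.from_bytes(data=data, mime_type=mime), instruction],
        config={"response_modalities": ["IMAGE"]},
    )
    for part in response.parts:
        image = part.as_image()
        if image is not None:
            image.save(out)
            return out
    raise RuntimeError("No image returned; check API key, model, and response modalities")
```

Multi-reference composition follows the same shape: several image parts in `contents`, followed by the text instruction.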
When to Use It
- Generate a new image from a detailed text prompt using the default gemini-3-pro-image-preview model.
- Edit an existing image (adjust color, lighting, style, or composition) via the Python SDK.
- Create a multi-reference composition by blending elements from several reference images.
- Ground generated visuals with Google Search references to ensure relevance and accuracy.
- Integrate Gemini image generation into a Python workflow and manage timing and errors (including using the faster/cheaper alternative when appropriate).
Quick Start
- Step 1: Set GEMINI_API_KEY in your environment or a .env file and decide on the model (default gemini-3-pro-image-preview; gemini-2.5-flash-image as needed).
- Step 2: Install and import the Gemini Python SDK, then configure the client to point to the Gemini API.
- Step 3: Call the image generation/edit API with response_modalities=['IMAGE'] and save the results using part.as_image().
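For Step 1, a minimal .env fallback looks like the following. It assumes simple KEY=VALUE lines; a real project would more likely use python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Populate os.environ from KEY=VALUE lines, without overriding existing values."""
    env = Path(path)
    if not env.exists():
        return
    for line in env.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Calling `load_env_file()` before creating the client means `genai.Client()` can pick up GEMINI_API_KEY from either the environment or the .env file.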
Best Practices
- Ensure GEMINI_API_KEY is available in your environment or a .env file before calling the API.
- When you need image outputs, set response_modalities to ['IMAGE'] and verify the model matches your use case.
- Default to gemini-3-pro-image-preview; use gemini-2.5-flash-image only when you need faster results or lower cost.
- Allow ~30–60 seconds for generation and consider increasing shell/tool timeouts to 60–120 seconds.
- After generation, fetch images with part.as_image() and handle cases where no image parts are returned by prompting for key/model/response modality checks.
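The retry guidance above can be sketched as a small wrapper: wait between attempts instead of retrying immediately. The `generate` callable is an assumption (anything that returns a saved path or raises on failure).

```python
import time


def generate_with_retry(generate, attempts: int = 2, wait_s: float = 30.0):
    """Call generate(), waiting wait_s seconds between failed attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return generate()
        except Exception as e:
            last_error = e
            if attempt + 1 < attempts:
                time.sleep(wait_s)  # give the API 30-60 s before retrying
    raise last_error
```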
Example Use Cases
- Generate a sci-fi cityscape from a detailed prompt and export as PNG.
- Edit a product photo to adjust background and lighting while preserving subject realism.
- Create a concept art montage by blending styles from multiple reference images.
- Ground a fantasy scene with Google Search-derived references to verify plausibility.
- Automate image generation in a Python script and save outputs programmatically with part.as_image().