image-generation
npx machina-cli add skill Xiangyu-CAS/Vision-Skills/image-generation --openclaw
Image Generation with Gemini
Use this skill when the user asks to generate or edit images with Gemini using the Python SDK. Default to gemini-3-pro-image-preview, and mention gemini-2.5-flash-image only as an optional faster/cheaper alternative.
Workflow
- Identify task type (text-to-image, edit, or multi-reference).
- Ensure GEMINI_API_KEY is available (env or stored in .env), then use the Python SDK. This will make network requests to the Gemini API.
- Choose model + output (response_modalities=["IMAGE"] if image-only) and run. Generation can take ~30 seconds; allow 30–60 seconds before retrying.
- Save returned images with part.as_image(); if none, report a clear error.
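The workflow above can be sketched end to end. This is a minimal, hedged sketch assuming the google-genai SDK (`pip install google-genai`), where response parts expose `as_image()`; the output path is a placeholder.

```python
import os

MODEL = "gemini-3-pro-image-preview"  # default; gemini-2.5-flash-image is the faster/cheaper option


def require_api_key() -> str:
    """Fail early with a clear message if GEMINI_API_KEY is missing."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set (export it or put it in .env)")
    return key


def generate(prompt: str, out_path: str = "output.png") -> str:
    """Generate an image from a text prompt and save it; sketch, not a full client."""
    from google import genai  # assumption: google-genai SDK installed

    client = genai.Client(api_key=require_api_key())
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config={"response_modalities": ["IMAGE"]},  # image-only output
    )
    for part in response.parts:
        image = part.as_image()
        if image is not None:
            image.save(out_path)
            return out_path
    raise RuntimeError("No image parts returned; check API key, model name, and response modalities")
```

The SDK call makes a network request and can take ~30 seconds, so run it with a generous timeout.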
Use these references
- references/python.md for Python SDK usage
Response handling (Python SDK)
Use part.as_image() to access image outputs and save them. If no image parts are returned, surface a clear error and suggest checking the API key, model name, and response modalities.
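This handling can be factored into a small helper. It only assumes each part object exposes `as_image()` returning `None` for non-image parts, matching the SDK pattern above; the file-naming scheme is a sketch.

```python
def save_images(parts, prefix: str = "output") -> list[str]:
    """Save every image part as <prefix>-<n>.png; raise a clear error if none exist."""
    saved = []
    for i, part in enumerate(parts):
        image = part.as_image()  # None for text or other non-image parts
        if image is not None:
            path = f"{prefix}-{i}.png"
            image.save(path)
            saved.append(path)
    if not saved:
        raise RuntimeError(
            "No image parts in response; check the API key, model name, "
            "and response_modalities setting."
        )
    return saved
```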
Timing note
Image generation may take around 30 seconds. When running commands via the shell tool, set a longer timeout (e.g., 60–120 seconds) to avoid premature timeouts.
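One way to honor this when invoking a generation script from a shell or tool wrapper is to give the subprocess an explicit timeout instead of relying on the default. The script name here is a placeholder.

```python
import subprocess
import sys


def run_generation(script: str, timeout_s: int = 120) -> int:
    """Run a generation script, allowing up to timeout_s seconds before giving up."""
    result = subprocess.run(
        [sys.executable, script],
        timeout=timeout_s,  # 60-120 s leaves headroom for ~30 s generations
        capture_output=True,
    )
    return result.returncode
```

`subprocess.run` raises `subprocess.TimeoutExpired` if the limit is hit, which is easier to diagnose than a silent tool timeout.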
Source
https://github.com/Xiangyu-CAS/Vision-Skills/blob/main/skills/image-generation/SKILL.md
Overview
This skill enables generating and editing images using Gemini through the Python SDK. It supports text-to-image, image edits, multi-reference composition, and Google Search grounding, with the default model gemini-3-pro-image-preview and an optional faster alternative gemini-2.5-flash-image.
How This Skill Works
Workflow: Identify task type (text-to-image, edit, or multi-reference) and ensure GEMINI_API_KEY is available in the environment or .env, then call the Python SDK. Choose the model and output (response_modalities=['IMAGE'] for image outputs) and run; generation typically takes ~30 seconds. Save results with part.as_image(); if no images are returned, surface a clear error.
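For the edit case, the source image is passed alongside the instruction. This is a hedged sketch assuming the google-genai SDK's `types.Part.from_bytes`; the helper that reads the file is plain stdlib, and all file names are placeholders.

```python
import mimetypes
from pathlib import Path


def load_image_part(path: str):
    """Read an image file into (bytes, mime_type) suitable for Part.from_bytes."""
    mime, _ = mimetypes.guess_type(path)
    return Path(path).read_bytes(), mime or "image/png"


def edit_image(src: str, instruction: str, out: str = "edited.png") -> str:
    """Send a source image plus an edit instruction; sketch, assumes google-genai."""
    from google import genai
    from google.genai import types

    data, mime = load_image_part(src)
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[types.Part.from_bytes(data=data, mime_type=mime), instruction],
        config={"response_modalities": ["IMAGE"]},
    )
    for part in response.parts:
        image = part.as_image()
        if image is not None:
            image.save(out)
            return out
    raise RuntimeError("No image returned; check API key, model, and response modalities")
```

Multi-reference composition follows the same shape: several image parts in `contents`, followed by the text instruction.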
When to Use It
- Generate a new image from a detailed text prompt using the default gemini-3-pro-image-preview model.
- Edit an existing image (adjust color, lighting, style, or composition) via the Python SDK.
- Create a multi-reference composition by blending elements from several reference images.
- Ground generated visuals with Google Search references to ensure relevance and accuracy.
- Integrate Gemini image generation into a Python workflow and manage timing and errors (including using the faster/cheaper alternative when appropriate).
Quick Start
- Step 1: Set GEMINI_API_KEY in your environment or a .env file and decide on the model (default gemini-3-pro-image-preview; gemini-2.5-flash-image as needed).
- Step 2: Install and import the Gemini Python SDK, then configure the client to point to the Gemini API.
- Step 3: Call the image generation/edit API with response_modalities=['IMAGE'] and save the results using part.as_image().
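For Step 1, a minimal .env fallback looks like the following. It assumes simple KEY=VALUE lines; a real project would more likely use python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Populate os.environ from KEY=VALUE lines, without overriding existing values."""
    env = Path(path)
    if not env.exists():
        return
    for line in env.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Calling `load_env_file()` before creating the client means `genai.Client()` can pick up GEMINI_API_KEY from either the environment or the .env file.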
Best Practices
- Ensure GEMINI_API_KEY is available in your environment or a .env file before calling the API.
- When you need image outputs, set response_modalities to ['IMAGE'] and verify the model matches your use case.
- Default to gemini-3-pro-image-preview; use gemini-2.5-flash-image only when you need faster results or lower cost.
- Allow ~30–60 seconds for generation and consider increasing shell/tool timeouts to 60–120 seconds.
- After generation, fetch images with part.as_image() and handle cases where no image parts are returned by prompting for key/model/response modality checks.
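The retry guidance above can be sketched as a small wrapper: wait between attempts instead of retrying immediately. The `generate` callable is an assumption (anything that returns a saved path or raises on failure).

```python
import time


def generate_with_retry(generate, attempts: int = 2, wait_s: float = 30.0):
    """Call generate(), waiting wait_s seconds between failed attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return generate()
        except Exception as e:
            last_error = e
            if attempt + 1 < attempts:
                time.sleep(wait_s)  # give the API 30-60 s before retrying
    raise last_error
```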
Example Use Cases
- Generate a sci-fi cityscape from a detailed prompt and export as PNG.
- Edit a product photo to adjust background and lighting while preserving subject realism.
- Create a concept art montage by blending styles from multiple reference images.
- Ground a fantasy scene with Google Search-derived references to verify plausibility.
- Automate image generation in a Python script and save outputs programmatically with part.as_image().