What is Gemini Image Generation?

A suite of eight specialized Python scripts that call the Gemini API to generate and edit images, covering text-to-image generation, editing, style transfer, and multi-image tasks.

What modes are available?

Text-to-Image (Standard, High-res, Search-grounded), Image Editing (General edit, Style transfer), Multi-Image (Compose, Multi-reference), and Interactive Multi-turn editing.

How do I run the scripts?

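Run python "$CLAUDE_PLUGIN_ROOT/scripts/<script>.py" --prompt "..." [options]. Use a supported resolution (1K, 2K, or 4K, with an uppercase K; 2K and 4K require gemini-3-pro-image-preview) and one of the supported aspect ratios, and provide input images for the editing and multi-image modes.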

Gemini Image Generation

npx machina-cli add skill Ibrahim-3d/nano-banana-claude-plugin/genimage --openclaw

Gemini Image Generation Skill

Provides comprehensive knowledge for generating and editing images using the Gemini API through 8 specialized Python scripts.

How It Works

All generation and editing is text-guided through the Gemini API. There is no visual UI, mask painter, or interactive editor. You describe what you want in natural language, and Gemini handles the rest.

For image editing, Gemini semantically understands your text instructions and automatically identifies which regions of the image to modify. For example, given "replace the sky with a sunset", Gemini knows what "the sky" refers to and replaces only that region.

Available Modes

Text-to-Image (no input image)

Standard (texttoimage.py): Fast generation via gemini-2.5-flash-image at 1K
High-res (hires.py): 2K or 4K via gemini-3-pro-image-preview
Search-grounded (searchground.py): Uses real-time Google Search data

Image Editing (input image + text)

General edit (imageedit.py): All text-guided editing, including inpainting, adding or removing objects, background replacement, detail-preserving edits, bringing sketches to life, changing angles, and any other modification
Style transfer (styletransfer.py): Apply the artistic style of one image onto another (requires 2 images)

Multi-Image

Compose (compose.py): Combine elements from multiple images
Multi-reference (multiref.py): Up to 14 reference images (gemini-3-pro-image-preview)

Interactive

Multi-turn (multiturn.py): Chat-based iterative editing with memory
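Because each mode expects a different number of input images, a caller can check the inputs before invoking a script. This is a sketch under assumptions: the script names come from the list above, but the image counts are inferred from the descriptions (none for text-to-image, one for General edit, two for Style transfer, up to 14 for Multi-reference). Compose is assumed to need at least two images with no stated upper limit, and Multi-turn is left out because its inputs are not described. The names MODES and check_inputs are hypothetical.

```python
from typing import Optional

# Hypothetical registry: script -> (min input images, max input images or None if unbounded).
# Counts are inferred from the mode descriptions, not read from the scripts themselves.
MODES: dict[str, tuple[int, Optional[int]]] = {
    "texttoimage.py": (0, 0),
    "hires.py": (0, 0),
    "searchground.py": (0, 0),
    "imageedit.py": (1, 1),
    "styletransfer.py": (2, 2),   # content image + style image
    "compose.py": (2, None),      # assumed: "multiple images", upper bound unknown
    "multiref.py": (1, 14),       # "up to 14 reference images"
}


def check_inputs(script: str, images: list[str]) -> None:
    """Raise ValueError if the number of input images does not fit the script's mode."""
    if script not in MODES:
        raise ValueError(f"unknown script: {script}")
    low, high = MODES[script]
    n = len(images)
    if n < low or (high is not None and n > high):
        if high is None:
            limit = f"{low}+"
        elif low == high:
            limit = f"{low}"
        else:
            limit = f"{low}-{high}"
        raise ValueError(f"{script} expects {limit} input image(s), got {n}")
```

For example, check_inputs("styletransfer.py", ["content.png", "style.png"]) passes silently, while check_inputs("imageedit.py", []) raises before any API call is made.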

Running Scripts

python "$CLAUDE_PLUGIN_ROOT/scripts/<script>.py" --prompt "..." [options]
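When calling the scripts from another Python program, passing an argument list instead of a shell string avoids quoting problems with prompts that contain quotes or apostrophes. The helper below is a hypothetical sketch: only --prompt and --image appear in the documented usage, so any other options are passed through unchecked in extra.

```python
import os
import shlex
from typing import Optional, Sequence


def build_command(script: str, prompt: str, image: Optional[str] = None,
                  extra: Sequence[str] = (), plugin_root: Optional[str] = None) -> list[str]:
    """Build an argv list for one of the plugin scripts (hypothetical helper).

    --prompt and --image follow the documented usage; anything in `extra`
    is appended verbatim because other option names are not verified here.
    """
    # Fall back to the environment variable used in the documented command.
    root = plugin_root if plugin_root is not None else os.environ["CLAUDE_PLUGIN_ROOT"]
    argv = ["python", os.path.join(root, "scripts", script), "--prompt", prompt]
    if image is not None:
        argv += ["--image", image]
    argv += list(extra)
    return argv


if __name__ == "__main__":
    cmd = build_command("imageedit.py", "Remove the person on the left and fill naturally",
                        image="photo.png", plugin_root="/opt/plugin")
    # Shell-quoted form, useful for logging; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```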

Editing Examples

All of these use imageedit.py --image photo.png --prompt "...":

Inpainting: "Replace the sky with dramatic storm clouds"
Remove object: "Remove the person on the left and fill naturally"
Add object: "Add a golden retriever sitting on the couch"
Background swap: "Replace the background with a tropical beach"
Bring to life: "Transform this pencil sketch into a photorealistic image"
Detail preserve: "Place this logo on a billboard in Times Square, keep the logo sharp"
Style change: "Make this photo look like an oil painting"

Prompting Guide

Photorealistic Scenes

Use photography terms: camera angles, lens types, lighting, fine details.

Template: "A photorealistic [shot type] of [subject], [action], set in [environment]. Illuminated by [lighting], creating a [mood] atmosphere. Captured with [camera/lens]. [aspect ratio] format."

Stylized Illustrations

Specify art style, color palette, medium, background.

Template: "A [style] illustration of [subject] in the style of [medium]. Color palette: [colors]. Background: [description]. No text."

Text-Heavy Images

Put desired text in quotes. Specify font style and placement.

Template: "An image containing the text "[text]" in [font]. [Describe scene around it]."

Product Photography

Describe materials, lighting setup, environment, camera angle.

Template: "Professional product photography of [product] on [surface]. [Lighting] lighting, [background]. Shot from [angle] with [lens]."
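To reuse the templates programmatically, the bracketed slots can be turned into named fields. The sketch below is illustrative only: the field names are hypothetical renamings of the bracketed placeholders, and fill simply refuses to produce a prompt with an unfilled slot.

```python
import string

# The four templates from this guide, with [slots] rewritten as named {fields}.
TEMPLATES = {
    "photorealistic": (
        "A photorealistic {shot_type} of {subject}, {action}, set in {environment}. "
        "Illuminated by {lighting}, creating a {mood} atmosphere. "
        "Captured with {camera}. {aspect_ratio} format."
    ),
    "illustration": (
        "A {style} illustration of {subject} in the style of {medium}. "
        "Color palette: {colors}. Background: {background}. No text."
    ),
    # The desired text stays in quotes, as the guide recommends.
    "text": 'An image containing the text "{text}" in {font}. {scene}.',
    "product": (
        "Professional product photography of {product} on {surface}. "
        "{lighting} lighting, {background}. Shot from {angle} with {lens}."
    ),
}


def fill(kind: str, **fields: str) -> str:
    """Fill a template, raising ValueError if any placeholder is missing."""
    template = TEMPLATES[kind]
    # Collect the field names the template expects.
    needed = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = needed - set(fields)
    if missing:
        raise ValueError(f"missing fields for {kind!r}: {sorted(missing)}")
    return template.format(**fields)
```

Failing on a missing field keeps half-filled prompts (with literal placeholders) from ever reaching the API.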

Aspect Ratios

Supported: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

Resolution (gemini-3-pro-image-preview only)

1K - Default (1024px)
2K - High resolution (2048px)
4K - Ultra resolution (4096px)

Must use uppercase K.
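The two lists above translate directly into a pre-flight check. A minimal sketch, assuming the nominal pixel sizes in the table and treating 2K and 4K as unavailable on any model other than gemini-3-pro-image-preview; the function name and error messages are hypothetical.

```python
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
RESOLUTIONS = {"1K": 1024, "2K": 2048, "4K": 4096}  # nominal sizes from the table
PRO_MODEL = "gemini-3-pro-image-preview"


def validate(aspect_ratio: str, resolution: str = "1K", model: str = PRO_MODEL) -> int:
    """Check options against the documented lists; return the nominal pixel size."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}")
    # Catch the common lowercase mistake with a specific message.
    if resolution not in RESOLUTIONS and resolution.upper() in RESOLUTIONS:
        raise ValueError(f"use an uppercase K: {resolution.upper()!r}, not {resolution!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution {resolution!r}")
    # The resolution setting only applies to the Pro model; other models stay at 1K.
    if resolution != "1K" and model != PRO_MODEL:
        raise ValueError(f"{resolution} requires {PRO_MODEL}")
    return RESOLUTIONS[resolution]
```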

Models

gemini-2.5-flash-image (Nano Banana): Fast, 1K, high-volume
gemini-3-pro-image-preview (Nano Banana Pro): Pro quality, up to 4K, thinking mode, search grounding, 14 reference images
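Choosing between the two models follows from the comparison above. This hypothetical helper picks gemini-2.5-flash-image unless the request needs a Pro-only feature; it assumes that any use of reference images means the Multi-reference mode (listed under gemini-3-pro) and caps references at 14.

```python
FLASH = "gemini-2.5-flash-image"    # Nano Banana: fast, 1K, high-volume
PRO = "gemini-3-pro-image-preview"  # Nano Banana Pro: up to 4K, thinking, search, 14 refs
MAX_REFERENCES = 14


def pick_model(resolution: str = "1K", search: bool = False,
               thinking: bool = False, references: int = 0) -> str:
    """Return FLASH unless a Pro-only feature is requested (hypothetical routing rule)."""
    if resolution not in ("1K", "2K", "4K"):
        raise ValueError(f"unsupported resolution {resolution!r}")
    if not 0 <= references <= MAX_REFERENCES:
        raise ValueError(f"references must be between 0 and {MAX_REFERENCES}")
    # 2K/4K, search grounding, thinking mode and multi-reference input are Pro features.
    if resolution != "1K" or search or thinking or references > 0:
        return PRO
    return FLASH
```

Defaulting to Flash matches its "fast, high-volume" role; Pro is only selected when a request actually needs one of its extra capabilities.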

Source

https://github.com/Ibrahim-3d/nano-banana-claude-plugin/blob/main/skills/genimage/SKILL.md

Overview

Gemini Image Generation provides text-guided image creation and editing via the Gemini API using eight specialized Python scripts. It supports fast text-to-image generation, high-resolution output up to 4K, style transfer, background edits, and multi-image composition, all without a visual UI. Because every task is a scripted command, these workflows are repeatable and easy to automate.

How This Skill Works

Each task is a natural-language prompt passed to Gemini through one of the scripts: texttoimage.py, hires.py, searchground.py, imageedit.py, styletransfer.py, compose.py, multiref.py, or multiturn.py. Gemini interprets the prompt semantically to decide which regions to edit, which is what makes instructions like "replace the sky" or a background swap work. Run a script with python "$CLAUDE_PLUGIN_ROOT/scripts/<script>.py" --prompt "..." [options], choosing a supported aspect ratio and, with gemini-3-pro-image-preview, a resolution of 1K, 2K, or 4K.

When to Use It

Generate a brand-new image from a descriptive prompt (Text-to-Image).
Edit an existing image via inpainting, object removal, background changes, or detail edits (Image Editing).
Apply a style transfer from one image to another.
Assemble elements from multiple images (Multi-Image: Compose or Multi-reference).
Engage in interactive, memory-based editing across turns (Interactive: Multi-turn).

Quick Start

Step 1: Pick a script and write a prompt. For 4K output use hires.py, since texttoimage.py is limited to 1K, e.g., python $CLAUDE_PLUGIN_ROOT/scripts/hires.py --prompt 'A photorealistic sunset over mountains.' --resolution 4K --aspect 16:9
Step 2: If editing, run imageedit.py with an input image and a prompt, e.g., python $CLAUDE_PLUGIN_ROOT/scripts/imageedit.py --image scene.jpg --prompt 'Replace the sky with a sunset' --resolution 2K
Step 3: Review the result and adjust the prompt or options. For 2K or 4K output, regenerate with hires.py, which uses gemini-3-pro-image-preview.

Best Practices

Start with a concise prompt and clearly specify the target aspect ratio and resolution (e.g., 4K in a 16:9 frame).
Use the provided templates to guide prompts for photorealistic, stylized illustrations, or product photography.
In editing, reference explicit regions (e.g., 'replace the sky' or 'remove the person on the left') to leverage semantic understanding.
For multi-image tasks, supply well-labeled reference images and choose the appropriate mode (Compose or Multi-reference).
Iterate with more detailed prompts, and switch to hires.py (gemini-3-pro-image-preview) at 4K when ultra-high resolution is required.

Example Use Cases

Replace the sky with dramatic storm clouds (Inpainting).
Remove the person on the left and fill naturally (Object removal).
Add a golden retriever sitting on the couch (Add object).
Replace the background with a tropical beach (Background swap).
Make this photo look like an oil painting (Style change).
