What can I generate with this skill?

Images generated from text prompts, and edits to existing images, using Gemini, GPT Image 1.5, or xAI Grok Image. Each provider supports several aspect ratios or output sizes.

How do I edit an image?

Run the provider's script with --mode edit and pass a --prompt describing the change, the --input-image path, and an --output path. For xAI, the same API endpoint handles both generation and editing.

Can I run multiple providers at once?

Yes. You can launch multiple provider scripts in parallel to generate outputs simultaneously and compare results.

image-generation

npx machina-cli add skill hex/claude-image-generation/image-generation --openclaw

Image Generation with Gemini, OpenAI, and xAI

Generate and edit images using Google Gemini, OpenAI GPT Image 1.5, and xAI Grok Image APIs via shell scripts.

Available Providers

Google Gemini

Model: gemini-2.5-flash-image (default)
Strengths: Fast generation, multi-turn editing, aspect ratio control
Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 4:5, 5:4, 21:9
Env var: GEMINI_API_KEY

OpenAI GPT Image 1.5

Model: gpt-image-1.5
Strengths: Superior text rendering, transparent backgrounds, up to 16 input images for editing, quality tiers
Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait)
Quality: low (fast/cheap), medium, high (best fidelity)
Env var: OPENAI_API_KEY

xAI Grok Image

Model: grok-imagine-image (default), grok-2-image (basic generation only)
Strengths: Prompt revision by chat model, flat per-image pricing, diverse style range, many aspect ratios
Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 2:1, 1:2, 19.5:9, 9:19.5, 20:9, 9:20, auto
Editing: Same endpoint as generation; source image passed as data URI
Env var: XAI_API_KEY or GROK_API_KEY
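
Before choosing a provider, it can help to see which API keys are actually set. The following is a minimal sketch, not part of the skill itself: the function name `available_providers` is hypothetical, and only the environment variable names come from the list above.

```shell
# Print one provider name per line for each API key found in the environment.
# Variable names are the ones listed above; xAI accepts either of its two keys.
available_providers() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then echo "gemini"; fi
  if [ -n "${OPENAI_API_KEY:-}" ]; then echo "openai"; fi
  if [ -n "${XAI_API_KEY:-}" ] || [ -n "${GROK_API_KEY:-}" ]; then echo "xai"; fi
}
```

Calling `available_providers` with only GEMINI_API_KEY and GROK_API_KEY set would print `gemini` and `xai`, one per line.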

Usage

Text-to-Image Generation

Use the scripts at ${CLAUDE_PLUGIN_ROOT}/scripts/:

# Gemini
bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

# OpenAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

# xAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/xai.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

Image Editing

# Gemini
bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png

# OpenAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png

# xAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/xai.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png
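
Since the three scripts share the same core flags, the generate and edit calls above can be wrapped in one helper that checks the input image before spending an API call. This is a sketch under assumptions: `run_provider` is a hypothetical name, and it relies only on the --mode, --prompt, --input-image, and --output flags shown above.

```shell
# Hypothetical wrapper: run_provider <provider> <mode> <prompt> <output> [input-image]
# Validates arguments locally, then calls the matching script with the documented flags.
run_provider() {
  rp_script="${CLAUDE_PLUGIN_ROOT}/scripts/$1.sh"
  case "$2" in
    generate)
      bash "$rp_script" --mode generate --prompt "$3" --output "$4"
      ;;
    edit)
      # Edit mode needs an existing source image.
      if [ ! -f "${5:-}" ]; then
        echo "edit mode requires an existing input image, got: '${5:-}'" >&2
        return 1
      fi
      bash "$rp_script" --mode edit --prompt "$3" --input-image "$5" --output "$4"
      ;;
    *)
      echo "unknown mode: $2 (expected generate or edit)" >&2
      return 1
      ;;
  esac
}
```

For example, `run_provider gemini edit "change the sky to a starry night" ./edited.png ./original.png` would fail fast if ./original.png does not exist.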

Parallel Generation

To generate with multiple providers simultaneously using the streaming display pane:

Create a task per provider with TaskCreate, using activeForm for spinner text:
- "Generate image with Gemini" (activeForm: "Generating image with Gemini...")
- "Generate image with OpenAI" (activeForm: "Generating image with OpenAI...")
- "Generate image with xAI" (activeForm: "Generating image with xAI...")
Mark all tasks in_progress with TaskUpdate

Open a streaming display pane (single Bash call, capture the output path):

source "${CLAUDE_PLUGIN_ROOT}/scripts/display.sh" && display_pane_open
# outputs: /tmp/display_pane.XXXXXX

Launch Task subagents (subagent_type: Bash) in the same message so they run concurrently. Pass DISPLAY_PANE_DIR so images appear in the shared pane as each provider finishes:
```
DISPLAY_PANE_DIR=/tmp/display_pane.XXXXXX bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode generate --prompt "<prompt>" --output hero-gemini.png
```
As each subagent returns, mark its task completed via TaskUpdate

After all providers complete, close the streaming pane to show controls:

DISPLAY_PANE_DIR=/tmp/display_pane.XXXXXX bash -c \
  'source "${CLAUDE_PLUGIN_ROOT}/scripts/display.sh" && display_pane_close'

Present all output file paths to the user
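
Outside of the Task tools and display pane, the same idea can be approximated in plain shell with background jobs. The sketch below is an assumption-laden alternative, not the workflow described above: `generate_all` is a hypothetical helper, it skips the display pane entirely, and it names outputs `<prefix>-<provider>.png` after the hero-gemini.png example.

```shell
# generate_all <prompt> <output-prefix>   (hypothetical helper name)
# Starts all three provider scripts as background jobs, then waits for each
# one and reports which outputs succeeded and which providers failed.
generate_all() {
  ga_jobs=""
  for ga_p in gemini openai xai; do
    bash "${CLAUDE_PLUGIN_ROOT}/scripts/${ga_p}.sh" \
      --mode generate --prompt "$1" --output "$2-${ga_p}.png" &
    ga_jobs="$ga_jobs ${ga_p}:$!"   # remember provider name and job PID
  done
  ga_failed=0
  for ga_job in $ga_jobs; do
    ga_p="${ga_job%%:*}"
    if wait "${ga_job##*:}"; then
      echo "ok: $2-${ga_p}.png"
    else
      echo "failed: ${ga_p}" >&2
      ga_failed=1
    fi
  done
  return "$ga_failed"
}
```

The function returns non-zero if any provider failed, while still listing every output that was produced.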

Prompting Tips

General

Be specific and descriptive: "a golden retriever puppy playing in autumn leaves, soft afternoon light" beats "dog in park"
Specify style explicitly: "watercolor painting", "photorealistic", "flat vector illustration"
Include composition details: "close-up", "aerial view", "centered", "rule of thirds"

Text in Images

OpenAI GPT Image 1.5 renders text significantly better than the other providers
Put text in quotes or ALL CAPS in the prompt: a sign that reads "OPEN 24 HOURS"
Specify typography details: font style, size, color, placement

Editing

Describe what to change, not the whole image
Be specific about which elements to preserve vs modify
Gemini supports iterative multi-turn refinement
OpenAI accepts up to 16 reference images
xAI revises prompts with a chat model before generation

Error Handling

Scripts exit with code 1 on failure and print error details to stderr
If an API key is missing, the script exits immediately with a clear message
HTTP errors include the status code and API error message
If multiple providers are used in parallel and one fails, report the error and present the successful results
Rate limit errors (HTTP 429) mean the provider's rate limit or quota has been reached; try again later or switch to another provider
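
Because every script signals failure through a non-zero exit code, switching to another provider can be automated. The following is a rough sketch only: `generate_with_fallback` is a hypothetical helper, and it treats any non-zero exit the same way instead of inspecting the HTTP status.

```shell
# generate_with_fallback <prompt> <output> <provider>...   (hypothetical helper)
# Tries each provider in the given order and stops at the first script that
# exits 0, printing the name of the provider that produced the image.
generate_with_fallback() {
  gf_prompt="$1"
  gf_output="$2"
  shift 2
  for gf_p in "$@"; do
    if bash "${CLAUDE_PLUGIN_ROOT}/scripts/${gf_p}.sh" \
         --mode generate --prompt "$gf_prompt" --output "$gf_output"; then
      echo "$gf_p"
      return 0
    else
      echo "${gf_p} failed, trying next provider" >&2
    fi
  done
  echo "all providers failed" >&2
  return 1
}
```

A call such as `generate_with_fallback "a red bicycle" ./bike.png gemini openai xai` would leave each script's own error details on stderr while moving on to the next provider.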

Script Options Reference

gemini.sh

| Flag | Values | Default |
| --- | --- | --- |
| `--mode` | generate, edit | (required) |
| `--prompt` | text | (required) |
| `--output` | file path | (required) |
| `--input-image` | file path | (edit only) |
| `--aspect-ratio` | 1:1, 16:9, etc. | 1:1 |
| `--model` | gemini model name | gemini-2.5-flash-image |
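
The `--aspect-ratio` values differ between providers, so a ratio can be checked locally before calling gemini.sh. This is a small sketch, assuming the Gemini ratio list given under Available Providers is complete; `is_gemini_ratio` is a hypothetical helper name.

```shell
# is_gemini_ratio <ratio>: succeed only for ratios listed for Gemini above.
is_gemini_ratio() {
  case "$1" in
    1:1|16:9|9:16|4:3|3:4|3:2|2:3|4:5|5:4|21:9) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `is_gemini_ratio 2:1` fails, because 2:1 appears only in the xAI list.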

openai.sh

| Flag | Values | Default |
| --- | --- | --- |
| `--mode` | generate, edit | (required) |
| `--prompt` | text | (required) |
| `--output` | file path | (required) |
| `--input-image` | file path | (edit only) |
| `--size` | 1024x1024, 1536x1024, 1024x1536 | 1024x1024 |
| `--quality` | low, medium, high | high |
| `--background` | transparent, opaque, auto | auto |
| `--model` | OpenAI model name | gpt-image-1.5 |
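
As an illustration of combining the optional flags in this table, the sketch below requests a square image with a transparent background at medium quality. The wrapper name `transparent_asset` is hypothetical; the flags and values are taken from the table.

```shell
# transparent_asset <prompt> <output>   (hypothetical helper name)
# Calls openai.sh with explicit size, quality, and background options
# instead of relying on the defaults (1024x1024, high, auto).
transparent_asset() {
  bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
    --mode generate \
    --prompt "$1" \
    --output "$2" \
    --size 1024x1024 \
    --quality medium \
    --background transparent
}
```

Usage might look like `transparent_asset "flat vector fox logo" ./logo.png`.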

xai.sh

| Flag | Values | Default |
| --- | --- | --- |
| `--mode` | generate, edit | (required) |
| `--prompt` | text | (required) |
| `--output` | file path | (required) |
| `--input-image` | file path | (edit only) |
| `--aspect-ratio` | 1:1, 16:9, 9:16, 4:3, 3:4, etc. | (none) |
| `--model` | xAI model name | grok-imagine-image |

Source

https://github.com/hex/claude-image-generation/blob/main/skills/image-generation/SKILL.md

Overview

This skill creates and edits images using Google Gemini, OpenAI GPT Image 1.5, and xAI Grok Image APIs via shell scripts. It supports text-to-image generation, image edits, and parallel multi-provider workflows with explicit prompts and aspect ratios for precise results.

How This Skill Works

Choose a provider and run its shell script with --mode set to generate or edit. Pass a prompt and an output path, and for edits include an input image; the script calls the provider API and writes the resulting image to the output path. Each provider has its own model, size or aspect ratio options, and API key environment variable (GEMINI_API_KEY, OPENAI_API_KEY, and XAI_API_KEY or GROK_API_KEY); quality tiers are specific to OpenAI.

When to Use It

You want to generate a new image from a text prompt using Gemini, GPT Image, or Grok.
You need to edit an existing image, such as changing the sky or background, using a prompt.
You want to compare outputs from multiple providers in parallel for faster decision making.
You require specific aspect ratios (e.g., 1:1, 16:9, 4:3) or resolutions for a project asset.
You need a provider-specific capability, such as transparent backgrounds or high-fidelity output from GPT Image 1.5, or Grok’s diverse style range.

Quick Start

Step 1: Choose a provider and run its script, e.g., bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" --mode generate --prompt "<your prompt>" --output ./generated.png
Step 2: If editing, switch to --mode edit, add --input-image ./source.png, and use --prompt to describe the changes
Step 3: Open the image written to the --output path, e.g., ./generated.png, then review it and iterate on the prompt if needed
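
A quick sanity check after Step 3 is to confirm that the output file exists and is not empty. This is a minimal sketch with a hypothetical helper name, `check_output`; it does not inspect the image contents or format.

```shell
# check_output <path>: fail if the file is missing or empty,
# otherwise print its path and size in bytes.
check_output() {
  if [ ! -s "$1" ]; then
    echo "missing or empty output: $1" >&2
    return 1
  fi
  # The arithmetic expansion strips any padding that wc adds on some systems.
  echo "$1: $(( $(wc -c < "$1") )) bytes"
}
```

Running `check_output ./generated.png` right after a script call gives a simple pass or fail before the image is opened for review.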

Best Practices

Write clear, descriptive prompts that specify style, lighting, and composition.
Select an aspect ratio from the provider’s supported options to match your asset.
For edits, provide a high-quality source image and a precise edit prompt.
If speed matters, start with a lower quality tier (OpenAI's low is the fastest and cheapest) or run providers in parallel and compare outputs.
Organize outputs with consistent naming and verify results before final use.

Example Use Cases

Generate a serene mountain landscape at sunset using Gemini, then compare with Grok outputs.
Edit a product photo to replace the background with white and enhance contrast.
Create multiple hero banner variations for a landing page across Gemini, OpenAI, and xAI.
Render an image with a transparent background using GPT Image 1.5 for logo usage.
Produce a 16:9 landscape image suitable for video thumbnails and social sharing.
