image-generation
```bash
npx machina-cli add skill hex/claude-image-generation/image-generation --openclaw
```
Image Generation with Gemini, OpenAI, and xAI
Generate and edit images using Google Gemini, OpenAI GPT Image 1.5, and xAI Grok Image APIs via shell scripts.
Available Providers
Google Gemini
- Model: `gemini-2.5-flash-image` (default)
- Strengths: Fast generation, multi-turn editing, aspect ratio control
- Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 4:5, 5:4, 21:9
- Env var: `GEMINI_API_KEY`
OpenAI GPT Image 1.5
- Model: `gpt-image-1.5`
- Strengths: Superior text rendering, transparent backgrounds, up to 16 input images for editing, quality tiers
- Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait)
- Quality: low (fast/cheap), medium, high (best fidelity)
- Env var: `OPENAI_API_KEY`
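OpenAI takes a fixed `--size` rather than an aspect ratio. When the same request goes to several providers, a W:H ratio from the Gemini or xAI lists can be mapped onto the matching OpenAI orientation. The helper below, `openai_size_for_ratio`, is a hypothetical sketch (not part of the plugin): it only compares width and height to pick landscape, portrait, or square, and it accepts numeric ratios only, so xAI's `auto` is rejected.

```bash
# Hypothetical helper (not part of the plugin): map a W:H aspect ratio onto
# one of the three sizes openai.sh accepts, by orientation only.
openai_size_for_ratio() {
  ratio=${1-}
  # Reject empty input, non-numeric characters, or a leading/trailing colon.
  case $ratio in
    "" | *[!0-9.:]* | :* | *:)
      echo "error: unsupported ratio: $ratio" >&2
      return 1 ;;
  esac
  # awk handles fractional ratios such as 19.5:9, which sh arithmetic cannot.
  awk -v r="$ratio" 'BEGIN {
    if (split(r, p, ":") != 2) exit 1
    w = p[1] + 0; h = p[2] + 0
    if (w > h)      print "1536x1024"   # landscape
    else if (w < h) print "1024x1536"   # portrait
    else            print "1024x1024"   # square
  }' || { echo "error: unsupported ratio: $ratio" >&2; return 1; }
}
```

For example, `--size "$(openai_size_for_ratio 16:9)"` would request 1536x1024. The mapping preserves orientation only, not the exact ratio.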
xAI Grok Image
- Model: `grok-imagine-image` (default), `grok-2-image` (basic generation only)
- Strengths: Prompt revision by chat model, flat per-image pricing, diverse style range, many aspect ratios
- Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 2:1, 1:2, 19.5:9, 9:19.5, 20:9, 9:20, auto
- Editing: Same endpoint as generation; source image passed as data URI
- Env var: `XAI_API_KEY` or `GROK_API_KEY`
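Each script reads its key from the environment variable listed for its provider. A minimal sketch of that lookup, including the xAI fallback from `XAI_API_KEY` to `GROK_API_KEY`, might look like this. The function name `resolve_api_key` and the precedence (preferring `XAI_API_KEY` when both are set) are assumptions, not taken from the scripts.

```bash
# Hypothetical helper: print the API key for a provider, or fail with a
# clear message naming the variable(s) to set.
resolve_api_key() {
  case ${1-} in
    gemini) key=${GEMINI_API_KEY:-}; wanted=GEMINI_API_KEY ;;
    openai) key=${OPENAI_API_KEY:-}; wanted=OPENAI_API_KEY ;;
    # Assumed precedence: XAI_API_KEY first, then GROK_API_KEY.
    xai)    key=${XAI_API_KEY:-${GROK_API_KEY:-}}; wanted="XAI_API_KEY or GROK_API_KEY" ;;
    *)      echo "error: unknown provider: ${1-}" >&2; return 1 ;;
  esac
  if [ -z "$key" ]; then
    echo "error: $wanted is not set" >&2
    return 1
  fi
  printf '%s\n' "$key"
}
```

A caller would typically write `key=$(resolve_api_key xai) || exit 1`, which matches the documented behavior of exiting immediately with a clear message when a key is missing.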
Usage
Text-to-Image Generation
Use the scripts at `${CLAUDE_PLUGIN_ROOT}/scripts/`:

```bash
# Gemini
bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

# OpenAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

# xAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/xai.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png
```
Image Editing
```bash
# Gemini
bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png

# OpenAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png

# xAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/xai.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png
```
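For xAI, edits use the same endpoint as generation and send the source image as a data URI. `xai.sh` handles this itself; for readers calling the API directly or debugging the script, a sketch of building such a URI in plain shell follows. The helper name `image_to_data_uri`, the extension-to-MIME mapping, and the set of formats it allows are assumptions; check the xAI documentation for what the endpoint actually accepts.

```bash
# Hypothetical helper: encode a local image file as a base64 data URI.
image_to_data_uri() {
  file=${1-}
  [ -r "$file" ] || { echo "error: cannot read $file" >&2; return 1; }
  # Assumed mapping from file extension to MIME type.
  case $file in
    *.png|*.PNG)               mime=image/png ;;
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime=image/jpeg ;;
    *.webp|*.WEBP)             mime=image/webp ;;
    *) echo "error: unsupported image type: $file" >&2; return 1 ;;
  esac
  # Some base64 implementations wrap long output; strip newlines so the URI is one line.
  printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```

For a large photo the resulting string can be several megabytes, so it is better passed to an HTTP client through a file or stdin than as a command-line argument.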
Parallel Generation
To generate with multiple providers simultaneously using the streaming display pane:
- Create a task per provider with TaskCreate, using `activeForm` for the spinner text:
  - "Generate image with Gemini" (activeForm: "Generating image with Gemini...")
  - "Generate image with OpenAI" (activeForm: "Generating image with OpenAI...")
  - "Generate image with xAI" (activeForm: "Generating image with xAI...")
- Mark all tasks `in_progress` with TaskUpdate.
- Open a streaming display pane with a single Bash call, and capture the path it prints:
  ```bash
  source "${CLAUDE_PLUGIN_ROOT}/scripts/display.sh" && display_pane_open  # outputs: /tmp/display_pane.XXXXXX
  ```
- Launch Task subagents (subagent_type: Bash) in the same message so they run concurrently. Pass `DISPLAY_PANE_DIR` so images appear in the shared pane as each provider finishes:
  ```bash
  DISPLAY_PANE_DIR=/tmp/display_pane.XXXXXX bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
    --mode generate --prompt "<prompt>" --output hero-gemini.png
  ```
- As each subagent returns, mark its task completed via TaskUpdate.
- After all providers complete, close the streaming pane to show the controls:
  ```bash
  DISPLAY_PANE_DIR=/tmp/display_pane.XXXXXX bash -c \
    'source "${CLAUDE_PLUGIN_ROOT}/scripts/display.sh" && display_pane_close'
  ```
- Present all output file paths to the user.
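The steps above rely on Claude Code's TaskCreate, TaskUpdate, and Task subagents. Outside that environment, the same fan-out can be approximated in plain shell with background jobs and `wait`. The sketch below is hypothetical (`generate_parallel` is not part of the plugin), does not use the display pane, and assumes `CLAUDE_PLUGIN_ROOT` is set and that each script exits non-zero on failure.

```bash
# Hypothetical sketch: run several provider scripts concurrently as background
# jobs, then report each result.
# Usage: generate_parallel PROMPT OUTPUT_PREFIX PROVIDER...
generate_parallel() {
  prompt=$1; prefix=$2; shift 2
  jobs_list=""
  for p in "$@"; do
    # Each provider writes <prefix>-<provider>.png and keeps its stderr in a .err file.
    bash "${CLAUDE_PLUGIN_ROOT}/scripts/$p.sh" \
      --mode generate --prompt "$prompt" --output "$prefix-$p.png" \
      2> "$prefix-$p.err" &
    jobs_list="$jobs_list $p:$!"
  done
  failures=0
  for job in $jobs_list; do
    p=${job%%:*}; pid=${job#*:}
    # wait PID returns that job's exit status.
    if wait "$pid"; then
      echo "ok: $p -> $prefix-$p.png"
    else
      echo "failed: $p (see $prefix-$p.err)" >&2
      failures=$((failures + 1))
    fi
  done
  # Non-zero if any provider failed; successful outputs are still reported above.
  [ "$failures" -eq 0 ]
}
```

For example, `generate_parallel "a lighthouse at dusk" hero gemini openai xai` would write `hero-gemini.png`, `hero-openai.png`, and `hero-xai.png`. Results are reported in launch order, not completion order.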
Prompting Tips
General
- Be specific and descriptive: "a golden retriever puppy playing in autumn leaves, soft afternoon light" beats "dog in park"
- Specify style explicitly: "watercolor painting", "photorealistic", "flat vector illustration"
- Include composition details: "close-up", "aerial view", "centered", "rule of thirds"
Text in Images
- OpenAI GPT Image 1.5 is significantly better at rendering text
- Put text in quotes or ALL CAPS in the prompt, e.g. `a sign that reads "OPEN 24 HOURS"`
- Specify typography details: font style, size, color, placement
Editing
- Describe what to change, not the whole image
- Be specific about which elements to preserve vs modify
- For Gemini: supports iterative multi-turn refinement
- For OpenAI: can accept up to 16 reference images
- For xAI: prompts are revised by a chat model before generation
Error Handling
- Scripts exit with code 1 on failure and print error details to stderr
- If an API key is missing, the script exits immediately with a clear message
- HTTP errors include the status code and API error message
- If multiple providers are used in parallel and one fails, report the error and present the successful results
- Rate limit errors (HTTP 429) mean the provider's rate limit or quota has been reached; try again later or use another provider
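The fallback advice above can be sketched as a sequential loop that tries providers in order. This assumes only what is stated here: the scripts exit with code 1 on failure and print error details, including the HTTP status code, to stderr. Matching `429` in that text is a heuristic, and `generate_with_fallback` is a hypothetical name.

```bash
# Hypothetical sketch: try providers in order and stop at the first success.
# Assumes CLAUDE_PLUGIN_ROOT is set and each script exits non-zero on failure.
# Usage: generate_with_fallback PROMPT OUTPUT PROVIDER...
generate_with_fallback() {
  prompt=$1; output=$2; shift 2
  for p in "$@"; do
    # Capture stderr only (stdout is discarded) so the error text can be inspected.
    if err=$(bash "${CLAUDE_PLUGIN_ROOT}/scripts/$p.sh" \
        --mode generate --prompt "$prompt" --output "$output" 2>&1 >/dev/null); then
      echo "$p"   # report which provider produced the image
      return 0
    fi
    case $err in
      *429*) echo "warn: $p is rate limited (HTTP 429); trying the next provider" >&2 ;;
      *)     echo "warn: $p failed: $err" >&2 ;;
    esac
  done
  echo "error: all providers failed" >&2
  return 1
}
```

For example, `generate_with_fallback "a red fox" ./fox.png openai gemini xai` prints the name of the provider that succeeded.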
Script Options Reference
gemini.sh
| Flag | Values | Default |
|---|---|---|
| `--mode` | generate, edit | (required) |
| `--prompt` | text | (required) |
| `--output` | file path | (required) |
| `--input-image` | file path | (edit only) |
| `--aspect-ratio` | 1:1, 16:9, etc. | 1:1 |
| `--model` | Gemini model name | gemini-2.5-flash-image |
openai.sh
| Flag | Values | Default |
|---|---|---|
| `--mode` | generate, edit | (required) |
| `--prompt` | text | (required) |
| `--output` | file path | (required) |
| `--input-image` | file path | (edit only) |
| `--size` | 1024x1024, 1536x1024, 1024x1536 | 1024x1024 |
| `--quality` | low, medium, high | high |
| `--background` | transparent, opaque, auto | auto |
| `--model` | OpenAI model name | gpt-image-1.5 |
xai.sh
| Flag | Values | Default |
|---|---|---|
| `--mode` | generate, edit | (required) |
| `--prompt` | text | (required) |
| `--output` | file path | (required) |
| `--input-image` | file path | (edit only) |
| `--aspect-ratio` | 1:1, 16:9, 9:16, 4:3, 3:4, etc. | (none) |
| `--model` | xAI model name | grok-imagine-image |
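The flag handling described by the `gemini.sh` table can be sketched as a small POSIX shell parser. This is not the script's actual source: `parse_gemini_args` is hypothetical, and two behaviors are assumptions drawn from the table and the provider notes: edit mode requires `--input-image`, and aspect ratios outside the Gemini list are rejected.

```bash
# Hypothetical sketch of gemini.sh-style flag parsing; defaults follow the table.
# Sets the shell variables: mode, prompt, output, input_image, aspect_ratio, model.
parse_gemini_args() {
  mode=""; prompt=""; output=""; input_image=""
  aspect_ratio="1:1"; model="gemini-2.5-flash-image"
  while [ $# -gt 0 ]; do
    case $1 in
      --mode)         mode=${2-} ;;
      --prompt)       prompt=${2-} ;;
      --output)       output=${2-} ;;
      --input-image)  input_image=${2-} ;;
      --aspect-ratio) aspect_ratio=${2-} ;;
      --model)        model=${2-} ;;
      *) echo "error: unknown flag: $1" >&2; return 1 ;;
    esac
    # Every flag takes exactly one value.
    [ $# -ge 2 ] || { echo "error: $1 needs a value" >&2; return 1; }
    shift 2
  done
  if [ -z "$mode" ] || [ -z "$prompt" ] || [ -z "$output" ]; then
    echo "error: --mode, --prompt and --output are required" >&2
    return 1
  fi
  case $mode in
    generate) ;;
    edit) [ -n "$input_image" ] || { echo "error: --mode edit needs --input-image" >&2; return 1; } ;;
    *) echo "error: --mode must be generate or edit" >&2; return 1 ;;
  esac
  # Aspect ratios listed for Gemini under Available Providers.
  case $aspect_ratio in
    1:1|16:9|9:16|4:3|3:4|3:2|2:3|4:5|5:4|21:9) ;;
    *) echo "error: unsupported aspect ratio: $aspect_ratio" >&2; return 1 ;;
  esac
}
```

The same pattern extends to `openai.sh` and `xai.sh` by swapping in their flags, defaults, and allowed values from the tables.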
Source
https://github.com/hex/claude-image-generation/blob/main/skills/image-generation/SKILL.md
Overview
This skill creates and edits images using Google Gemini, OpenAI GPT Image 1.5, and xAI Grok Image APIs via shell scripts. It supports text-to-image generation, image edits, and parallel multi-provider workflows with explicit prompts and aspect ratios for precise results.
How This Skill Works
Choose a provider and run its shell script with `--mode generate` or `--mode edit`. Pass a prompt and an output path, plus an input image for edits; the script calls the provider's API and writes the resulting image to the output path. Each provider has its own models and output options (aspect ratios for Gemini and xAI; size, quality, and background for OpenAI) and reads its API key from an environment variable: `GEMINI_API_KEY`, `OPENAI_API_KEY`, or `XAI_API_KEY`/`GROK_API_KEY`.
When to Use It
- You want to generate a new image from a text prompt using Gemini, GPT Image, or Grok.
- You need to edit an existing image, such as changing the sky or background, using a prompt.
- You want to compare outputs from multiple providers in parallel for faster decision making.
- You require specific aspect ratios (e.g., 1:1, 16:9, 4:3) or resolutions for a project asset.
- You’re aiming for high fidelity or particular styles (e.g., transparent background with GPT Image 1.5 or Grok’s diverse style range).
Quick Start
- Step 1: Choose a provider and run its script, e.g., `bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" --mode generate --prompt "<your prompt>" --output ./generated.png`
- Step 2: If editing, use `--mode edit`, add `--input-image ./source.png`, and describe the changes in `--prompt`
- Step 3: Open the image at the path you passed to `--output` (e.g., ./generated.png), review its fidelity, and iterate on the prompt if needed
Best Practices
- Write clear, descriptive prompts that specify style, lighting, and composition.
- Select an aspect ratio from the provider’s supported options to match your asset.
- For edits, provide a high-quality source image and a precise edit prompt.
- If speed matters, start with `--quality low` on OpenAI, or run providers in parallel and compare outputs.
- Organize outputs with consistent naming and verify results before final use.
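Consistent naming matters most when several providers write outputs for the same prompt. One possible convention, following the `hero-gemini.png` pattern from the parallel workflow, is `<name>-<provider>.png`. The hypothetical helper below derives `<name>` from the prompt by lowercasing it, collapsing runs of non-alphanumeric characters into hyphens, and truncating to 40 characters (an arbitrary limit chosen for this sketch).

```bash
# Hypothetical helper: build "<prompt-slug>-<provider>.png" from a prompt.
output_name() {
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '-' \
    | cut -c1-40 \
    | sed 's/^-*//; s/-*$//')
  # Fall back to a generic name if the prompt had no letters or digits.
  [ -n "$slug" ] || slug=image
  printf '%s-%s.png\n' "$slug" "$2"
}
```

For example, `--output "$(output_name "$prompt" gemini)"` keeps each provider's result for the same prompt side by side in a directory listing.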
Example Use Cases
- Generate a serene mountain landscape at sunset using Gemini, then compare with Grok outputs.
- Edit a product photo to replace the background with white and enhance contrast.
- Create multiple hero banner variations for a landing page across Gemini, OpenAI, and xAI.
- Render an image with a transparent background using GPT Image 1.5 for logo usage.
- Produce a 16:9 landscape image suitable for video thumbnails and social sharing.