
genimg-gemini-web

npx machina-cli add skill proyecto26/sherlock-ai-plugin/genimg-gemini-web --openclaw

Gemini Web Client

Supports:

  • Text generation
  • Image generation (download + save)
  • Reference image upload (attach images for vision tasks)
  • Multi-turn conversations within the same executor instance (keepSession)
  • Experimental video generation (generateVideo) — Gemini may return an async placeholder; download might require Gemini web UI

Quick start

npx -y bun scripts/main.ts "Hello, Gemini"
npx -y bun scripts/main.ts --prompt "Explain quantum computing"
npx -y bun scripts/main.ts --prompt "A cute cat" --image cat.png
npx -y bun scripts/main.ts --promptfiles system.md content.md --image out.png

# Multi-turn conversation (agent generates unique sessionId)
npx -y bun scripts/main.ts "Remember this: 42" --sessionId my-unique-id-123
npx -y bun scripts/main.ts "What number?" --sessionId my-unique-id-123

Executor options (programmatic)

This skill is typically consumed via createGeminiWebExecutor(geminiOptions) (see scripts/executor.ts).

Key options in GeminiWebOptions:

  • referenceImages?: string | string[] Upload local images as references (vision input).
  • keepSession?: boolean Reuse Gemini chatMetadata to continue the same conversation across calls (required if you want reference images to persist across multiple messages).
  • generateVideo?: string Generate a video and (best-effort) download to the given path. Gemini may return video_gen_chip (async); in that case you must open Gemini web UI to download the result.

Notes:

  • generateVideo cannot be combined with generateImage / editImage.
  • When keepSession=true and referenceImages is set, reference images are uploaded once per executor instance.
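The options and notes above can be sketched in TypeScript. This is a minimal sketch, not the real scripts/executor.ts API: the GeminiWebOptions interface below only paraphrases the fields documented here (generateImage and editImage are inferred from the exclusivity note), and validateOptions is a hypothetical helper illustrating the rules.

```typescript
// Sketch of the documented GeminiWebOptions fields, paraphrased from this
// README -- the real interface in scripts/executor.ts may differ.
interface GeminiWebOptions {
  referenceImages?: string | string[]; // local images uploaded as vision input
  keepSession?: boolean;               // reuse Gemini chatMetadata across calls
  generateVideo?: string;              // best-effort video download path
  generateImage?: string;              // image generation output path
  editImage?: string;                  // image editing output path
}

// Hypothetical pre-flight check mirroring the notes above: generateVideo is
// mutually exclusive with generateImage / editImage, and reference images only
// persist across messages when keepSession is enabled.
function validateOptions(opts: GeminiWebOptions): string[] {
  const problems: string[] = [];
  if (opts.generateVideo && (opts.generateImage || opts.editImage)) {
    problems.push("generateVideo cannot be combined with generateImage / editImage");
  }
  if (opts.referenceImages && !opts.keepSession) {
    problems.push("set keepSession=true for reference images to persist across messages");
  }
  return problems;
}
```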

Commands

Text generation

# Simple prompt (positional)
npx -y bun scripts/main.ts "Your prompt here"

# Explicit prompt flag
npx -y bun scripts/main.ts --prompt "Your prompt here"
npx -y bun scripts/main.ts -p "Your prompt here"

# With model selection
npx -y bun scripts/main.ts -p "Hello" -m gemini-2.5-pro

# Pipe from stdin
echo "Summarize this" | npx -y bun scripts/main.ts

Image generation

# Generate image with default path (./generated.png)
npx -y bun scripts/main.ts --prompt "A sunset over mountains" --image

# Generate image with custom path
npx -y bun scripts/main.ts --prompt "A cute robot" --image robot.png

# Shorthand
npx -y bun scripts/main.ts "A dragon" --image=dragon.png

Output formats

# Plain text (default)
npx -y bun scripts/main.ts "Hello"

# JSON output
npx -y bun scripts/main.ts "Hello" --json

Options

Option                       Description
--prompt <text>, -p          Prompt text
--promptfiles <files...>     Read prompt from files (concatenated in order)
--model <id>, -m             Model: gemini-3-pro (default), gemini-2.5-pro, gemini-2.5-flash
--image [path]               Generate image, save to path (default: generated.png)
--sessionId <id>             Session ID for multi-turn conversation (agent generates unique ID)
--list-sessions              List saved sessions (max 100, sorted by update time)
--json                       Output as JSON
--login                      Refresh cookies only, then exit
--cookie-path <path>         Custom cookie file path
--profile-dir <path>         Chrome profile directory
--help, -h                   Show help

CLI note: scripts/main.ts supports text generation, image generation, and multi-turn conversations via --sessionId. Reference images and video generation are exposed via the executor API.

Models

  • gemini-3-pro - Default, latest model
  • gemini-2.5-pro - Previous generation pro
  • gemini-2.5-flash - Fast, lightweight

Authentication

First run opens Chrome to authenticate with Google. Cookies are cached for subsequent runs.

# Force cookie refresh
npx -y bun scripts/main.ts --login

Environment variables

Variable                         Description
GEMINI_WEB_DATA_DIR              Data directory
GEMINI_WEB_COOKIE_PATH           Cookie file path
GEMINI_WEB_CHROME_PROFILE_DIR    Chrome profile directory
GEMINI_WEB_CHROME_PATH           Chrome executable path
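For a persistent setup you might export these in your shell profile. Every path below is illustrative only; the skill does not document default locations or file names.

```shell
# Example ~/.profile snippet -- all paths here are illustrative, not defaults.
export GEMINI_WEB_DATA_DIR="$HOME/.gemini-web"
export GEMINI_WEB_COOKIE_PATH="$GEMINI_WEB_DATA_DIR/cookies.json"
export GEMINI_WEB_CHROME_PROFILE_DIR="$GEMINI_WEB_DATA_DIR/chrome-profile"
export GEMINI_WEB_CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
```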

Examples

Generate text response

npx -y bun scripts/main.ts "What is the capital of France?"

Generate image

npx -y bun scripts/main.ts "A photorealistic image of a golden retriever puppy" --image puppy.png

Get JSON output for parsing

npx -y bun scripts/main.ts "Hello" --json | jq '.text'
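When post-processing --json output from TypeScript instead of jq, a tiny helper keeps the parsing in one place. The shape { text: string } is an assumption inferred only from the jq '.text' example above; the CLI may emit additional fields.

```typescript
// Parse one `--json` CLI output payload and pull out the text field.
// Assumed shape: at least { text: string } -- the jq '.text' example is the
// only field this README documents.
function extractText(jsonPayload: string): string {
  const parsed = JSON.parse(jsonPayload) as { text?: unknown };
  if (typeof parsed.text !== "string") {
    throw new Error("no .text field in CLI output");
  }
  return parsed.text;
}
```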

Generate image from prompt files

# Concatenate system.md + content.md as prompt
npx -y bun scripts/main.ts --promptfiles system.md content.md --image output.png

Multi-turn conversation

# Start a session with unique ID (agent generates this)
npx -y bun scripts/main.ts "You are a helpful math tutor." --sessionId task-abc123

# Continue the conversation (remembers context)
npx -y bun scripts/main.ts "What is 2+2?" --sessionId task-abc123
npx -y bun scripts/main.ts "Now multiply that by 10" --sessionId task-abc123

# List recent sessions (max 100, sorted by update time)
npx -y bun scripts/main.ts --list-sessions

Session files are stored in ~/Library/Application Support/genimg-skills/gemini-web/sessions/<id>.json and contain:

  • id: Session ID
  • metadata: Gemini chat metadata for continuation
  • messages: Array of {role, content, timestamp, error?}
  • createdAt, updatedAt: Timestamps
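The layout above can be expressed as TypeScript types plus a small type guard, handy when reading session files back. These names mirror the bullet list here, not actual declarations in the skill's source, and the timestamp fields are assumed to be ISO strings.

```typescript
// Shape of sessions/<id>.json as described above. `metadata` is opaque
// Gemini chat metadata, so it stays unknown. Timestamps are assumed to be
// ISO strings; adjust if the files store epoch numbers instead.
interface SessionMessage {
  role: string;
  content: string;
  timestamp: string;
  error?: string;
}

interface SessionFile {
  id: string;
  metadata: unknown;
  messages: SessionMessage[];
  createdAt: string;
  updatedAt: string;
}

// Minimal structural check before trusting a parsed session file.
function isSessionFile(value: unknown): value is SessionFile {
  const v = value as SessionFile;
  return (
    typeof v === "object" && v !== null &&
    typeof v.id === "string" &&
    Array.isArray(v.messages)
  );
}
```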

Source

git clone https://github.com/proyecto26/sherlock-ai-plugin
# Skill source: skills/genimg-gemini-web/SKILL.md

Overview

Gemini Web provides image and text generation via Google Gemini, and serves as the image generation backend for other skills like cover-image, xhs-images, and article-illustrator. It supports uploading reference images, multi-turn conversations within a single executor, and experimental video generation. This makes it ideal for building rich visual content workflows in AI agents.

How This Skill Works

Typically consumed via createGeminiWebExecutor(geminiOptions). Key options include referenceImages for vision inputs, keepSession to reuse chat state across calls, and generateVideo for experimental video output. When keepSession is true and referenceImages is set, uploads persist for the lifetime of the executor; generateVideo may return an async placeholder, in which case you must open the Gemini web UI to download the result. Note that generateVideo cannot be combined with generateImage / editImage.

When to Use It

  • You need to generate an image from a natural language prompt using Gemini Web.
  • You want to perform text generation in the same workflow as image generation.
  • You require a persistent conversation with Gemini Web across multiple calls (keepSession).
  • You need to attach reference images to guide vision tasks (referenceImages).
  • You want to experiment with video generation and download outputs (generateVideo).

Quick Start

  1. Generate text: npx -y bun scripts/main.ts "Your prompt here"
  2. Generate an image: npx -y bun scripts/main.ts --prompt "Your prompt here" --image output.png
  3. Continue a multi-turn session: npx -y bun scripts/main.ts "Follow-up prompt" --sessionId my-session

Best Practices

  • Start with the default gemini-3-pro model unless you need alternatives.
  • Enable keepSession with referenceImages to persist context and references across calls.
  • Avoid enabling generateVideo at the same time as generateImage.
  • Provide prompts with clear style or composition notes to improve results.
  • Save outputs explicitly using --image to control file paths and prevent overwrites.

Example Use Cases

  • Generate an image from the prompt A sunset over mountains and save it as generated.png.
  • Create a cute robot illustration and save to robot.png using --image.
  • Run a multi-turn session with a unique sessionId to refine a concept (Remember this: 42 followed by follow-up prompts).
  • Attach a reference image (reference.png) to guide a vision task in subsequent messages.
  • Experiment with video generation by supplying generateVideo and downloading the result via Gemini UI when supported.
