Image Generator

npx machina-cli add skill aiskillstore/marketplace/image-generator --openclaw

Generate professional teaching visuals using Gemini 3 through a multi-turn reasoning partnership.

Quick Start

# 1. Start browser (via browser-use skill)
bash .claude/skills/browser-use/scripts/start-server.sh

# 2. Navigate to Gemini
# Use browser_navigate to https://gemini.google.com/

# 3. Generate image from creative brief
# Paste creative brief → Wait 30-35s → Verify 6 gates → Download

Core Principles

  1. Reasoning over prediction - Creative briefs (Story/Intent/Metaphor) activate reasoning; pixel specs don't
  2. Multi-turn partnership - Teach Gemini your standards through principle-based feedback
  3. 6-gate quality - Explicit pass/fail before download
  4. Autonomous batch - No permission-asking between visuals

Input: Creative Brief Format

Receive from visual-asset-workflow:

## The Story
[Narrative about what's visualized]

## Emotional Intent
[What it should FEEL like]

## Visual Metaphor
[Universal concept for instant comprehension]

## Subject / Composition / Action / Location / Style
[Gemini 3 prompt structure]

## Color Semantics
Blue (#2563eb) = Authority | Green (#10b981) = Execution

## Typography Hierarchy
Largest: Key insight | Medium: Supporting | Smallest: Context

Do NOT convert to pixel specs - use as-is to activate reasoning.
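
Before pasting, it can help to confirm that a brief arrives with every section listed above. The following is a minimal Python sketch of such a check; the function name `missing_sections` and the rule that each section must appear as a `## ` heading are assumptions for illustration, not part of the skill itself.

```python
REQUIRED_SECTIONS = [
    "The Story",
    "Emotional Intent",
    "Visual Metaphor",
    "Subject / Composition / Action / Location / Style",
    "Color Semantics",
    "Typography Hierarchy",
]


def missing_sections(brief: str) -> list[str]:
    """Return required section names that have no '## <name>' heading in the brief."""
    headings = {
        line[3:].strip()
        for line in brief.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

An empty result means the brief can be pasted unchanged; the check only reads headings and never rewrites the brief's content.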

Workflow (Per Visual)

| Step | Action | Tool |
|------|--------|------|
| 1 | Navigate to gemini.google.com | browser_navigate |
| 2 | Select "🍌 Create Image" | browser_click |
| 3 | Paste creative brief | browser_type |
| 4 | Wait 30-35 seconds | browser_wait_for |
| 5 | Verify 6 gates (below) | Visual inspection |
| 6 | If fail: Iterate with feedback (max 3) | browser_type |
| 7 | If pass: Download full size | browser_click |
| 8 | Copy to apps/learn-app/static/img/part-{N}/chapter-{NN}/ | Bash |
| 9 | Embed in lesson immediately | Edit |
| 10 | NEW CHAT for next visual | browser_navigate |
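
The copy target in step 8 can be built programmatically. This is a small Python sketch, assuming that `{NN}` means a two-digit, zero-padded chapter number; the function name `image_destination` is hypothetical.

```python
from pathlib import Path


def image_destination(part: int, chapter: int, filename: str) -> Path:
    """Build the target path under apps/learn-app/static/img/.

    Assumption: {NN} in the path template is a zero-padded, two-digit chapter number.
    """
    return (
        Path("apps/learn-app/static/img")
        / f"part-{part}"
        / f"chapter-{chapter:02d}"
        / filename
    )
```

For example, `image_destination(2, 5, "agent-loop.png")` points into `part-2/chapter-05/`.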

Quality Gates (ALL Must Pass)

| Gate | Criterion | Fail Action |
|------|-----------|-------------|
| 1. Spelling | 99% accuracy (Y-Combinator, Kubernetes) | Iterate |
| 2. Layout | Proportions match prompt (2×2 not 3×1) | Iterate |
| 3. Color | Brand colors match (#2563eb not #002050) | Iterate |
| 4. Typography | Largest = key concept (not decoration) | Iterate |
| 5. Teaching | <5 sec concept grasp at target proficiency | Iterate |
| 6. Uniqueness | Not duplicate of existing chapter image | New chat |

Decision: ALL pass → Download | ANY fail → Iterate (max 3 tries)
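
The decision rule can be written down as a small function. The sketch below is one possible Python encoding, not the skill's own implementation: the gate keys, the `Action` enum, and the `DEFER` outcome once three iterations are used up are all assumptions (the last one borrowed from the "Deferred" line of the batch report).

```python
from enum import Enum

GATES = ["spelling", "layout", "color", "typography", "teaching", "uniqueness"]
MAX_ITERATIONS = 3


class Action(Enum):
    DOWNLOAD = "download"
    ITERATE = "iterate"
    NEW_CHAT = "new_chat"
    DEFER = "defer"  # assumption: what happens after the third failed iteration


def decide(results: dict[str, bool], attempts_used: int) -> Action:
    """Map six pass/fail results to the next action.

    results: gate name -> True (pass) or False (fail); a missing gate counts as a fail.
    attempts_used: iterations already spent on this visual.
    """
    failed = [gate for gate in GATES if not results.get(gate, False)]
    if not failed:
        return Action.DOWNLOAD
    if attempts_used >= MAX_ITERATIONS:
        return Action.DEFER
    if "uniqueness" in failed:
        return Action.NEW_CHAT  # a duplicate image calls for a fresh chat, not feedback
    return Action.ITERATE
```

Treating a missing gate as a failure keeps the check explicit: an image cannot be downloaded unless all six results were actually recorded.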

Iteration: Principle-Based Feedback

When a gate fails, provide teaching feedback:

Gate 4 FAILED: Typography hierarchy incorrect

The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).

Increase '$3T' to dominant size. Reduce '$100K' to supporting size.
Information importance drives sizing.
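
The feedback above follows a fixed order: the failed gate, what was observed, the concrete fix, and the principle behind it. A minimal Python sketch of a helper that assembles a message in that order is shown here; the function name `gate_feedback` and its parameters are hypothetical.

```python
def gate_feedback(gate: str, problem: str, observation: str, fix: str, principle: str) -> str:
    """Assemble feedback as: failed gate, observation, then fix followed by the principle."""
    return "\n\n".join([
        f"{gate} FAILED: {problem}",
        observation,
        f"{fix}\n{principle}",
    ])
```

Ending on the principle ("Information importance drives sizing.") is what makes the feedback reusable for later visuals rather than a one-off correction.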

Batch Mode

When invoked with "generate all visuals":

For EACH visual in list:
  A. NEW CHAT (context isolation)
  B. Generate (paste brief)
  C. Verify 6 gates
  D. Iterate if needed (max 3)
  E. Download when pass
  F. Embed in lesson
  G. Log "✅ N/M"
  H. NEXT (no stopping)

Never ask: "Continue?" "Pause here?" "Review?"

Report at END only:

BATCH COMPLETE
✅ Generated: 16/18
⚠️ Deferred: 2 (quality issues)
Location: apps/learn-app/static/img/part-{N}/
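
The batch loop and its single end-of-run report can be sketched as follows. This is an illustrative Python outline under stated assumptions: `process` is a hypothetical callback that runs steps A through F for one visual and returns whether it passed, and the emoji status markers are left out of the strings.

```python
from typing import Callable


def run_batch(visuals: list[str], process: Callable[[str], bool]) -> str:
    """Process every visual without pausing and return one end-of-batch report.

    process(name) is a hypothetical callback: it handles a single visual and
    returns True if the image passed all gates and was embedded.
    """
    generated, deferred = [], []
    for index, name in enumerate(visuals, start=1):
        if process(name):
            generated.append(name)
        else:
            deferred.append(name)
        print(f"{index}/{len(visuals)} {name}")  # progress log only, never a prompt
    return "\n".join([
        "BATCH COMPLETE",
        f"Generated: {len(generated)}/{len(visuals)}",
        f"Deferred: {len(deferred)} (quality issues)",
    ])
```

The loop never reads input, which mirrors the rule that nothing is asked between visuals; failures are collected and surface only in the final report.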

Proficiency Limits

| Level | Max Elements | Grasp Time |
|-------|--------------|------------|
| A2 | 5-7 | <5 sec |
| B1 | 7-10 | <10 sec |
| C2 | No limit | N/A |
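
These limits lend themselves to a lookup table. The Python sketch below assumes that the upper end of each range (7 for A2, 10 for B1) is the hard cap on elements; the names `PROFICIENCY_LIMITS` and `within_element_limit` are hypothetical.

```python
from typing import Optional

# Level -> (max elements, max grasp time in seconds); None means no limit.
# Assumption: the upper bound of each "Max Elements" range is the cap.
PROFICIENCY_LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "A2": (7, 5),
    "B1": (10, 10),
    "C2": (None, None),
}


def within_element_limit(level: str, element_count: int) -> bool:
    """True if a visual's element count fits the limit for the given level."""
    max_elements, _ = PROFICIENCY_LIMITS[level]
    return max_elements is None or element_count <= max_elements
```

A brief for an A2 lesson with nine labelled elements would fail this check before any image is generated.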

Token Conservation (Batch Mode)

For >8 visuals, condense briefs:

Original (250 tokens):

"Top Layer shows Coordinator at center top with label 'Orchestrator'
featuring conductor icon, with role 'Strategic oversight'..."

Condensed (80 tokens):

"Top Layer - Coordinator: Center top, 'Orchestrator' (conductor),
Role: 'Strategic oversight', Gold (#fbbf24), Large hexagon."

Keep: Story, Intent, Metaphor, Colors, Reasoning
Condense: Long examples → Short labels

Anti-Patterns

| Don't | Why |
|-------|-----|
| Accept first output without 6 gates | Quality standard violation |
| Ask permission between batch items | Breaks autonomous agency |
| Convert briefs to pixel specs | Defeats reasoning activation |
| Skip embedding step | Creates orphan images |
| Reuse same chat for next visual | Context contamination |

Session Interruption

If the session ends mid-batch, create a checkpoint:

# Checkpoint: Part {N}
Status: INTERRUPTED at 8/18

## Completed:
- ✅ Image 1: filename (embedded lesson-01.md)
- ✅ Image 2: filename (embedded lesson-02.md)

## Remaining:
- ⏳ Image 8: filename

On continuation: Read checkpoint → Resume → Update incrementally
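
Writing the checkpoint can be automated. The Python sketch below renders the layout shown above from plain lists; the function name and parameters are hypothetical, the emoji status markers are omitted, and where the file is stored is left open because the source does not specify it.

```python
def render_checkpoint(
    part: int,
    interrupted_at: int,
    total: int,
    completed: list[tuple[str, str]],
    remaining: list[str],
) -> str:
    """Render a checkpoint in the layout used by this skill.

    completed: (image filename, lesson file it was embedded in) pairs.
    remaining: image filenames that still need to be generated.
    """
    lines = [
        f"# Checkpoint: Part {part}",
        f"Status: INTERRUPTED at {interrupted_at}/{total}",
        "",
        "## Completed:",
    ]
    for number, (filename, lesson) in enumerate(completed, start=1):
        lines.append(f"- Image {number}: {filename} (embedded {lesson})")
    lines += ["", "## Remaining:"]
    # Remaining images continue the numbering after the completed ones.
    for number, filename in enumerate(remaining, start=len(completed) + 1):
        lines.append(f"- Image {number}: {filename}")
    return "\n".join(lines)
```

Regenerating the whole text after each image keeps the checkpoint current, which matches the "update incrementally" step on continuation.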

Success Indicators

  • ✅ All 6 gates verified before download
  • ✅ Batch completion without permission-asking
  • ✅ Principle-based iteration feedback
  • ✅ Images organized by part/chapter
  • ✅ Immediate embedding (no orphans)
  • ✅ >85% production-ready rate

Source

git clone https://github.com/aiskillstore/marketplace

View on GitHub: https://github.com/aiskillstore/marketplace/blob/main/skills/92bilal26/image-generator/SKILL.md

Overview

Generates professional teaching visuals using Gemini 3 via a browser-automation workflow. It targets chapter illustrations, diagrams, and teaching visuals, not stock photos or decorative images, and enforces six quality gates for consistency and instructional clarity.

How This Skill Works

The skill drives a browser-use session on Gemini. It takes a structured creative brief from visual-asset-workflow (Story, Emotional Intent, Visual Metaphor, Subject/Composition/Action/Location/Style, Color Semantics, Typography Hierarchy) and pastes it as-is. It then checks the generated image against the six gates, iterating with principle-based feedback up to three times. Once all gates pass, it downloads the image and copies it into the lesson's image directory for immediate embedding.

When to Use It

  • When creating chapter illustrations to accompany a lesson.
  • When designing diagrams or process visuals needing a clear visual metaphor.
  • When you require visuals that adhere to branded color semantics and typography hierarchy.
  • When generating visuals in batch for a multi-chapter course, with gate-driven QA.
  • When you need visuals for teaching, not stock photos or decorative imagery.

Quick Start

  1. Start the browser (via the browser-use skill): bash .claude/skills/browser-use/scripts/start-server.sh
  2. Navigate to Gemini: browser_navigate to https://gemini.google.com/
  3. Generate an image from the creative brief: Paste creative brief → Wait 30-35s → Verify 6 gates → Download

Best Practices

  • Fill out the Creative Brief in full: The Story, Emotional Intent, Visual Metaphor, Subject/Composition/Action/Location/Style, Color Semantics, and Typography Hierarchy.
  • Rely on color semantics: Blue for Authority, Green for Execution.
  • Run all six quality gates and fix any gate failures via principle-based feedback (up to 3 iterations).
  • Do NOT convert to pixel specs — use the creative brief as-is to activate reasoning.
  • After download, copy to apps/learn-app/static/img/part-{N}/chapter-{NN}/ and embed immediately.

Example Use Cases

  • Illustration for a chapter on Gemini 3 capabilities and reasoning-based prompts.
  • Diagram showing data flow for a machine learning lesson.
  • Visual metaphor image illustrating the concept of Authority vs Execution with brand colors.
  • Step-by-step process image for a teaching module requiring clear sequencing.
  • Batch-generated visuals set for a 12-week course syllabus.
