
local-image-gen

npx machina-cli add skill sacredvoid/skillkit/local-image-gen --openclaw

Local Image Generator

Generate images locally using Stable Diffusion. Auto-detects your hardware and picks the optimal model, device, and resolution.

Phase 0: Detect Compute Environment

Run this at the start of every invocation. It determines everything downstream.

python3 -c "
import platform, shutil, subprocess, json

info = {'os': platform.system(), 'arch': platform.machine(), 'ram_gb': 0, 'gpu': 'none', 'vram_gb': 0, 'device': 'cpu', 'dtype': 'float32'}

# RAM
try:
    if platform.system() == 'Darwin':
        import os; info['ram_gb'] = round(os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / (1024**3))
    elif platform.system() == 'Linux':
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemTotal'):
                    info['ram_gb'] = round(int(line.split()[1]) / (1024**2))
                    break
    else:
        import ctypes
        mem = ctypes.c_ulonglong(0)
        ctypes.windll.kernel32.GetPhysicallyInstalledMemory(ctypes.byref(mem))
        info['ram_gb'] = round(mem.value / (1024**2))
except Exception: pass  # leave ram_gb at 0 if detection fails

# GPU detection
try:
    import torch
    if torch.cuda.is_available():
        info['gpu'] = torch.cuda.get_device_name(0)
        info['vram_gb'] = round(torch.cuda.get_device_properties(0).total_memory / (1024**3))
        info['device'] = 'cuda'
        info['dtype'] = 'float16'
    elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        info['gpu'] = 'Apple Silicon (MPS)'
        info['vram_gb'] = info['ram_gb']  # unified memory
        info['device'] = 'mps'
        info['dtype'] = 'float16'
    elif getattr(torch.version, 'hip', None):  # set only on ROCm builds of PyTorch
        info['gpu'] = 'AMD (ROCm)'
        info['device'] = 'cuda'  # ROCm uses cuda API
        info['dtype'] = 'float16'
except ImportError:
    pass

print(json.dumps(info))
"

Parse the JSON output and store it internally as COMPUTE. Present the results to the user:

Detected hardware:

  • OS: {os} ({arch})
  • RAM: {ram_gb} GB
  • GPU: {gpu} ({vram_gb} GB VRAM)
  • Compute device: {device}
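A minimal sketch of the parse-and-present step, assuming the Phase 0 command printed a single JSON line. The function name `format_hardware_summary` and the sample input are illustrative, not part of the skill.

```python
import json

def format_hardware_summary(raw_json: str) -> str:
    """Turn the Phase 0 JSON line into the summary shown to the user."""
    compute = json.loads(raw_json)
    lines = [
        "Detected hardware:",
        f"  • OS: {compute['os']} ({compute['arch']})",
        f"  • RAM: {compute['ram_gb']} GB",
        f"  • GPU: {compute['gpu']} ({compute['vram_gb']} GB VRAM)",
        f"  • Compute device: {compute['device']}",
    ]
    return "\n".join(lines)

# Made-up sample for illustration, not real detection output.
sample = ('{"os": "Linux", "arch": "x86_64", "ram_gb": 32, "gpu": "none", '
          '"vram_gb": 0, "device": "cpu", "dtype": "float32"}')
print(format_hardware_summary(sample))
```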

Model Selection Matrix

Based on the detected hardware, recommend a model from this table:

| Condition | Recommended Model | Reason |
| --- | --- | --- |
| VRAM >= 8 GB (CUDA or MPS) | stabilityai/sdxl-turbo | Best quality, fast with GPU |
| VRAM 4-7 GB (CUDA) | stabilityai/sd-turbo | Lighter model, fits in low VRAM |
| VRAM < 4 GB or CPU + RAM >= 16 GB | stabilityai/sd-turbo + CPU offload | Slow but works |
| CPU + RAM < 16 GB | segmind/tiny-sd | Smallest model, runs on anything |

Present the recommendation and let the user choose:

AskUserQuestion: "Which image generation model should I use?"
Options:
- [Recommended model] (Recommended) — [reason based on their hardware]
- SDXL-Turbo — Best quality, needs 8+ GB VRAM, ~6 GB download
- SD-Turbo — Good quality, needs 4+ GB VRAM, ~3 GB download
- Tiny-SD — Lower quality, runs on any hardware, ~1 GB download

Store the chosen model as MODEL.
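The selection matrix can be sketched as a small function over the COMPUTE dictionary. This is one possible reading, not the skill's own implementation: the matrix does not say what to do with an MPS machine under 8 GB or a GPU under 4 GB paired with little RAM, so the sketch assumes those cases are decided by system RAM alone. The name `recommend_model` is hypothetical.

```python
def recommend_model(compute: dict):
    """Return (model_id, use_cpu_offload) for a Phase 0 COMPUTE dict."""
    device = compute["device"]
    vram = compute["vram_gb"]
    ram = compute["ram_gb"]

    if device in ("cuda", "mps") and vram >= 8:
        return "stabilityai/sdxl-turbo", False
    if device == "cuda" and vram >= 4:
        return "stabilityai/sd-turbo", False
    # Assumption: every remaining case (CPU, small GPU, small MPS machine)
    # is decided by system RAM, since the matrix leaves these rows open.
    if ram >= 16:
        return "stabilityai/sd-turbo", True
    return "segmind/tiny-sd", False

print(recommend_model({"device": "cuda", "vram_gb": 12, "ram_gb": 32}))
```

The returned model is only the recommendation; the user's answer to the question above still decides the final MODEL.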

Resolution Selection

Based on model and VRAM:

| VRAM | SDXL-Turbo | SD-Turbo | Tiny-SD |
| --- | --- | --- | --- |
| >= 16 GB | 1200x640 | 1200x640 | 768x408 |
| 8-15 GB | 1024x576 | 1200x640 | 768x408 |
| 4-7 GB | N/A | 768x408 | 512x272 |
| CPU | N/A | 512x272 | 512x272 |

Steps Selection

| Device | SDXL-Turbo | SD-Turbo | Tiny-SD |
| --- | --- | --- | --- |
| CUDA | 4-6 | 4-6 | 20-30 |
| MPS | 6 | 6 | 25 |
| CPU | 6-8 | 6-8 | 30-40 |
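The resolution and steps tables translate naturally into two lookup dictionaries. The sketch below is illustrative only: the short model keys, the tier labels, and the helper names are hypothetical. It also makes two assumptions the tables do not state: a GPU with under 4 GB VRAM is treated like the CPU row, and the low end of each step range is used as the starting value.

```python
RESOLUTIONS = {  # VRAM tier -> model -> (width, height); None marks N/A
    "16+":  {"sdxl-turbo": (1200, 640), "sd-turbo": (1200, 640), "tiny-sd": (768, 408)},
    "8-15": {"sdxl-turbo": (1024, 576), "sd-turbo": (1200, 640), "tiny-sd": (768, 408)},
    "4-7":  {"sdxl-turbo": None, "sd-turbo": (768, 408), "tiny-sd": (512, 272)},
    "cpu":  {"sdxl-turbo": None, "sd-turbo": (512, 272), "tiny-sd": (512, 272)},
}
STEPS = {  # device -> model -> (min_steps, max_steps)
    "cuda": {"sdxl-turbo": (4, 6), "sd-turbo": (4, 6), "tiny-sd": (20, 30)},
    "mps":  {"sdxl-turbo": (6, 6), "sd-turbo": (6, 6), "tiny-sd": (25, 25)},
    "cpu":  {"sdxl-turbo": (6, 8), "sd-turbo": (6, 8), "tiny-sd": (30, 40)},
}

def vram_tier(device, vram_gb):
    # Assumption: under 4 GB of VRAM falls back to the CPU row.
    if device == "cpu" or vram_gb < 4:
        return "cpu"
    if vram_gb >= 16:
        return "16+"
    return "8-15" if vram_gb >= 8 else "4-7"

def pick_settings(model, device, vram_gb):
    """Look up width, height, and a starting step count for one run."""
    resolution = RESOLUTIONS[vram_tier(device, vram_gb)][model]
    if resolution is None:
        raise ValueError(f"{model} is listed as N/A for this VRAM tier")
    low, _high = STEPS[device][model]
    # Assumption: start at the low end of the range and raise it if needed.
    return {"width": resolution[0], "height": resolution[1], "steps": low}

print(pick_settings("sdxl-turbo", "cuda", 12))
```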

Phase 1: Determine What to Generate

If the user provided a slug and prompt (argument after the skill name), parse them and skip to Phase 3.

Expected argument format: {slug} {prompt} (e.g., beginners-guide-to-rag abstract knowledge retrieval system with floating documents)
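As a rough sketch of the argument parsing, the first whitespace-separated token can be taken as the slug and the remainder as the prompt. The function name `parse_skill_args` is hypothetical.

```python
def parse_skill_args(raw: str):
    """Split '{slug} {prompt}' into (slug, prompt); prompt is None if absent."""
    parts = raw.strip().split(maxsplit=1)
    if not parts:
        return None, None  # no input: ask the user for a slug and description
    slug = parts[0]
    prompt = parts[1] if len(parts) > 1 else None
    return slug, prompt

slug, prompt = parse_skill_args(
    "beginners-guide-to-rag abstract knowledge retrieval system with floating documents"
)
print(slug)
print(prompt)
```

A `None` prompt corresponds to the slug-only case, where the prompt is derived from the blog post instead.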

If only a slug was provided, read the blog post to generate an appropriate prompt:

# Find the blog post
cat content/blog/{SLUG}.mdx 2>/dev/null | head -50

Extract the title, description, and key themes. Generate a prompt using one of the SAI style templates below that best fits the post's topic. Each image MUST use a different style to avoid visual repetition across blog posts.

SAI Style Templates (pick ONE per image)

Each template wraps your subject description in a distinct visual style. Replace {subject} with a short, vivid description of the post's core concept as a visual metaphor.

| Style | Template | Best for |
| --- | --- | --- |
| Isometric | isometric style {subject}. vibrant, beautiful, crisp, detailed, ultra detailed, intricate | Architecture, systems, infrastructure |
| Low-poly | low-poly style {subject}. low-poly game art, polygon mesh, jagged, blocky, wireframe edges, centered composition | Tutorials, beginner guides, fundamentals |
| Neonpunk | neonpunk style {subject}. cyberpunk, vaporwave, neon, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic | AI/ML, cutting-edge tech, future-facing |
| Concept art | concept art {subject}. digital artwork, illustrative, painterly, matte painting, highly detailed | Opinion pieces, deep dives, strategy |
| Line art | line art drawing {subject}. professional, sleek, modern, minimalist, graphic, line art, vector graphics | Comparisons, frameworks, decision guides |
| 3D model | professional 3d model {subject}. octane render, highly detailed, volumetric, dramatic lighting | Product/tool reviews, practical guides |
| Fantasy | ethereal fantasy concept art of {subject}. magnificent, celestial, ethereal, painterly, epic, majestic, magical | Vision pieces, thought leadership |
| Cinematic | cinematic film still {subject}. shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous | Case studies, real-world stories |

Subject Description Guidelines

Write the {subject} as a vivid visual metaphor, not a literal description. Never include hands, fingers, faces, or human figures.

  • Good: "a crystalline data pipeline splitting light into rainbow streams"
  • Bad: "data pipeline architecture diagram"
  • Good: "mechanical clockwork gears meshing with glowing circuit traces"
  • Bad: "AI system with nodes and connections"

Vary across posts: color palette, physical metaphor (clockwork, rivers, crystals, bridges, constellations), and composition.
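One way to combine a style template with a subject while honoring the "different style per image" rule is sketched below. Only three of the eight templates are copied in to keep the example short, and the dictionary keys and the `used_styles` bookkeeping are hypothetical; the skill does not prescribe how previously used styles are tracked.

```python
STYLE_TEMPLATES = {
    "isometric": "isometric style {subject}. vibrant, beautiful, crisp, "
                 "detailed, ultra detailed, intricate",
    "line-art": "line art drawing {subject}. professional, sleek, modern, "
                "minimalist, graphic, line art, vector graphics",
    "3d-model": "professional 3d model {subject}. octane render, "
                "highly detailed, volumetric, dramatic lighting",
}

def build_prompt(subject, used_styles):
    """Pick the first style not yet used and wrap the subject in it."""
    for style, template in STYLE_TEMPLATES.items():
        if style not in used_styles:
            return style, template.format(subject=subject)
    raise ValueError("every listed style has already been used")

style, prompt = build_prompt(
    "a crystalline data pipeline splitting light into rainbow streams",
    used_styles={"isometric"},
)
print(style)
print(prompt)
```

In practice the style would be chosen by topic fit first (the "Best for" column), with the used-style check acting as a tie-breaker.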

If no input was provided, use AskUserQuestion to ask for a slug and description.

Phase 2: Confirm with User

Present the generation plan:

I'll generate a hero image for {slug}:

  • Model: {MODEL} on {device}
  • Resolution: {width}x{height} ({steps} steps)
  • Prompt: "{prompt}"
  • Seed: {seed or "random"}
  • Estimated time: {estimate based on device and model}

Want me to adjust anything before generating?

Time estimates:

| Device | SDXL-Turbo | SD-Turbo | Tiny-SD |
| --- | --- | --- | --- |
| CUDA (RTX 3060+) | 5-10s | 3-8s | 15-25s |
| MPS (M1/M2/M3/M4) | 25-35s | 15-25s | 30-45s |
| CPU (16GB+ RAM) | 3-8 min | 2-5 min | 5-10 min |

Phase 3: Install Dependencies

Check and install what's needed based on platform:

# Check Python + torch
python3 -c "import torch; print(torch.__version__)" 2>&1

If torch is missing, install based on platform:

| Platform | Install command |
| --- | --- |
| macOS (MPS) | pip3 install torch torchvision |
| Linux (CUDA) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121 |
| Linux (ROCm) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0 |
| Linux/Windows (CPU) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu |
| Windows (CUDA) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121 |

Then install diffusers:

pip3 install diffusers accelerate Pillow
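The platform table can be expressed as a small helper that returns the torch install command as a string. This is a sketch only: the function name and the accelerator labels (`cuda`, `rocm`, `mps`, `cpu`) are hypothetical, and the index URLs are copied from the table rather than checked against current PyTorch releases.

```python
def torch_install_command(os_name, accelerator):
    """os_name is platform.system(); accelerator is 'cuda', 'rocm', 'mps', or 'cpu'."""
    base = "pip3 install torch torchvision"
    index = {
        "cuda": "https://download.pytorch.org/whl/cu121",
        "rocm": "https://download.pytorch.org/whl/rocm6.0",
        "cpu": "https://download.pytorch.org/whl/cpu",
    }
    if os_name == "Darwin":
        return base  # the table lists no index URL for macOS
    if accelerator == "rocm" and os_name != "Linux":
        raise ValueError("the table only lists ROCm wheels for Linux")
    # Unknown accelerators fall back to the CPU wheels.
    return f"{base} --index-url {index.get(accelerator, index['cpu'])}"

print(torch_install_command("Linux", "cuda"))
```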

Phase 4: Generate Image

Run inline Python generation (no external script needed):

import torch
from diffusers import AutoPipelineForText2Image
from PIL import Image
import os

MODEL = "{MODEL}"
DEVICE = "{device}"
DTYPE = torch.float16 if "{dtype}" == "float16" else torch.float32
WIDTH = {width}
HEIGHT = {height}
STEPS = {steps}
PROMPT = "{PROMPT}"
SEED = {SEED}
SLUG = "{SLUG}"

pipe = AutoPipelineForText2Image.from_pretrained(MODEL, torch_dtype=DTYPE, variant="fp16" if DTYPE == torch.float16 else None)
pipe = pipe.to(DEVICE)

if DEVICE == "cuda":
    pipe.enable_attention_slicing()  # Reduce VRAM usage

generator = torch.Generator(device="cpu").manual_seed(SEED)

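# guidance_scale=0.0 applies to the turbo models. Tiny-SD is not a turbo model,
# so this falls back to the diffusers default of 7.5 (an assumption; tune as needed).
GUIDANCE = 0.0 if "turbo" in MODEL else 7.5

image = pipe(prompt=PROMPT, num_inference_steps=STEPS, guidance_scale=GUIDANCE, width=WIDTH, height=HEIGHT, generator=generator).images[0]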

out_dir = f"public/blog/{SLUG}"
os.makedirs(out_dir, exist_ok=True)
image.save(f"{out_dir}/hero.jpg", "JPEG", quality=90)
print(f"Saved to {out_dir}/hero.jpg ({WIDTH}x{HEIGHT})")

Run in the background for GPU, or warn the user about wait time for CPU.

For batch generation, loop over entries with the model loaded once.
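A minimal sketch of such a batch loop follows. It assumes `pipe` is the pipeline object already loaded as in the script above, and that each entry is a dictionary with `slug`, `prompt`, and an optional `seed`. That entry shape, the function name `generate_batch`, and the `make_generator` hook are all hypothetical.

```python
import os

def generate_batch(pipe, entries, steps, width, height,
                   out_root="public/blog", guidance_scale=0.0,
                   make_generator=None):
    """Run one already-loaded pipeline over many entries and save each image."""
    saved = []
    for entry in entries:
        extra = {}
        if make_generator is not None and entry.get("seed") is not None:
            # For example:
            # lambda seed: torch.Generator(device="cpu").manual_seed(seed)
            extra["generator"] = make_generator(entry["seed"])
        image = pipe(
            prompt=entry["prompt"],
            num_inference_steps=steps,
            guidance_scale=guidance_scale,
            width=width,
            height=height,
            **extra,
        ).images[0]
        out_dir = os.path.join(out_root, entry["slug"])
        os.makedirs(out_dir, exist_ok=True)
        path = os.path.join(out_dir, "hero.jpg")
        image.save(path, "JPEG", quality=90)
        saved.append(path)
    return saved
```

Because the pipeline is passed in rather than created inside the loop, the model weights are loaded once no matter how many entries are processed.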

Phase 5: Verify and Present

After generation:

  1. Check the output file exists:
file public/blog/{SLUG}/hero.jpg
  2. Show the image to the user using the Read tool.

  3. Ask:

AskUserQuestion: "How does this look?"
Options:
- Looks good, use it
- Regenerate with a different seed
- Adjust the prompt and try again
- Try a different model
- Discard

If "try a different model": go back to Phase 0's model selection and re-run.
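The existence check in step 1 can also be done from Python. The sketch below assumes it is enough to confirm that the file exists, is non-empty, and begins with the two-byte JPEG start-of-image marker. It does not decode the image, so it will not catch a black or noisy result. The function name `verify_jpeg` is hypothetical.

```python
import os

def verify_jpeg(path):
    """Return True if path exists, is non-empty, and starts with the JPEG marker."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        return f.read(2) == b"\xff\xd8"  # JPEG start-of-image (SOI) marker

print(verify_jpeg("public/blog/example-slug/hero.jpg"))  # example-slug is a placeholder
```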

SDXL-Turbo / SD-Turbo Prompting Rules

Critical: These turbo models use guidance_scale=0.0, which means negative prompts are IGNORED. All steering must come from the positive prompt alone.

DO:

  • Always lead with a SAI style prefix (see table above)
  • Use vivid, concrete visual metaphors as the subject
  • Specify a dominant color palette in the subject

DON'T:

  • Don't include hands, fingers, or human body parts (renders badly)
  • Don't include faces or people as focal subjects
  • Don't request text or labels (diffusion models can't render text reliably)
  • Don't rely on negative prompts (ignored at guidance_scale=0.0)
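Since negative prompts have no effect here, one option is to lint the positive prompt before generating. The sketch below flags whole-word matches against a small list derived from the DON'T rules. The word list and the function name `lint_prompt` are illustrative assumptions, not part of the skill.

```python
import re

# Illustrative list based on the DON'T rules above; extend as needed.
BANNED_TERMS = (
    "hand", "hands", "finger", "fingers", "face", "faces",
    "person", "people", "text", "label", "labels",
)

def lint_prompt(prompt):
    """Return the banned terms found in the prompt, sorted alphabetically."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return sorted(words.intersection(BANNED_TERMS))

print(lint_prompt("isometric style a robot with hands holding a label"))
print(lint_prompt("mechanical clockwork gears meshing with glowing circuit traces"))
```

Matching on whole words keeps terms such as "context" from being flagged for containing "text".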

Error Handling

| Error | Cause | Fix |
| --- | --- | --- |
| torch not found | Python deps missing | Install per the platform table above |
| CUDA out of memory | Model too large for VRAM | Switch to a smaller model or reduce resolution |
| MPS out of memory | Not enough unified memory | Close other apps, reduce resolution |
| RuntimeError: slow_conv2d_cpu | float16 used on CPU | Set DTYPE = torch.float32 for CPU |
| Black/noisy image | Bad seed or too few steps | Try a different seed or increase steps |
| Model download fails | Network issue | Check the connection and retry; models are cached under ~/.cache/huggingface/ after the first download |

Source

git clone https://github.com/sacredvoid/skillkit/blob/main/skills/local-image-gen/SKILL.md

Overview

Generate custom images locally with Stable Diffusion. It auto-detects your hardware (Apple Silicon, NVIDIA, AMD, CPU) and selects the best model and settings for single or batch runs. It works cross-platform on macOS, Linux, and Windows.

How This Skill Works

On invocation, Phase 0 runs a Python snippet to detect OS, architecture, RAM, and available GPU. It then selects a recommended model based on VRAM and device, and prompts you to confirm before generation. You can generate images in either single or batch mode depending on your input.

When to Use It

  • You want private image generation without sending prompts to the cloud.
  • You have a GPU with 8 GB+ VRAM and want high-quality results quickly.
  • You need to generate multiple images in a batch for a project or test prompts.
  • You require cross-platform support (macOS, Linux, Windows) with automatic hardware detection.
  • You have limited VRAM or CPU-only hardware and want lighter models with CPU offload.

Quick Start

  1. Run local-image-gen with a slug and image description, e.g., local-image-gen my-promo 'abstract neural network, dark blue gradient'.
  2. Review the auto-detected hardware and the recommended MODEL; confirm or override.
  3. Generate the image(s) in single or batch mode and save outputs to your target folder.

Best Practices

  • Run a quick hardware check to confirm Phase 0 reads your setup correctly.
  • Match the model to VRAM: SDXL-Turbo for 8+ GB, SD-Turbo for 4-7 GB, Tiny-SD for CPU-only or <4 GB.
  • Use batch mode to efficiently generate variations from a single prompt.
  • Save prompts, seeds, and results to reproduce or compare images later.
  • Monitor memory usage and enable CPU offload when needed to prevent swapping or slowdowns.

Example Use Cases

  • Batch generate 5 sci‑fi landscapes on a CUDA GPU with 12 GB VRAM.
  • Single high‑res portrait on Apple Silicon with 16 GB RAM using SDXL‑Turbo.
  • CPU‑only workstation with 16+ GB RAM generating 4 images using SD‑Turbo with CPU offload.
  • Linux desktop with NVIDIA GPU running a batch of 10 style variations.
  • Cross‑device workflow producing consistent results from the same prompt across different hardware.
