local-image-gen
Scannednpx machina-cli add skill sacredvoid/skillkit/local-image-gen --openclawLocal Image Generator
Generate images locally using Stable Diffusion. Auto-detects your hardware and picks the optimal model, device, and resolution.
Phase 0: Detect Compute Environment
Run this at the start of every invocation. It determines everything downstream.
python3 -c "
import platform, shutil, subprocess, json
info = {'os': platform.system(), 'arch': platform.machine(), 'ram_gb': 0, 'gpu': 'none', 'vram_gb': 0, 'device': 'cpu', 'dtype': 'float32'}
# RAM
try:
if platform.system() == 'Darwin':
import os; info['ram_gb'] = round(os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / (1024**3))
elif platform.system() == 'Linux':
with open('/proc/meminfo') as f:
for line in f:
if line.startswith('MemTotal'):
info['ram_gb'] = round(int(line.split()[1]) / (1024**2))
break
else:
import ctypes
mem = ctypes.c_ulonglong(0)
ctypes.windll.kernel32.GetPhysicallyInstalledMemory(ctypes.byref(mem))
info['ram_gb'] = round(mem.value / (1024**2))
except: pass
# GPU detection
try:
import torch
if torch.cuda.is_available():
info['gpu'] = torch.cuda.get_device_name(0)
info['vram_gb'] = round(torch.cuda.get_device_properties(0).total_mem / (1024**3))
info['device'] = 'cuda'
info['dtype'] = 'float16'
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
info['gpu'] = 'Apple Silicon (MPS)'
info['vram_gb'] = info['ram_gb'] # unified memory
info['device'] = 'mps'
info['dtype'] = 'float16'
elif hasattr(torch, 'hip') or 'AMD' in str(getattr(torch, '_C', '')):
info['gpu'] = 'AMD (ROCm)'
info['device'] = 'cuda' # ROCm uses cuda API
info['dtype'] = 'float16'
except ImportError:
pass
print(json.dumps(info))
"
Parse the JSON output and store it internally as COMPUTE. Present the results to the user:
Detected hardware:
- OS: {os} ({arch})
- RAM: {ram_gb} GB
- GPU: {gpu} ({vram_gb} GB VRAM)
- Compute device: {device}
Model Selection Matrix
Based on the detected hardware, recommend a model from this table:
| Condition | Recommended Model | Reason |
|---|---|---|
| VRAM >= 8 GB (CUDA or MPS) | stabilityai/sdxl-turbo | Best quality, fast with GPU |
| VRAM 4-7 GB (CUDA) | stabilityai/sd-turbo | Lighter model, fits in low VRAM |
| VRAM < 4 GB or CPU + RAM >= 16 GB | stabilityai/sd-turbo + CPU offload | Slow but works |
| CPU + RAM < 16 GB | segmind/tiny-sd | Smallest model, runs on anything |
Present the recommendation and let the user choose:
AskUserQuestion: "Which image generation model should I use?"
Options:
- [Recommended model] (Recommended) — [reason based on their hardware]
- SDXL-Turbo — Best quality, needs 8+ GB VRAM, ~6 GB download
- SD-Turbo — Good quality, needs 4+ GB VRAM, ~3 GB download
- Tiny-SD — Lower quality, runs on any hardware, ~1 GB download
Store the chosen model as MODEL.
Resolution Selection
Based on model and VRAM:
| VRAM | SDXL-Turbo | SD-Turbo | Tiny-SD |
|---|---|---|---|
| >= 16 GB | 1200x640 | 1200x640 | 768x408 |
| 8-15 GB | 1024x576 | 1200x640 | 768x408 |
| 4-7 GB | N/A | 768x408 | 512x272 |
| CPU | N/A | 512x272 | 512x272 |
Steps Selection
| Device | SDXL-Turbo | SD-Turbo | Tiny-SD |
|---|---|---|---|
| CUDA | 4-6 | 4-6 | 20-30 |
| MPS | 6 | 6 | 25 |
| CPU | 6-8 | 6-8 | 30-40 |
Phase 1: Determine What to Generate
If the user provided a slug and prompt (argument after the skill name), parse them and skip to Phase 3.
Expected argument format: {slug} {prompt} (e.g., beginners-guide-to-rag abstract knowledge retrieval system with floating documents)
If only a slug was provided, read the blog post to generate an appropriate prompt:
# Find the blog post
cat content/blog/{SLUG}.mdx 2>/dev/null | head -50
Extract the title, description, and key themes. Generate a prompt using one of the SAI style templates below that best fits the post's topic. Each image MUST use a different style to avoid visual repetition across blog posts.
SAI Style Templates (pick ONE per image)
Each template wraps your subject description in a distinct visual style. Replace {subject} with a short, vivid description of the post's core concept as a visual metaphor.
| Style | Template | Best for |
|---|---|---|
| Isometric | isometric style {subject}. vibrant, beautiful, crisp, detailed, ultra detailed, intricate | Architecture, systems, infrastructure |
| Low-poly | low-poly style {subject}. low-poly game art, polygon mesh, jagged, blocky, wireframe edges, centered composition | Tutorials, beginner guides, fundamentals |
| Neonpunk | neonpunk style {subject}. cyberpunk, vaporwave, neon, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic | AI/ML, cutting-edge tech, future-facing |
| Concept art | concept art {subject}. digital artwork, illustrative, painterly, matte painting, highly detailed | Opinion pieces, deep dives, strategy |
| Line art | line art drawing {subject}. professional, sleek, modern, minimalist, graphic, line art, vector graphics | Comparisons, frameworks, decision guides |
| 3D model | professional 3d model {subject}. octane render, highly detailed, volumetric, dramatic lighting | Product/tool reviews, practical guides |
| Fantasy | ethereal fantasy concept art of {subject}. magnificent, celestial, ethereal, painterly, epic, majestic, magical | Vision pieces, thought leadership |
| Cinematic | cinematic film still {subject}. shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous | Case studies, real-world stories |
Subject Description Guidelines
Write the {subject} as a vivid visual metaphor, not a literal description. Never include hands, fingers, faces, or human figures.
- Good: "a crystalline data pipeline splitting light into rainbow streams"
- Bad: "data pipeline architecture diagram"
- Good: "mechanical clockwork gears meshing with glowing circuit traces"
- Bad: "AI system with nodes and connections"
Vary across posts: color palette, physical metaphor (clockwork, rivers, crystals, bridges, constellations), and composition.
If no input was provided, use AskUserQuestion to ask for a slug and description.
Phase 2: Confirm with User
Present the generation plan:
I'll generate a hero image for
{slug}: Model: {MODEL} on {device} Resolution: {width}x{height} ({steps} steps) Prompt: "{prompt}" Seed: {seed or "random"} Estimated time: {estimate based on device and model}Want me to adjust anything before generating?
Time estimates:
| Device | SDXL-Turbo | SD-Turbo | Tiny-SD |
|---|---|---|---|
| CUDA (RTX 3060+) | 5-10s | 3-8s | 15-25s |
| MPS (M1/M2/M3/M4) | 25-35s | 15-25s | 30-45s |
| CPU (16GB+ RAM) | 3-8 min | 2-5 min | 5-10 min |
Phase 3: Install Dependencies
Check and install what's needed based on platform:
# Check Python + torch
python3 -c "import torch; print(torch.__version__)" 2>&1
If torch is missing, install based on platform:
| Platform | Install command |
|---|---|
| macOS (MPS) | pip3 install torch torchvision |
| Linux (CUDA) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121 |
| Linux (ROCm) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0 |
| Linux/Windows (CPU) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu |
| Windows (CUDA) | pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121 |
Then install diffusers:
pip3 install diffusers accelerate Pillow
Phase 4: Generate Image
Run inline Python generation (no external script needed):
import torch
from diffusers import AutoPipelineForText2Image
from PIL import Image
import os
MODEL = "{MODEL}"
DEVICE = "{device}"
DTYPE = torch.float16 if "{dtype}" == "float16" else torch.float32
WIDTH = {width}
HEIGHT = {height}
STEPS = {steps}
PROMPT = "{PROMPT}"
SEED = {SEED}
SLUG = "{SLUG}"
pipe = AutoPipelineForText2Image.from_pretrained(MODEL, torch_dtype=DTYPE, variant="fp16" if DTYPE == torch.float16 else None)
pipe = pipe.to(DEVICE)
if DEVICE == "cuda":
pipe.enable_attention_slicing() # Reduce VRAM usage
generator = torch.Generator(device="cpu").manual_seed(SEED)
image = pipe(prompt=PROMPT, num_inference_steps=STEPS, guidance_scale=0.0, width=WIDTH, height=HEIGHT, generator=generator).images[0]
out_dir = f"public/blog/{SLUG}"
os.makedirs(out_dir, exist_ok=True)
image.save(f"{out_dir}/hero.jpg", "JPEG", quality=90)
print(f"Saved to {out_dir}/hero.jpg ({WIDTH}x{HEIGHT})")
Run in the background for GPU, or warn the user about wait time for CPU.
For batch generation, loop over entries with the model loaded once.
Phase 5: Verify and Present
After generation:
- Check the output file exists:
file public/blog/{SLUG}/hero.jpg
-
Show the image to the user using the Read tool.
-
Ask:
AskUserQuestion: "How does this look?"
Options:
- Looks good, use it
- Regenerate with a different seed
- Adjust the prompt and try again
- Try a different model
- Discard
If "try a different model": go back to Phase 0's model selection and re-run.
SDXL-Turbo / SD-Turbo Prompting Rules
Critical: These turbo models use guidance_scale=0.0, which means negative prompts are IGNORED. All steering must come from the positive prompt alone.
DO:
- Always lead with a SAI style prefix (see table above)
- Use vivid, concrete visual metaphors as the subject
- Specify a dominant color palette in the subject
DON'T:
- Don't include hands, fingers, or human body parts (renders badly)
- Don't include faces or people as focal subjects
- Don't request text or labels (diffusion models can't render text reliably)
- Don't rely on negative prompts (ignored at guidance_scale=0.0)
Error Handling
| Error | Cause | Fix |
|---|---|---|
torch not found | Python deps missing | Install per platform table above |
CUDA out of memory | Model too large for VRAM | Switch to smaller model or reduce resolution |
MPS out of memory | Not enough unified memory | Close other apps, reduce resolution |
RuntimeError: slow_conv2d_cpu | Running on CPU without float32 | Set DTYPE = torch.float32 for CPU |
| Black/noisy image | Bad seed or too few steps | Try a different seed or increase steps |
| Model download fails | Network issue | Check connection, model is cached at ~/.cache/huggingface/ after first download |
Source
git clone https://github.com/sacredvoid/skillkit/blob/main/skills/local-image-gen/SKILL.mdView on GitHub Overview
Generate custom images locally with Stable Diffusion. It auto-detects your hardware (Apple Silicon, NVIDIA, AMD, CPU) and selects the best model and settings for single or batch runs. It works cross-platform on macOS, Linux, and Windows.
How This Skill Works
On invocation, Phase 0 runs a Python snippet to detect OS, architecture, RAM, and available GPU. It then selects a recommended model based on VRAM and device, and prompts you to confirm before generation. You can generate images in either single or batch mode depending on your input.
When to Use It
- You want private image generation without sending prompts to the cloud.
- You have a GPU with 8 GB+ VRAM and want high-quality results quickly.
- You need to generate multiple images in a batch for a project or test prompts.
- You require cross-platform support (macOS, Linux, Windows) with automatic hardware detection.
- You have limited VRAM or CPU-only hardware and want lighter models with CPU offload.
Quick Start
- Step 1: Run local-image-gen with a slug and image description, e.g., local-image-gen my-promo 'abstract neural network, dark blue gradient'.
- Step 2: Review the auto-detected hardware and the recommended MODEL; confirm or override.
- Step 3: Generate the image(s) in single or batch mode and save outputs to your target folder.
Best Practices
- Run a quick hardware check to confirm Phase 0 reads your setup correctly.
- Match the model to VRAM: SDXL-Turbo for 8+ GB, SD-Turbo for 4-7 GB, Tiny-SD for CPU-only or <4 GB.
- Use batch mode to efficiently generate variations from a single prompt.
- Save prompts, seeds, and results to reproduce or compare images later.
- Monitor memory usage and enable CPU offload when needed to prevent swapping or slowdowns.
Example Use Cases
- Batch generate 5 sci‑fi landscapes on a CUDA GPU with 12 GB VRAM.
- Single high‑res portrait on Apple Silicon with 16 GB RAM using SDXL‑Turbo.
- CPU‑only workstation with 16+ GB RAM generating 4 images using SD‑Turbo with CPU offload.
- Linux desktop with NVIDIA GPU running a batch of 10 style variations.
- Cross‑device workflow producing consistent results from the same prompt across different hardware.