What image types does vision support?

It supports images, screenshots, diagrams, and other visual content.

Can it preserve text formatting?

Yes. It extracts legible text and preserves formatting where that formatting is relevant.

How are uncertain readings handled?

It marks low-confidence results and suggests verification steps.

vision


npx machina-cli add skill aiskillstore/marketplace/vision --openclaw

SKILL.md

You are a Vision Analyst specialized in interpreting visual content.

Focus

Describe visible UI elements, text, errors, code, layout, and diagrams.
Extract any legible text accurately, preserving formatting when relevant.
Note uncertainty or low-confidence readings.

Output

Provide concise, actionable observations.
Call out anything that looks broken, inconsistent, or suspicious.

Source

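git clone https://github.com/aiskillstore/marketplace

The skill file is at skills/0xsero/vision/SKILL.md in that repository (view on GitHub: https://github.com/aiskillstore/marketplace/blob/main/skills/0xsero/vision/SKILL.md).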

Overview

The vision skill interprets images, screenshots, diagrams, and other visual content, describing UI elements, text, errors, code, and layout. It returns concise observations, with emphasis on accurate text extraction and on flagging issues.

How This Skill Works

The skill is a short instruction file (SKILL.md) that casts the model as a Vision Analyst. Given an image, the model identifies UI components, extracts legible text, and interprets layouts and diagrams. It then returns concise observations and flags low-confidence readings for review.
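As a rough illustration of the "concise observations plus low-confidence flags" behavior, here is a Python sketch of one possible report shape. The `Observation` type, its numeric `confidence` field, the `LOW_CONFIDENCE` threshold, and `render_report` are all hypothetical: SKILL.md defines no output schema or scoring, and in practice the model states its uncertainty in prose.

```python
from dataclasses import dataclass

# Hypothetical threshold: SKILL.md does not define numeric confidence scores.
LOW_CONFIDENCE = 0.6


@dataclass
class Observation:
    kind: str          # e.g. "text", "error", "ui", "layout"
    detail: str        # what was seen or extracted from the image
    confidence: float  # assumed self-reported score in the range 0.0 to 1.0


def render_report(observations: list[Observation]) -> str:
    """Render one concise line per observation, marking low-confidence ones for review."""
    lines = []
    for obs in observations:
        flag = " [low confidence: verify]" if obs.confidence < LOW_CONFIDENCE else ""
        lines.append(f"- {obs.kind}: {obs.detail}{flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    report = render_report([
        Observation("error", 'Dialog text: "Connection refused"', 0.95),
        Observation("text", "Port number looks like 8080 (partly blurred)", 0.4),
    ])
    print(report)
```

Keeping the flag on the same line as the observation keeps the report concise while still making every uncertain reading visible at a glance.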

When to Use It

When you need to understand an error or notification shown in a screenshot.
When reviewing UI mockups or app screens to document components and layout.
When analyzing architecture or flow diagrams to identify connections.
When extracting instructions or code snippets from visual content.
When spotting UI inconsistencies, broken visuals, or misalignments.

Quick Start

Step 1: Upload or provide the image, screenshot, or diagram.
Step 2: The skill analyzes UI elements, text, and layout, and extracts legible text.
Step 3: Review the observations and turn them into notes or annotations.

Best Practices

Describe all visible elements and text with precise labels.
Preserve legible text formatting where relevant.
Clearly indicate uncertainty and confidence levels.
Call out broken, inconsistent, or suspicious visuals.
Provide actionable follow-ups (reproduction steps, fixes, or annotations).

Example Use Cases

Extracts an error message from a crash screenshot.
Documents a login screen's UI components and interactions.
Interprets an architecture diagram to identify services and data paths.
Pulls URLs and code blocks from a developer screenshot.
Notes misaligned buttons and icon inconsistencies in a UI mockup.
