agent-browser

npx machina-cli add skill fmschulz/omics-skills/agent-browser --openclaw

Agent Browser

Automate browser interactions through the agent-browser CLI for repeatable, scriptable web tasks.

Instructions

  1. Install and initialize the CLI.
  2. Open the target URL and capture a snapshot.
  3. Interact with elements using snapshot references.
  4. Re-snapshot after navigation or state changes.
  5. Export results (screenshots or JSON) for downstream use.
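
The steps above can be sketched as a few small shell helpers. This is a minimal sketch, not the skill's own script: the helper names (snap, open_and_snap, fill_and_snap) are hypothetical, the agent-browser prefix on fill is assumed from the pattern of the other commands, and the @e3 reference is a placeholder for whatever reference the latest snapshot actually reports.

```shell
#!/bin/sh
# Sketch only: snap, open_and_snap and fill_and_snap are hypothetical helper
# names, not part of agent-browser. Assumes agent-browser is on PATH.

# snap NAME: save an interactive JSON snapshot to NAME.json (steps 2 and 4).
snap() {
  agent-browser snapshot -i --json > "$1.json"
}

# open_and_snap URL NAME: open the page, then snapshot it (step 2).
open_and_snap() {
  agent-browser open "$1" && snap "$2"
}

# fill_and_snap REF TEXT NAME: fill one element, then re-snapshot (steps 3-4).
# REF must come from the most recent snapshot, e.g. @e3.
fill_and_snap() {
  agent-browser fill "$1" "$2" && snap "$3"
}

# Example session (commented out so that sourcing this file runs nothing):
#   open_and_snap https://example.org 01-home
#   ... read 01-home.json and pick a reference, say @e3 ...
#   fill_and_snap @e3 "search term" 02-after-fill
#   agent-browser screenshot 02-after-fill.png    # step 5
```

Keeping the snapshot in its own helper means every interaction can be followed by a fresh snapshot without repeating the flags.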

Quick Reference

Task         Action
Install      npm install -g agent-browser, then agent-browser install
Open page    agent-browser open <url>
Snapshot     agent-browser snapshot -i --json
Interact     click @eN, fill @eN "text"
Screenshot   agent-browser screenshot output.png
Docs         See references/quick-start.md

Input Requirements

  • Target URL(s)
  • CLI installed and Chromium downloaded
  • Credentials if login is required

Output

  • Screenshots (PNG)
  • JSON snapshots of page structure
  • Extracted text/attributes

Quality Gates

  • Snapshot captured after each major navigation step
  • Interactions verified in a follow-up snapshot
  • Outputs saved to disk with clear filenames
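
One way to check these gates from a script is sketched below. Both function names are hypothetical, and the byte-for-byte comparison is a deliberately crude assumption: two snapshots can differ for reasons unrelated to the interaction, so treat a difference as a hint to inspect, not as proof.

```shell
#!/bin/sh
# Sketch only: check_outputs and snapshots_differ are hypothetical helpers.

# check_outputs FILE...: succeed only if every file exists and is non-empty.
check_outputs() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# snapshots_differ BEFORE AFTER: succeed if both files exist and their
# contents are not byte-identical (i.e. the page state appears to have changed).
snapshots_differ() {
  [ -f "$1" ] && [ -f "$2" ] && ! cmp -s "$1" "$2"
}

# Example (commented out):
#   check_outputs 01-home.json 02-after-fill.json 02-after-fill.png
#   snapshots_differ 01-home.json 02-after-fill.json || echo "no change seen" >&2
```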

Examples

Example 1: Capture a page snapshot

agent-browser open https://example.org
agent-browser snapshot -i --json > page.json

Troubleshooting

Issue: Chromium not installed
Solution: Run agent-browser install (add --with-deps on Linux).

Issue: Element not found
Solution: Re-snapshot and confirm the correct element reference.
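
A script can fail early with a useful hint before it reaches either issue. The sketch below uses a hypothetical require_cli helper; it only checks that the agent-browser command is on PATH and does not verify that Chromium has been downloaded.

```shell
#!/bin/sh
# Sketch only: require_cli is a hypothetical helper name.

# require_cli: succeed if agent-browser is on PATH; otherwise print the
# install commands from the Quick Reference and fail.
require_cli() {
  if command -v agent-browser >/dev/null 2>&1; then
    return 0
  fi
  echo "agent-browser not found; try: npm install -g agent-browser && agent-browser install" >&2
  return 1
}

# Example (commented out):
#   require_cli || exit 1
```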

Source

SKILL.md on GitHub: https://github.com/fmschulz/omics-skills/blob/main/skills/agent-browser/SKILL.md

Overview

agent-browser enables repeatable, scriptable browser tasks via a CLI. It supports opening target URLs, interacting with elements, taking snapshots, and exporting results as screenshots or JSON for downstream use, making UI testing and scraping more reliable.

How This Skill Works

Install the CLI and ensure Chromium is downloaded. Use commands like open, snapshot, click, and fill to automate web actions, then re-snapshot after navigation or state changes. Export outputs (screenshots or JSON) for downstream processing and verification.

When to Use It

  • Automating login flows and verifying post-login state
  • Running repeatable UI tests across pages
  • Scraping data by capturing page structure and text
  • Documenting UI changes by re-snapshotting after navigation
  • Generating assets for dashboards by exporting screenshots and JSON

Quick Start

  1. Install and initialize the CLI (npm install -g agent-browser; agent-browser install)
  2. Open a URL and capture a JSON snapshot (agent-browser open <url>; agent-browser snapshot -i --json > page.json)
  3. Interact with elements (click @eN, fill @eN "text"), then re-snapshot or take a screenshot

Best Practices

  • Ensure a snapshot is captured after each major navigation step
  • Re-snapshot after interactions to verify resulting state
  • Use clear, descriptive filenames for outputs (output.png, page.json, etc.)
  • Take @eN element references from the most recent snapshot; re-snapshot after navigation or state changes to reduce flakiness
  • Install Chromium with agent-browser install (add --with-deps on Linux) and handle credentials securely
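
For the filename advice, one possible convention is a zero-padded step number plus a short label, so outputs sort in the order they were captured. The out_name helper and the naming scheme are assumptions for illustration, not something the skill prescribes.

```shell
#!/bin/sh
# Sketch only: out_name is a hypothetical helper name.

# out_name STEP LABEL EXT: print a name such as 01-login.json.
# Pass STEP without leading zeros (8, not 08) so printf reads it as decimal.
out_name() {
  printf '%02d-%s.%s\n' "$1" "$2" "$3"
}

# Example (commented out):
#   agent-browser snapshot -i --json > "$(out_name 1 login json)"
#   agent-browser screenshot "$(out_name 1 login png)"
```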

Example Use Cases

  • Example 1: Open https://example.org and export a JSON snapshot to page.json (agent-browser open https://example.org; agent-browser snapshot -i --json > page.json)
  • Example 2: Open a login page, fill credentials, submit, and capture a post-login snapshot
  • Example 3: Navigate through multiple pages and take a screenshot after each step
  • Example 4: Export a JSON snapshot of the page structure to feed into a data pipeline
  • Example 5: Re-snapshot after a UI state change to verify updates and differences
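
Example 3 might look like the loop below: a rough sketch that opens each URL in turn and saves a snapshot and a screenshot per page. The shoot_pages name and the page-NN naming are hypothetical, and the sketch assumes each page can be reached directly by URL without clicking through.

```shell
#!/bin/sh
# Sketch only: shoot_pages is a hypothetical helper name.
# Assumes agent-browser is on PATH.

# shoot_pages URL...: for each URL, write page-NN.json and page-NN.png.
# Stops at the first command that fails.
shoot_pages() {
  n=0
  for url in "$@"; do
    n=$((n + 1))
    name=$(printf 'page-%02d' "$n")
    agent-browser open "$url" || return 1
    agent-browser snapshot -i --json > "$name.json" || return 1
    agent-browser screenshot "$name.png" || return 1
  done
}

# Example (commented out):
#   shoot_pages https://example.org https://example.org/about
```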
