How do I install agent-browser and get started?

Install with npm install -g agent-browser and run agent-browser install to download Chromium and set up the CLI.

What outputs can I export with agent-browser?

You can export PNG screenshots and JSON snapshots of the page structure, as well as extracted text/attributes for processing.

How do I interact with page elements?

Use commands like click @eN to click elements and fill @eN "text" to input values, then snapshot again to verify changes.

agent-browser

npx machina-cli add skill fmschulz/omics-skills/agent-browser --openclaw

Agent Browser

Automate browser interactions through the agent-browser CLI for repeatable, scriptable web tasks.

Instructions

Install and initialize the CLI.
Open the target URL and capture a snapshot.
Interact with elements using snapshot references.
Re-snapshot after navigation or state changes.
Export results (screenshots or JSON) for downstream use.
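The steps above can be sketched as a small shell function. This is a minimal sketch, not the skill's own script: `capture_page` and the `AGENT_BROWSER` override variable are hypothetical names introduced here, and only the `open`, `snapshot -i --json`, and `screenshot` commands come from this document.

```shell
# Sketch: open a URL, save a JSON snapshot, then save a screenshot.
# capture_page and AGENT_BROWSER are hypothetical; AGENT_BROWSER lets you
# point the function at a different binary or a test stub.
capture_page() {
  local ab="${AGENT_BROWSER:-agent-browser}"
  local url="$1" outdir="$2"
  mkdir -p "$outdir" || return 1
  "$ab" open "$url" || return 1
  "$ab" snapshot -i --json > "$outdir/page.json" || return 1
  "$ab" screenshot "$outdir/page.png"
}
```

A call such as `capture_page https://example.org out` would leave `out/page.json` and `out/page.png` on disk, assuming the CLI and Chromium are already installed.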

Quick Reference

| Task | Action |
| --- | --- |
| Install | `npm install -g agent-browser` then `agent-browser install` |
| Open page | `agent-browser open <url>` |
| Snapshot | `agent-browser snapshot -i --json` |
| Interact | `click @eN`, `fill @eN "text"` |
| Screenshot | `agent-browser screenshot output.png` |
| Docs | See `references/quick-start.md` |

Input Requirements

Target URL(s)
CLI installed and Chromium downloaded
Credentials if login is required

Output

Screenshots (PNG)
JSON snapshots of page structure
Extracted text/attributes

Quality Gates

Snapshot captured after each major navigation step
Interactions verified in a follow-up snapshot
Outputs saved to disk with clear filenames
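The "outputs saved to disk" gate can be checked mechanically. The sketch below uses a hypothetical helper, `check_outputs`, that only confirms each expected file exists and is non-empty; it does not validate the JSON or the image content.

```shell
# Sketch: fail if any expected output file is missing or empty.
# check_outputs is a hypothetical helper, not part of agent-browser.
check_outputs() {
  local rc=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_outputs page.json output.png || exit 1` could close a script that produced those two files.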

Examples

Example 1: Capture a page snapshot

agent-browser open https://example.org
agent-browser snapshot -i --json > page.json

Troubleshooting

Issue: Chromium not installed
Solution: Run `agent-browser install` (add `--with-deps` on Linux).

Issue: Element not found
Solution: Re-snapshot and confirm the correct element reference.
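A related failure is the CLI itself not being on `PATH`. A preflight check along these lines may help; `require_cli` is a hypothetical helper, and it only tests whether the command can be found, not whether Chromium has been downloaded.

```shell
# Sketch: stop early if a command is not on PATH.
# require_cli is a hypothetical helper; it defaults to checking agent-browser.
require_cli() {
  local cmd="${1:-agent-browser}"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd not found; try: npm install -g agent-browser && agent-browser install" >&2
    return 1
  fi
}
```

Calling `require_cli || exit 1` at the top of a script gives a clearer error than a later "command not found".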

Source

https://github.com/fmschulz/omics-skills/blob/main/skills/agent-browser/SKILL.md

Overview

agent-browser enables repeatable, scriptable browser tasks via a CLI. It supports opening target URLs, interacting with elements, taking snapshots, and exporting results as screenshots or JSON for downstream use, making UI testing and scraping more reliable.

How This Skill Works

Install the CLI and ensure Chromium is downloaded. Use commands like open, snapshot, click, and fill to automate web actions, then re-snapshot after navigation or state changes. Export outputs (screenshots or JSON) for downstream processing and verification.

When to Use It

Automating login flows and verifying post-login state
Running repeatable UI tests across pages
Scraping data by capturing page structure and text
Documenting UI changes by re-snapshotting after navigation
Generating assets for dashboards by exporting screenshots and JSON

Quick Start

Step 1: Install and initialize the CLI (npm install -g agent-browser; agent-browser install)
Step 2: Open a URL and capture a JSON snapshot (agent-browser open <url>; agent-browser snapshot -i --json > page.json)
Step 3: Interact with elements (click @eN, fill @eN "text"), then re-snapshot or take a screenshot

Best Practices

Ensure a snapshot is captured after each major navigation step
Re-snapshot after interactions to verify resulting state
Use clear, descriptive filenames for outputs (output.png, page.json, etc.)
Take element references (@eN) from the most recent snapshot to reduce flakiness
Install Chromium before first use (on Linux, add --with-deps if needed) and handle credentials securely
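One way to keep output filenames clear across a multi-step run is to number them. The helper below, `step_filename`, is a hypothetical naming convention for illustration, not something agent-browser provides.

```shell
# Sketch: build a zero-padded, ordered filename for each step's output.
# step_filename is hypothetical; pass the step as a plain integer (3, not 03).
step_filename() {
  local step="$1" label="$2" ext="$3"
  printf '%02d-%s.%s\n' "$step" "$label" "$ext"
}

step_filename 3 post-login png   # prints 03-post-login.png
```

The result can be passed straight to the screenshot command, for example `agent-browser screenshot "$(step_filename 3 post-login png)"`.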

Example Use Cases

Example 1: Open https://example.org and export a JSON snapshot to page.json (agent-browser open https://example.org; agent-browser snapshot -i --json > page.json)
Example 2: Open a login page, fill credentials, submit, and capture a post-login snapshot
Example 3: Navigate through multiple pages and take a screenshot after each step
Example 4: Export a JSON snapshot of the page structure to feed into a data pipeline
Example 5: Re-snapshot after a UI state change to verify updates and differences
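Example 2 might look roughly like the sketch below. It assumes the login page is already open and that the three element references were read from a fresh snapshot; `submit_login`, `AGENT_BROWSER`, `LOGIN_USER`, and `LOGIN_PASS` are hypothetical names, and it assumes `fill` and `click` are invoked with the `agent-browser` prefix like the other commands.

```shell
# Sketch: fill a login form, submit it, and snapshot the resulting page.
# Assumes the page is open and the refs come from the latest snapshot.
# submit_login, AGENT_BROWSER, LOGIN_USER and LOGIN_PASS are hypothetical names.
submit_login() {
  local ab="${AGENT_BROWSER:-agent-browser}"
  local user_ref="$1" pass_ref="$2" submit_ref="$3" out="$4"
  # Read credentials from the environment rather than hard-coding them.
  if [ -z "${LOGIN_USER:-}" ] || [ -z "${LOGIN_PASS:-}" ]; then
    echo "LOGIN_USER and LOGIN_PASS must be set" >&2
    return 1
  fi
  "$ab" fill "$user_ref" "$LOGIN_USER" || return 1
  "$ab" fill "$pass_ref" "$LOGIN_PASS" || return 1
  "$ab" click "$submit_ref" || return 1
  # Re-snapshot to verify the post-login state.
  "$ab" snapshot -i --json > "$out"
}
```

A call would look like `submit_login @e1 @e2 @e3 post-login.json`, where `@e1`, `@e2`, and `@e3` are placeholders for whatever references the snapshot actually reports.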
