What is agent-browser?

A CLI tool that enables AI agents to automate and interact with websites through navigation, forms, clicks, data extraction, and testing.

How do I get started with element refs?

Open a page and run snapshot -i to collect element refs like @e1, @e2, which you can then target with click, fill, or get text.

Can I reuse login state across sessions?

Yes. Save authentication state with state save auth.json, then restore it in a new session with state load auth.json to reuse your login.

agent-browser

npx machina-cli add skill next-open-ai/openclawx/agent-browser --openclaw

Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

Navigate: agent-browser open <url>
Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
Interact: Use refs to click, fill, select
Re-snapshot: After navigation or DOM changes, get fresh refs

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
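The four-step loop above can be packaged as a shell function so that every run ends with a fresh snapshot. This is a minimal sketch, not part of agent-browser: submit_login and AB_BIN are hypothetical names (AB_BIN only lets a stub stand in for the CLI and is not an agent-browser setting), and the hard-coded refs assume the snapshot matches the sample output above.

```shell
# AB_BIN is a hypothetical override (not an agent-browser setting) so a stub
# command can stand in for the real CLI; it defaults to agent-browser.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# submit_login URL EMAIL PASSWORD  (hypothetical helper)
# Assumes the first snapshot yields @e1 = email, @e2 = password, @e3 = submit,
# matching the sample output above; confirm against a real snapshot first.
submit_login() {
  local url="$1" email="$2" password="$3"
  ab open "$url" || return 1              # 1. navigate
  ab snapshot -i || return 1              # 2. list interactive refs
  ab fill @e1 "$email" || return 1        # 3. interact through refs
  ab fill @e2 "$password" || return 1
  ab click @e3 || return 1
  ab wait --load networkidle || return 1
  ab snapshot -i                          # 4. re-snapshot; earlier refs are stale
}
```

Usage: source the file, then call submit_login https://example.com/form user@example.com "$PASSWORD".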

Essential Commands

# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser scroll down 500         # Scroll down 500 px

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait 2000 ms

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser pdf output.pdf          # Save as PDF
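Several of these commands combine naturally into a single capture step. The sketch below assumes that get url and get title print plain text on stdout and that screenshot accepts an output path as a positional argument (as in the iOS example further down); capture_page and AB_BIN are hypothetical names.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# capture_page URL OUTDIR  (hypothetical helper)
# Writes OUTDIR/meta.txt (URL, then title), OUTDIR/page.png and OUTDIR/page.pdf.
capture_page() {
  local url="$1" out="$2"
  mkdir -p "$out" || return 1
  ab open "$url" || return 1
  ab wait --load networkidle || return 1
  # Assumes get url / get title print plain text on stdout.
  { ab get url && ab get title; } > "$out/meta.txt" || return 1
  ab screenshot "$out/page.png" || return 1
  ab pdf "$out/page.pdf"
}
```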

Common Patterns

Form Submission

agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
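For longer forms, the fill/select/check/click sequence can be driven from a small spec file instead of being hard-coded. This is a sketch under assumptions: the tab-separated spec format, fill_from_spec and AB_BIN are all hypothetical, and the refs in the spec must come from a snapshot of the current page.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# fill_from_spec FILE  (hypothetical helper)
# Each line of FILE: COMMAND<TAB>REF[<TAB>VALUE], for example
#   fill<TAB>@e1<TAB>Jane Doe
# Only fill, select, check and click are accepted.
fill_from_spec() {
  local tab cmd ref value
  [ -r "$1" ] || { echo "cannot read spec: $1" >&2; return 1; }
  tab=$(printf '\t')
  # Read the spec on fd 3 so the CLI never consumes it through stdin;
  # the [ -n "$cmd" ] test also handles a last line without a newline.
  while IFS="$tab" read -r cmd ref value <&3 || [ -n "$cmd" ]; do
    case "$cmd" in
      '' | '#'*) ;;                                   # blank line or comment
      fill | select) ab "$cmd" "$ref" "$value" || return 1 ;;
      check | click) ab "$cmd" "$ref" || return 1 ;;
      *) echo "unsupported command: $cmd" >&2; return 1 ;;
    esac
  done 3< "$1"
}
```

Keeping values in a file makes it easy to rerun the same form with different data, but the refs still go stale whenever the page changes.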

Authentication with State Persistence

# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
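The save-once, load-later pattern can be made idempotent: load the state file when it exists, otherwise log in and create it. A sketch, assuming the login form's refs match the ones above and that the caller sets USERNAME and PASSWORD; ensure_auth, AB_BIN and the optional URL-pattern argument are hypothetical. Because the saved state is what keeps the session logged in, the sketch restricts the file to the current user.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# ensure_auth STATE_FILE LOGIN_URL [URL_PATTERN]  (hypothetical helper)
# Assumes the login snapshot yields @e1 = username, @e2 = password, @e3 = submit.
ensure_auth() {
  local state="$1" login_url="$2" pattern="${3:-**/dashboard}"
  if [ -f "$state" ]; then
    ab state load "$state"                  # reuse an earlier login
    return
  fi
  if [ -z "${USERNAME:-}" ] || [ -z "${PASSWORD:-}" ]; then
    echo "USERNAME and PASSWORD must be set" >&2
    return 1
  fi
  ab open "$login_url" || return 1
  ab snapshot -i || return 1
  ab fill @e1 "$USERNAME" || return 1
  ab fill @e2 "$PASSWORD" || return 1
  ab click @e3 || return 1
  ab wait --url "$pattern" || return 1
  ab state save "$state" || return 1
  chmod 600 "$state"                        # saved state acts as a credential
}
```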

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
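To pull several elements in one pass, loop over refs from the snapshot and write one tab-separated line per element. A sketch that assumes get text prints the element text on stdout; extract_refs and AB_BIN are hypothetical. When structured output is needed, the --json flag shown above is an alternative.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# extract_refs OUTFILE REF...  (hypothetical helper)
# Writes "REF<TAB>TEXT" per ref; tabs and newlines in TEXT become spaces
# so each element stays on one line.
extract_refs() {
  local out="$1" ref text
  shift
  : > "$out" || return 1
  for ref in "$@"; do
    text=$(ab get text "$ref") || return 1
    text=$(printf '%s' "$text" | tr '\t\n' '  ')
    printf '%s\t%s\n' "$ref" "$text" >> "$out"
  done
}
```

Usage: extract_refs products.tsv @e5 @e6 @e7, using refs taken from the latest snapshot -i.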

Parallel Sessions

agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i

agent-browser session list
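Since separate --session names keep runs apart, independent sites can also be driven concurrently from one script using background jobs. A sketch, assuming the CLI tolerates simultaneous invocations for different session names; snapshot_sites, the NAME=URL argument format and AB_BIN are hypothetical.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# snapshot_sites OUTDIR NAME=URL...  (hypothetical helper)
# Runs open -> wait -> snapshot for every site in parallel and saves each
# snapshot to OUTDIR/NAME.txt. Returns non-zero if any site failed.
snapshot_sites() {
  local out="$1" pair name url pids="" pid status=0
  shift
  mkdir -p "$out" || return 1
  for pair in "$@"; do
    name=${pair%%=*}                        # text before the first "="
    url=${pair#*=}                          # everything after it
    (
      ab --session "$name" open "$url" &&
        ab --session "$name" wait --load networkidle &&
        ab --session "$name" snapshot -i > "$out/$name.txt"
    ) &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || status=1                 # collect each job's exit status
  done
  return "$status"
}
```

Waiting on each recorded PID, rather than a bare wait, lets the function report a failure from any single site.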

Visual Browser (Debugging)

agent-browser --headed open https://example.com
agent-browser highlight @e1          # Highlight element
agent-browser record start demo.webm # Record session
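During development it helps to switch between headless and headed runs without editing the script. A sketch; the HEADED variable, open_page, debug_step and AB_BIN are hypothetical, and --headed is passed on open exactly as in the example above.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# open_page URL: headless by default, visible when HEADED=1 (hypothetical variable).
open_page() {
  if [ "${HEADED:-0}" = 1 ]; then
    ab --headed open "$1"
  else
    ab open "$1"
  fi
}

# debug_step REF: in headed mode, highlight the element about to be used.
debug_step() {
  [ "${HEADED:-0}" = 1 ] || return 0        # no-op in headless runs
  ab highlight "$1"
}
```

Usage: HEADED=1 sh my-flow.sh to watch a run, or omit HEADED for normal headless execution.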

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1          # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up         # Mobile-specific gesture

# Take screenshot
agent-browser -p ios screenshot mobile.png

# Close session (shuts down simulator)
agent-browser -p ios close
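Since close is what shuts the simulator down, a script should close the session even when an earlier step fails. A sketch of that cleanup pattern; ios_screenshot and AB_BIN are hypothetical, and the device name must match one reported by device list.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# ios_screenshot DEVICE URL OUTFILE  (hypothetical helper)
# Opens URL in Mobile Safari on DEVICE, saves a screenshot, and always runs
# close so the simulator is shut down even if open or screenshot fails.
ios_screenshot() {
  local device="$1" url="$2" out="$3" status=0
  if ab -p ios --device "$device" open "$url"; then
    ab -p ios screenshot "$out" || status=1
  else
    status=1
  fi
  ab -p ios close || status=1
  return "$status"
}
```

Usage: ios_screenshot "iPhone 16 Pro" https://example.com mobile.png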

Requirements: macOS with Xcode, plus Appium and its XCUITest driver (npm install -g appium && appium driver install xcuitest)

Real devices: Physical iOS devices also work if they are pre-configured. Pass --device "<UDID>", taking the UDID from the output of xcrun xctrace list devices.

Ref Lifecycle (Important)

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)

agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs
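The click-then-re-snapshot rule is easy to forget, so a script can bundle it into one call. A sketch; click_and_refresh and AB_BIN are hypothetical, and waiting for network idle in between is a choice made here so the new snapshot reflects the loaded page.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# click_and_refresh REF  (hypothetical helper)
# Clicks REF, waits for the network to go idle, then prints a fresh snapshot.
# Treat every ref obtained before this call as stale afterwards.
click_and_refresh() {
  ab click "$1" || return 1
  ab wait --load networkidle || return 1
  ab snapshot -i
}
```

Usage: snapshot=$(click_and_refresh @e5), then pick the next ref from $snapshot rather than from the old output.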

Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
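Refs and semantic locators can also be combined: try the ref first and fall back to a locator when the ref fails. A sketch that assumes agent-browser exits with a non-zero status when a click cannot be performed; click_ref_or_text and AB_BIN are hypothetical.

```shell
# AB_BIN: hypothetical override for testing; defaults to the real CLI.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# click_ref_or_text REF TEXT  (hypothetical helper)
# Tries the ref first; if that command fails (for example, the ref went stale),
# falls back to the semantic locator "find text TEXT click".
click_ref_or_text() {
  ab click "$1" && return 0
  echo "click on $1 failed; falling back to text \"$2\"" >&2
  ab find text "$2" click
}
```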

Deep-Dive Documentation

Reference	When to Use
references/commands.md	Full command reference with all options
references/snapshot-refs.md	Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md	Parallel sessions, state persistence, concurrent scraping
references/authentication.md	Login flows, OAuth, 2FA handling, state reuse
references/video-recording.md	Recording workflows for debugging and documentation
references/proxy-support.md	Proxy configuration, geo-testing, rotating proxies

Ready-to-Use Templates

Template	Description
templates/form-automation.sh	Form filling with validation
templates/authenticated-session.sh	Login once, reuse state
templates/capture-workflow.sh	Content extraction with screenshots

./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output

Source

https://github.com/next-open-ai/openclawx/blob/main/skills/agent-browser/SKILL.md

Overview

agent-browser is a command-line tool that lets AI agents interact with websites. It supports navigating pages, filling forms, clicking elements, taking screenshots, extracting data, and testing web apps. It assigns refs to page elements through snapshots and can save and reload authentication state, so multi-step browser tasks can be automated across runs.

How This Skill Works

Start by opening a URL with agent-browser open, then run snapshot -i to capture element refs such as @e1 and @e2. Use these refs to click, fill, or select elements, and re-snapshot after navigation or DOM changes so the refs stay current. You can also wait for conditions and capture output such as text, URLs, or screenshots.

When to Use It

Automate filling and submitting web forms on a site
Extract structured data from product or listing pages
Log in to apps and reuse authentication state in future runs
Test end-to-end flows of a web application
Run multiple browser sessions in parallel across sites

Quick Start

Step 1: agent-browser open https://example.com
Step 2: agent-browser snapshot -i
Step 3: agent-browser fill @e1 "your input"; agent-browser click @e2; agent-browser wait --load networkidle

Best Practices

Snapshot after every navigation or major DOM change to refresh element refs
Use snapshot -i so the output lists only interactive elements, each with its own ref
Persist authentication state with state save and state load for multi-session work
Use waits (wait --load networkidle, wait --url) so pages finish loading before the next action
Isolate sessions when automating multiple sites to avoid cross-site data leakage

Example Use Cases

Form submission flow: open signup page, fill name and email, select country, check terms, and submit
Data extraction: open products page, snapshot elements, read text from product titles, and output JSON
Authentication: login, wait for dashboard URL, save auth state, then reuse in a new session
Parallel sessions: run two sites concurrently with separate sessions to collect data
End-to-end test: navigate to a page, perform actions, validate results, and capture a screenshot
