What is ref-based element selection?

Elements are identified by refs like @e1 that come from accessibility snapshots, making interactions more stable than brittle selectors.

How do I get machine-readable results?

Use snapshot -i --json to output a JSON representation of the interactive elements and their refs.

Which commands cover interactions?

Common commands include click, fill, type, hover, check, uncheck, select, and scroll, all targeting refs like @e1.

agent-browser

npx machina-cli add skill jikig-ai/soleur/agent-browser --openclaw


agent-browser: CLI Browser Automation

Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.

Setup Check

# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED - run: npm install -g agent-browser && agent-browser install"

Install if needed

npm install -g agent-browser
agent-browser install  # Downloads Chromium

Core Workflow

The snapshot + ref pattern works well for LLM agents:

Navigate to URL
Snapshot to get interactive elements with refs
Interact using refs (@e1, @e2, etc.)
Re-snapshot after navigation or DOM changes

# Step 1: Open URL
agent-browser open https://example.com

# Step 2: Get interactive elements with refs
agent-browser snapshot -i --json

# Step 3: Interact using refs
agent-browser click @e1
agent-browser fill @e2 "search query"

# Step 4: Re-snapshot after changes
agent-browser snapshot -i
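The four steps above can be wrapped in a few Bash functions, so a script resolves a ref by role and name instead of hard-coding @e1. This is a minimal sketch with two assumptions. First, text snapshot lines look like - button "Submit" [ref=e1], which is the format shown in the JSON example's snapshot field. Second, the ref comes directly after the name. The helpers ab, ref_for, and act_on and the AB override variable are hypothetical and not part of agent-browser.

```shell
# Bash sketch of the loop: snapshot -> look up ref -> act -> re-snapshot.
# AB is a hypothetical override (not an agent-browser feature) for dry runs.
ab() { "${AB:-agent-browser}" "$@"; }

# ref_for ROLE NAME: print "@eN" for the first element with this role and
# accessible name. NAME is used inside a sed regex, so avoid / . * [ in it.
ref_for() {
  ab snapshot -i |
    sed -n "s/.*[- ]$1 \"$2\" \[ref=\(e[0-9][0-9]*\)].*/@\1/p" |
    head -n 1
}

# act_on ROLE NAME ACTION [ARGS...]: run ACTION on the matched ref, then
# re-snapshot, because refs can go stale once the DOM changes.
act_on() {
  local role=$1 name=$2 action=$3 ref
  shift 3
  ref=$(ref_for "$role" "$name")
  if [ -z "$ref" ]; then
    echo "no $role named \"$name\" in snapshot" >&2
    return 1
  fi
  ab "$action" "$ref" "$@" && ab snapshot -i
}
```

For example, act_on textbox "Email" fill "user@example.com" runs a snapshot, resolves the Email field to its current ref, fills it, and then prints a fresh snapshot.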

Key Commands

Navigation

agent-browser open <url>       # Navigate to URL
agent-browser back             # Go back
agent-browser forward          # Go forward
agent-browser reload           # Reload page
agent-browser close            # Close browser

Snapshots (Essential for AI)

agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (recommended)
agent-browser snapshot -i --json    # JSON output for parsing
agent-browser snapshot -c           # Compact (remove empty elements)
agent-browser snapshot -d 3         # Limit depth

Interactions

agent-browser click @e1                    # Click element
agent-browser dblclick @e1                 # Double-click
agent-browser fill @e1 "text"              # Clear and fill input
agent-browser type @e1 "text"              # Type without clearing
agent-browser press Enter                  # Press key
agent-browser hover @e1                    # Hover element
agent-browser check @e1                    # Check checkbox
agent-browser uncheck @e1                  # Uncheck checkbox
agent-browser select @e1 "option"          # Select dropdown option
agent-browser scroll down 500              # Scroll by 500 px (up/down/left/right)
agent-browser scrollintoview @e1           # Scroll element into view

Get Information

agent-browser get text @e1          # Get element text
agent-browser get html @e1          # Get element HTML
agent-browser get value @e1         # Get input value
agent-browser get attr href @e1     # Get attribute
agent-browser get title             # Get page title
agent-browser get url               # Get current URL
agent-browser get count "button"    # Count matching elements
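The get commands print their result to stdout, so a script can capture that output and check it. The sketch below is a hypothetical helper (expect_get is not an agent-browser command) that succeeds only when the output contains an expected substring. The AB variable is again a hypothetical dry-run override.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# expect_get EXPECTED GET-ARGS...
# Runs "agent-browser get GET-ARGS..." and succeeds only if its output
# contains EXPECTED; otherwise reports what was actually returned.
expect_get() {
  local expected=$1 actual
  shift
  actual=$(ab get "$@") || return 1
  case "$actual" in
    *"$expected"*) return 0 ;;
    *) printf 'expected "%s" in `get %s`, got: %s\n' "$expected" "$*" "$actual" >&2
       return 1 ;;
  esac
}
```

For example, expect_get "/dashboard" url && expect_get "Welcome" text @e4 confirms that a login redirected and that a greeting rendered.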

Screenshots & PDFs

agent-browser screenshot                      # Viewport screenshot
agent-browser screenshot --full               # Full page
agent-browser screenshot output.png           # Save to file
agent-browser screenshot --full output.png    # Full page to file
agent-browser pdf output.pdf                  # Save as PDF
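For QA runs, giving each full-page screenshot a unique, timestamped name keeps one run from overwriting the last. This is a sketch only. The shot function, the screenshots/ default directory, and the AB override are hypothetical, and two shots with the same label taken in the same second would still collide.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# shot LABEL [DIR]
# Saves a full-page screenshot as DIR/LABEL-YYYYmmdd-HHMMSS.png and prints the path.
shot() {
  local label=$1 dir=${2:-screenshots} file
  mkdir -p "$dir" || return 1
  file="$dir/$label-$(date +%Y%m%d-%H%M%S).png"
  ab screenshot --full "$file" >/dev/null && printf '%s\n' "$file"
}
```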

Wait

agent-browser wait @e1              # Wait for element
agent-browser wait 2000             # Wait a fixed 2000 milliseconds
agent-browser wait "text"           # Wait for text to appear
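A fixed wait such as wait 2000 is either too long or too short, depending on the page. One alternative is to poll a condition and use the wait command only as the pause between checks. This is a minimal sketch. wait_for_url, its default timeout and interval, and the AB override are all hypothetical. It uses only the get url and wait <ms> commands listed above.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# wait_for_url SUBSTRING [TIMEOUT_MS] [INTERVAL_MS]
# Polls "get url" until it contains SUBSTRING, pausing with "wait <ms>"
# between attempts. Returns 1 once TIMEOUT_MS of pauses have elapsed.
wait_for_url() {
  local want=$1 timeout=${2:-10000} interval=${3:-500} waited=0 url
  while :; do
    url=$(ab get url) || return 1
    case "$url" in *"$want"*) return 0 ;; esac
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}ms waiting for URL containing \"$want\"" >&2
      return 1
    fi
    ab wait "$interval" >/dev/null || return 1
    waited=$((waited + interval))
  done
}
```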

Semantic Locators (Alternative to Refs)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign up" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." fill "query"

Sessions (Parallel Browsers)

# Run multiple independent browser sessions
agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com

# List active sessions
agent-browser session list
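Because each --session name gets its own browser, several sites can be opened concurrently with ordinary Bash background jobs. This sketch assumes concurrent agent-browser invocations on different sessions are safe, which the "Parallel Browsers" heading suggests but does not spell out. The session names s1, s2, and so on, the open_parallel function, and the AB override are hypothetical.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# open_parallel URL...
# Opens each URL in its own session (s1, s2, ...) as a background job,
# then waits for every job; returns 1 if any open failed.
open_parallel() {
  local i=0 url pid status=0
  local -a pids=()
  for url in "$@"; do
    i=$((i + 1))
    ab --session "s$i" open "$url" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1   # the shell's wait builtin, not agent-browser wait
  done
  return "$status"
}
```

Later commands then target a session explicitly, for example agent-browser --session s2 snapshot -i.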

Examples

Login Flow

agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Verify logged in
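Refs like e1 to e3 are valid only for the snapshot that produced them. A reusable login script can instead use the semantic find locators described earlier and read credentials from the environment rather than hard-coding them. This is a sketch under several assumptions. The field labels "Email" and "Password" and the button name "Sign in" are assumptions about the target page. LOGIN_EMAIL, LOGIN_PASSWORD, login, and AB are hypothetical names. The password is still passed to the CLI as a command-line argument.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# login URL
# Credentials come from LOGIN_EMAIL / LOGIN_PASSWORD (hypothetical names),
# so they are not written into the script itself.
login() {
  local url=$1
  if [ -z "$LOGIN_EMAIL" ] || [ -z "$LOGIN_PASSWORD" ]; then
    echo "set LOGIN_EMAIL and LOGIN_PASSWORD" >&2
    return 1
  fi
  ab open "$url" &&
    ab find label "Email" fill "$LOGIN_EMAIL" &&
    ab find label "Password" fill "$LOGIN_PASSWORD" &&
    ab find role button click --name "Sign in"
}
```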

Search and Extract

agent-browser open https://news.ycombinator.com
agent-browser snapshot -i --json
# Parse JSON to find story links
agent-browser get text @e12  # Get headline text
agent-browser click @e12     # Click to open story
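To extract more than one story, a script can collect every link ref from one snapshot and then call get text on each. This is a sketch. It assumes snapshot lines of the form - link "Title" [ref=e12] and names that contain no escaped quotes. refs_for_role, first_link_texts, and AB are hypothetical helpers.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# refs_for_role ROLE: print one "@eN" per element with this role.
refs_for_role() {
  ab snapshot -i | sed -n "s/.*[- ]$1 \"[^\"]*\" \[ref=\(e[0-9][0-9]*\)].*/@\1/p"
}

# first_link_texts [N]: print the text of the first N links (default 5).
first_link_texts() {
  local ref
  refs_for_role link | head -n "${1:-5}" | while read -r ref; do
    ab get text "$ref" </dev/null   # keep the CLI from reading the ref list
  done
}
```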

Form Filling

agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4  # Agree to terms
agent-browser click @e5  # Submit button
agent-browser screenshot confirmation.png
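Longer forms are easier to script when the ref/value pairs are passed as arguments and filled in a loop that stops at the first failure. This is a sketch. fill_pairs and AB are hypothetical, and the refs must still come from a fresh snapshot -i of the form.

```shell
# AB is a hypothetical override for dry runs; it defaults to the real CLI.
ab() { "${AB:-agent-browser}" "$@"; }

# fill_pairs @REF=VALUE...
# Splits each pair at the first "=" (values may contain "=") and fills it.
fill_pairs() {
  local pair
  for pair in "$@"; do
    case "$pair" in
      @e*=*) ab fill "${pair%%=*}" "${pair#*=}" || return 1 ;;
      *) echo "bad pair: $pair (expected @eN=value)" >&2; return 1 ;;
    esac
  done
}
```

For example: fill_pairs "@e1=John Doe" "@e2=john@example.com", followed by the select, check, and click commands shown above.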

Debug Mode

# Run with visible browser window
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
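The example above repeats --headed on every command, as the source does. A small wrapper can add the flag from an environment switch instead, so the same script runs headless by default and headed while debugging. AB_HEADED and AB are hypothetical conventions for this sketch, not agent-browser settings.

```shell
# ab ARGS...: call agent-browser, adding --headed when AB_HEADED=1.
# AB (hypothetical) overrides the binary for dry runs.
ab() {
  if [ "${AB_HEADED:-0}" = 1 ]; then
    "${AB:-agent-browser}" --headed "$@"
  else
    "${AB:-agent-browser}" "$@"
  fi
}
```

Running AB_HEADED=1 ./my-flow.sh (where my-flow.sh is a hypothetical script built on ab) would then show the browser window.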

JSON Output

Add --json for structured output:

agent-browser snapshot -i --json

Returns:

{
  "success": true,
  "data": {
    "refs": {
      "e1": {"name": "Submit", "role": "button"},
      "e2": {"name": "Email", "role": "textbox"}
    },
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
  }
}
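With the JSON shape above, jq can resolve a ref by role and name without parsing the text snapshot. This is a sketch. It assumes jq 1.5 or later is installed and that the output keeps the success/data.refs structure shown. ref_from_json is a hypothetical helper.

```shell
# ref_from_json ROLE NAME
# Reads "snapshot -i --json" output on stdin and prints "@eN" for the first
# ref whose role and name match; prints nothing if none match, and fails
# if the snapshot reported success: false.
ref_from_json() {
  jq -r --arg role "$1" --arg name "$2" '
    if .success then
      first(.data.refs | to_entries[]
            | select(.value.role == $role and .value.name == $name)
            | "@" + .key)
    else
      error("snapshot reported success: false")
    end'
}
```

Usage: ref=$(agent-browser snapshot -i --json | ref_from_json button Submit) && agent-browser click "$ref"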

vs Playwright MCP

Feature	agent-browser (CLI)	Playwright MCP
Interface	Bash commands	MCP tools
Selection	Refs (@e1)	Refs (e1)
Output	Text/JSON	Tool responses
Parallel	Sessions	Tabs
Best for	Quick automation	Tool integration

Use agent-browser when:

You prefer Bash-based workflows
You want simpler CLI commands
You need quick one-off automation

Use Playwright MCP when:

You need deep MCP tool integration
You want tool-based responses
You're building complex automation

Source

https://github.com/jikig-ai/soleur/blob/main/plugins/soleur/skills/agent-browser/SKILL.md

Overview

agent-browser is Vercel's CLI for automating headless browser tasks from Bash. It uses ref-based element selection (e.g., @e1, @e2) from accessibility snapshots to navigate pages, fill forms, click buttons, take screenshots, and scrape data. Because refs come from the accessibility tree rather than brittle selectors, AI agents can interact more reliably with dynamic web apps.

How This Skill Works

Open a URL, take an accessibility snapshot to obtain element refs, perform actions using those refs (click, fill, type, etc.), and re-snapshot after DOM changes. The workflow emphasizes stable, ref-driven interactions and JSON-friendly output for parsing.

When to Use It

Automate product searches and form submissions on e-commerce sites
Fill lead-gen or login forms and navigate post-submit flows
Click buttons and drive multi-step checkouts or wizards
Capture screenshots or generate PDFs for QA, docs, or audits
Scrape page data (titles, URLs, attributes) for monitoring or indexing

Quick Start

Step 1: Open the target URL with agent-browser open <url>
Step 2: Get interactive refs by running agent-browser snapshot -i --json
Step 3: Interact with elements using refs (e.g., agent-browser click @e1) and re-snapshot as needed

Best Practices

Always run a snapshot after navigation or DOM updates to refresh refs
Use agent-browser snapshot -i --json when you need machine-readable output
Interact via refs (@e1, @e2) rather than brittle selectors
Use wait commands so dynamic content has loaded before you act on it
Confirm success at the end with a get command (e.g., get text or get url) or a screenshot

Example Use Cases

Open https://example.com, snapshot, click 'Sign in', fill credentials, and snapshot the post-login page
Search for 'AI toys', click the first result, and scrape the product title and price
Fill a contact form and submit, then verify success text on the confirmation page
Navigate to a pricing page, switch tabs, and save a full-page PDF for documentation
Log in to a dashboard, wait for the user avatar, and scrape the displayed user name
