
agent-browser

npx machina-cli add skill qwibitai/nanoclaw/agent-browser --openclaw

Browser Automation with agent-browser

Quick start

agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser

Core workflow

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
  3. Interact using refs from the snapshot
  4. Re-snapshot after navigation or significant DOM changes
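The workflow above can be wrapped so that every interaction is followed by a fresh snapshot. This is a minimal sketch, not part of agent-browser itself: `ab`, `act_and_resnapshot`, and the `AGENT_BROWSER` override variable are hypothetical names introduced here, and the sketch assumes the CLI exits non-zero when a command fails.

```shell
# Hypothetical wrapper: AGENT_BROWSER is an assumed override, handy for dry runs.
ab() { "${AGENT_BROWSER:-agent-browser}" "$@"; }

# Run one interaction, then re-snapshot so later refs match the current DOM.
# Assumes agent-browser returns a non-zero status when the interaction fails.
act_and_resnapshot() {
  ab "$@" || return 1
  ab snapshot -i
}
```

For example, `act_and_resnapshot click @e3` would click the element and print the new list of interactive refs in one step.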

Commands

Navigation

agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser

Snapshot (page analysis)

agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector

Interactions (use @refs from snapshot)

agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown option
agent-browser scroll down 500     # Scroll page
agent-browser upload @e1 file.pdf # Upload files

Get information

agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements

Screenshots & PDF

agent-browser screenshot          # Save to temp directory
agent-browser screenshot path.png # Save to specific path
agent-browser screenshot --full   # Full page
agent-browser pdf output.pdf      # Save as PDF

Wait

agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --url "**/dashboard"    # Wait for URL pattern
agent-browser wait --load networkidle      # Wait for network idle

Semantic locators (alternative to refs)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"

Authentication with saved state

# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
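The "load saved state" half of this flow could be guarded so a script fails early when no state file exists yet. The sketch below is illustrative only: `ab`, `ensure_session`, and the `AGENT_BROWSER` variable are hypothetical names, and the login itself is left as a manual step because the refs depend on the page.

```shell
# Hypothetical wrapper: AGENT_BROWSER is an assumed override, handy for dry runs.
ab() { "${AGENT_BROWSER:-agent-browser}" "$@"; }

# Load a saved session if the state file exists, then open the target URL.
# Returns 1 (without opening anything) when there is no saved state.
ensure_session() {
  state_file="$1"
  target_url="$2"
  if [ ! -f "$state_file" ]; then
    echo "No saved state at $state_file; log in once, then run: agent-browser state save $state_file" >&2
    return 1
  fi
  ab state load "$state_file" || return 1
  ab open "$target_url"
}
```

A call such as `ensure_session auth.json https://app.example.com/dashboard` mirrors the two commands above while making the missing-file case explicit.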

Cookies & Storage

agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get localStorage
agent-browser storage local set k v       # Set value

JavaScript

agent-browser eval "document.title"   # Run JavaScript

Example: Form submission

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Example: Data extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e1  # Get product title
agent-browser get attr @e2 href  # Get link URL
agent-browser screenshot products.png

Source

git clone https://github.com/qwibitai/nanoclaw
# Skill file: container/skills/agent-browser/SKILL.md

Overview

agent-browser is a command-line browser automation tool for researching topics, interacting with web apps, filling forms, taking screenshots, extracting data, and testing pages. Commands such as open, snapshot, click, and fill drive the browser from a script, so a task can be repeated without manual UI work.

How This Skill Works

Open a URL with agent-browser open, then run snapshot (usually snapshot -i) to collect element refs such as @e1 and @e2. Interact with those refs via click, fill, type, and the other interaction commands, and re-snapshot after navigation or major DOM changes. Along the way you can move through history, wait for conditions, and capture screenshots or PDFs. Run agent-browser close when finished.
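The open, work, close lifecycle described here can be sketched as a small helper that always closes the browser, even when a step fails. `ab`, `with_page`, and `AGENT_BROWSER` are hypothetical names introduced for illustration; only the `open` and `close` subcommands come from the command list above.

```shell
# Hypothetical wrapper: AGENT_BROWSER is an assumed override, handy for dry runs.
ab() { "${AGENT_BROWSER:-agent-browser}" "$@"; }

# Open a URL, run the given command, then close the browser regardless of
# whether the command succeeded. The command's exit status is passed through.
with_page() {
  page_url="$1"
  shift
  ab open "$page_url" || return 1
  page_status=0
  "$@" || page_status=$?
  ab close
  return "$page_status"
}
```

For instance, `with_page https://example.com/products ab screenshot products.png` would open the page, take the screenshot, and close the browser afterwards.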

When to Use It

  • Research topics across multiple pages and pull data automatically
  • Fill out forms or login flows and validate results
  • Interact with dynamic web apps and test UI behavior
  • Capture screenshots or PDFs for audits and reporting
  • Scrape content while ensuring page state via waits and checks

Quick Start

  1. agent-browser open <url>
  2. agent-browser snapshot -i
  3. Interact using refs from the snapshot (click, fill, etc.)

Best Practices

  • Open the target page and run snapshot -i before interacting, and re-snapshot after navigation so refs stay current
  • Prefer interactive element refs (@e1, @e2) for reliable actions
  • Add wait commands (wait --text, wait --url, or wait --load) before actions that depend on page state
  • Use semantic find as a fallback to element refs when refs fail
  • Close the browser after tasks or save state for reuse in authentication flows
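The "semantic find as a fallback" practice might look like the sketch below. It assumes, without confirmation from the source, that agent-browser exits non-zero when a ref cannot be clicked; `ab`, `click_with_fallback`, and `AGENT_BROWSER` are hypothetical names.

```shell
# Hypothetical wrapper: AGENT_BROWSER is an assumed override, handy for dry runs.
ab() { "${AGENT_BROWSER:-agent-browser}" "$@"; }

# Try to click by ref first; if that fails (assumed non-zero exit status),
# fall back to locating the element by its visible text.
click_with_fallback() {
  ref="$1"
  visible_text="$2"
  ab click "$ref" || ab find text "$visible_text" click
}
```

A call such as `click_with_fallback @e3 "Sign In"` keeps the fast ref-based path and only uses the text locator when the ref is stale.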

Example Use Cases

  • Open a product page, snapshot, click to add to cart, take a screenshot, and extract price data
  • Log in to a dashboard, navigate to Reports, wait for load, and save a PDF of the report
  • Fill a contact form on a marketing site and verify the submission message
  • Extract titles and URLs from a blog index and export to CSV
  • Test a multi-step checkout flow, waiting for network idle and final confirmation
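As one way to approach the "titles and URLs to CSV" use case, the sketch below reads the text and href of each link ref and prints CSV rows. It relies only on `get text` and `get attr` from the command list; `ab`, `csv_quote`, `export_links_csv`, and `AGENT_BROWSER` are hypothetical names, and the refs passed in are assumed to come from a prior snapshot -i.

```shell
# Hypothetical wrapper: AGENT_BROWSER is an assumed override, handy for dry runs.
ab() { "${AGENT_BROWSER:-agent-browser}" "$@"; }

# Wrap a value in double quotes, doubling any embedded quotes (CSV escaping).
csv_quote() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Print a CSV with one row per link ref: the link text and its href.
export_links_csv() {
  echo "title,url"
  for link_ref in "$@"; do
    link_title="$(ab get text "$link_ref")"
    link_url="$(ab get attr "$link_ref" href)"
    printf '%s,%s\n' "$(csv_quote "$link_title")" "$(csv_quote "$link_url")"
  done
}
```

Redirecting the output, for example `export_links_csv @e1 @e2 @e3 > posts.csv`, would produce the file. Which refs correspond to blog post links depends on the snapshot of the specific page.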
