What is Agent Browser Automation?

A fast Rust-based CLI for AI agents to drive headless browsers with JSON outputs and a Node.js fallback.

What outputs does it emit?

Structured JSON per action (navigate, click, type, snapshot) including fields like success, url, element, timestamp, and loadTime.

Agent Browser Automation

browser-automation headless-chrome rust ai-agents cli snapshots navigation

npx machina-cli add skill PramodDutta/qaskills/agent-browser --openclaw

Files (1)

SKILL.md

19.6 KB

Agent Browser Automation Skill

You are an expert in browser automation using agent-browser, a fast Rust-based CLI tool designed specifically for AI coding agents. When the user asks you to automate browser interactions, perform web scraping, or test web applications, follow these detailed instructions.

Core Principles

Speed-first architecture -- Rust core provides millisecond-level responsiveness for agent commands.
Structured output -- All commands return JSON output optimized for AI agent parsing.
Fallback resilience -- Automatically falls back to Node.js implementation when Rust binary is unavailable.
Agent-optimized commands -- CLI designed for programmatic control, not human interaction.
Snapshot-driven debugging -- Every action can capture page snapshots for AI agent analysis.

Installation

# Install globally via npm
npm install -g agent-browser

# Or use npx without installation
npx agent-browser --version

# Verify installation
agent-browser --help

Project Structure for Agent-Driven Testing

tests/
  browser-automation/
    scenarios/
      login-flow.json
      checkout-flow.json
      search-navigation.json
    snapshots/
      login-success.png
      cart-state.png
    scripts/
      run-scenario.sh
      batch-test.sh
  config/
    browser-config.json
    viewport-sizes.json

Core Commands

Navigation Commands

# Navigate to URL
agent-browser navigate --url "https://example.com"

# Navigate with custom viewport
agent-browser navigate --url "https://example.com" --viewport "1920x1080"

# Navigate with custom user agent
agent-browser navigate --url "https://example.com" \
  --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)"

# Navigate and wait for network idle
agent-browser navigate --url "https://example.com" --wait-until "networkidle"

Output:

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "loadTime": 234,
  "timestamp": "2024-02-12T10:30:00Z"
}

Clicking Elements

# Click by selector
agent-browser click --selector "button#submit"

# Click by text content
agent-browser click --text "Sign In"

# Click with coordinate offset
agent-browser click --selector ".dropdown" --offset "10,20"

# Wait for element before clicking
agent-browser click --selector "button.load-more" --wait 5000

Output:

{
  "success": true,
  "element": "button#submit",
  "action": "click",
  "timestamp": "2024-02-12T10:30:01Z"
}

Typing Input

# Type into input field
agent-browser type --selector "input#email" --text "user@example.com"

# Type with delay between keystrokes (human-like)
agent-browser type --selector "input#search" --text "automation" --delay 50

# Type and press Enter
agent-browser type --selector "input#search" --text "query" --enter

# Clear field before typing
agent-browser type --selector "input#username" --text "newuser" --clear

Output:

{
  "success": true,
  "element": "input#email",
  "action": "type",
  "length": 17,
  "timestamp": "2024-02-12T10:30:02Z"
}

Taking Snapshots

# Full page screenshot
agent-browser snapshot --output "screenshot.png"

# Snapshot specific element
agent-browser snapshot --selector "#content" --output "content.png"

# Snapshot with custom dimensions
agent-browser snapshot --viewport "1920x1080" --output "desktop.png"

# Get page HTML
agent-browser snapshot --format "html" --output "page.html"

# Get page as PDF
agent-browser snapshot --format "pdf" --output "document.pdf"

Output:

{
  "success": true,
  "format": "png",
  "path": "screenshot.png",
  "size": 245678,
  "dimensions": {
    "width": 1280,
    "height": 720
  },
  "timestamp": "2024-02-12T10:30:03Z"
}

Extracting Data

# Extract text from element
agent-browser extract --selector "h1.title" --attribute "text"

# Extract attribute value
agent-browser extract --selector "a.link" --attribute "href"

# Extract multiple elements
agent-browser extract --selector "li.item" --all

# Extract as JSON
agent-browser extract --selector ".product-card" --json \
  --fields "title:.title,price:.price,image:img@src"

Output:

{
  "success": true,
  "selector": "h1.title",
  "results": [
    {
      "text": "Welcome to Our Site",
      "visible": true
    }
  ],
  "count": 1,
  "timestamp": "2024-02-12T10:30:04Z"
}

Waiting for Elements

# Wait for element to appear
agent-browser wait --selector ".loading-complete" --timeout 10000

# Wait for element to disappear
agent-browser wait --selector ".spinner" --hidden --timeout 5000

# Wait for text to appear
agent-browser wait --text "Success" --timeout 3000

# Wait for custom condition
agent-browser wait --script "document.readyState === 'complete'"

Output:

{
  "success": true,
  "condition": "element_visible",
  "selector": ".loading-complete",
  "waited": 1234,
  "timestamp": "2024-02-12T10:30:05Z"
}

Scenario-Based Automation

Login Flow Example

#!/bin/bash
# login-test.sh

# Navigate to login page
agent-browser navigate --url "https://app.example.com/login"

# Fill email
agent-browser type --selector "input#email" --text "user@example.com"

# Fill password
agent-browser type --selector "input#password" --text "$PASSWORD"

# Click submit
agent-browser click --selector "button[type='submit']"

# Wait for dashboard
agent-browser wait --selector ".dashboard" --timeout 5000

# Take success snapshot
agent-browser snapshot --output "login-success.png"

# Extract user info
agent-browser extract --selector ".user-name" --attribute "text"

E2E Shopping Flow

#!/bin/bash
# checkout-flow.sh

set -e  # Exit on any error

echo "Starting checkout flow test..."

# 1. Navigate to product page
agent-browser navigate --url "https://shop.example.com/products/widget-123"
agent-browser wait --selector ".product-details" --timeout 5000

# 2. Add to cart
agent-browser click --selector "button.add-to-cart"
agent-browser wait --text "Added to cart" --timeout 3000

# 3. Open cart
agent-browser click --selector ".cart-icon"
agent-browser wait --selector ".cart-items" --timeout 2000

# 4. Take cart snapshot
agent-browser snapshot --selector ".cart-summary" --output "cart-state.png"

# 5. Proceed to checkout
agent-browser click --text "Proceed to Checkout"
agent-browser wait --selector ".checkout-form" --timeout 5000

# 6. Fill shipping info
agent-browser type --selector "#shipping-name" --text "John Doe"
agent-browser type --selector "#shipping-address" --text "123 Main St"
agent-browser type --selector "#shipping-city" --text "San Francisco"

# 7. Take final snapshot
agent-browser snapshot --output "checkout-complete.png"

echo "Checkout flow completed successfully"

Integration with AI Agent Workflows

TypeScript/JavaScript Integration

import { execSync } from 'child_process';

interface BrowserResult {
  success: boolean;
  [key: string]: any;
}

class AgentBrowser {
  private baseCommand = 'agent-browser';

  async navigate(url: string, options: { viewport?: string; waitUntil?: string } = {}): Promise<BrowserResult> {
    const args = [`navigate`, `--url`, `"${url}"`];
    if (options.viewport) args.push(`--viewport`, options.viewport);
    if (options.waitUntil) args.push(`--wait-until`, options.waitUntil);

    return this.exec(args);
  }

  async click(selector: string, options: { wait?: number; offset?: string } = {}): Promise<BrowserResult> {
    const args = [`click`, `--selector`, `"${selector}"`];
    if (options.wait) args.push(`--wait`, options.wait.toString());
    if (options.offset) args.push(`--offset`, options.offset);

    return this.exec(args);
  }

  async type(selector: string, text: string, options: { delay?: number; enter?: boolean; clear?: boolean } = {}): Promise<BrowserResult> {
    const args = [`type`, `--selector`, `"${selector}"`, `--text`, `"${text}"`];
    if (options.delay) args.push(`--delay`, options.delay.toString());
    if (options.enter) args.push(`--enter`);
    if (options.clear) args.push(`--clear`);

    return this.exec(args);
  }

  async snapshot(output: string, options: { format?: string; selector?: string; viewport?: string } = {}): Promise<BrowserResult> {
    const args = [`snapshot`, `--output`, `"${output}"`];
    if (options.format) args.push(`--format`, options.format);
    if (options.selector) args.push(`--selector`, `"${options.selector}"`);
    if (options.viewport) args.push(`--viewport`, options.viewport);

    return this.exec(args);
  }

  async extract(selector: string, options: { attribute?: string; all?: boolean; json?: boolean } = {}): Promise<BrowserResult> {
    const args = [`extract`, `--selector`, `"${selector}"`];
    if (options.attribute) args.push(`--attribute`, options.attribute);
    if (options.all) args.push(`--all`);
    if (options.json) args.push(`--json`);

    return this.exec(args);
  }

  private exec(args: string[]): BrowserResult {
    const command = `${this.baseCommand} ${args.join(' ')}`;
    try {
      const output = execSync(command, { encoding: 'utf-8' });
      return JSON.parse(output);
    } catch (error: any) {
      return {
        success: false,
        error: error.message,
        command,
      };
    }
  }
}

// Usage example
async function testLoginFlow() {
  const browser = new AgentBrowser();

  // Navigate
  const navResult = await browser.navigate('https://app.example.com/login', {
    waitUntil: 'networkidle',
  });

  if (!navResult.success) {
    throw new Error(`Navigation failed: ${navResult.error}`);
  }

  // Login
  await browser.type('#email', 'user@example.com');
  await browser.type('#password', process.env.PASSWORD || '', { enter: true });

  // Wait for dashboard
  await browser.wait('.dashboard', { timeout: 5000 });

  // Capture success state
  await browser.snapshot('login-success.png');

  console.log('Login test completed successfully');
}

Python Integration

import json
import subprocess
from typing import Dict, Any, Optional

class AgentBrowser:
    def __init__(self, browser_path: str = "agent-browser"):
        self.browser_path = browser_path

    def navigate(self, url: str, viewport: Optional[str] = None, wait_until: Optional[str] = None) -> Dict[str, Any]:
        args = [self.browser_path, "navigate", "--url", url]
        if viewport:
            args.extend(["--viewport", viewport])
        if wait_until:
            args.extend(["--wait-until", wait_until])
        return self._exec(args)

    def click(self, selector: str, wait: Optional[int] = None, offset: Optional[str] = None) -> Dict[str, Any]:
        args = [self.browser_path, "click", "--selector", selector]
        if wait:
            args.extend(["--wait", str(wait)])
        if offset:
            args.extend(["--offset", offset])
        return self._exec(args)

    def type(self, selector: str, text: str, delay: Optional[int] = None, enter: bool = False, clear: bool = False) -> Dict[str, Any]:
        args = [self.browser_path, "type", "--selector", selector, "--text", text]
        if delay:
            args.extend(["--delay", str(delay)])
        if enter:
            args.append("--enter")
        if clear:
            args.append("--clear")
        return self._exec(args)

    def snapshot(self, output: str, format: Optional[str] = None, selector: Optional[str] = None) -> Dict[str, Any]:
        args = [self.browser_path, "snapshot", "--output", output]
        if format:
            args.extend(["--format", format])
        if selector:
            args.extend(["--selector", selector])
        return self._exec(args)

    def extract(self, selector: str, attribute: Optional[str] = None, all: bool = False) -> Dict[str, Any]:
        args = [self.browser_path, "extract", "--selector", selector]
        if attribute:
            args.extend(["--attribute", attribute])
        if all:
            args.append("--all")
        return self._exec(args)

    def _exec(self, args: list) -> Dict[str, Any]:
        try:
            result = subprocess.run(args, capture_output=True, text=True, check=True)
            return json.loads(result.stdout)
        except subprocess.CalledProcessError as e:
            return {
                "success": False,
                "error": e.stderr,
                "command": " ".join(args)
            }

# Usage example
def test_search_flow():
    browser = AgentBrowser()

    # Navigate to search page
    nav = browser.navigate("https://example.com/search")
    assert nav["success"], f"Navigation failed: {nav.get('error')}"

    # Enter search query
    browser.type("input#search", "automation testing", enter=True)

    # Wait for results
    browser.wait(".search-results", timeout=5000)

    # Extract result titles
    results = browser.extract(".result-item h3", attribute="text", all=True)

    print(f"Found {len(results.get('results', []))} search results")

    # Take snapshot
    browser.snapshot("search-results.png")

Configuration Management

Browser Config File

{
  "viewport": {
    "width": 1920,
    "height": 1080
  },
  "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
  "timeout": {
    "navigation": 30000,
    "element": 10000,
    "network": 5000
  },
  "screenshot": {
    "format": "png",
    "quality": 80,
    "fullPage": true
  },
  "headless": true,
  "slowMo": 0,
  "devtools": false
}

Load Config in Commands

# Use config file
agent-browser --config ./browser-config.json navigate --url "https://example.com"

# Override config values
agent-browser --config ./browser-config.json --viewport "1280x720" navigate --url "https://example.com"

Performance Optimization

Rust vs Node.js Performance

Rust Implementation Benefits:

10-50x faster command execution
Lower memory footprint (20-30 MB vs 100+ MB for Node.js)
Instant startup time (<100ms vs 500-1000ms)
Compiled binary distribution (no runtime dependencies)

When Node.js Fallback Activates:

Rust binary not available for platform
Specific browser features requiring Node.js Puppeteer/Playwright
Debug mode requiring Chrome DevTools Protocol

Batch Command Optimization

# Chain commands for efficiency
agent-browser chain \
  "navigate --url https://example.com" \
  "type --selector #search --text query" \
  "click --selector button" \
  "wait --selector .results" \
  "snapshot --output results.png"

# Output includes timing for each step

Best Practices

Use JSON output parsing -- Always parse JSON responses for reliable AI agent integration.
Implement retry logic -- Network issues and timing can cause transient failures.
Take snapshots liberally -- Visual snapshots help AI agents understand page state.
Use semantic selectors -- Prefer IDs and stable data attributes over brittle CSS selectors.
Set appropriate timeouts -- Balance speed with reliability based on page complexity.
Chain related commands -- Use command chaining to reduce overhead.
Cache browser instances -- Reuse browser contexts for faster subsequent commands.
Validate responses -- Check success field before proceeding with workflow.
Log all actions -- Maintain audit trail for debugging and compliance.
Use structured scenarios -- Define reusable JSON scenarios for common flows.

Anti-Patterns to Avoid

Hardcoded waits -- Avoid fixed delays; use element waiting instead.
Ignoring errors -- Always check success field and handle failures gracefully.
Over-reliance on XPath -- Prefer CSS selectors for better performance.
Snapshot overload -- Don't snapshot every step; focus on key states.
Single-use scripts -- Create reusable functions instead of one-off scripts.
No error recovery -- Implement retry and fallback mechanisms.
Unvalidated input -- Sanitize and validate all user input before typing.
Resource leaks -- Always close browser instances when done.
Ignoring performance -- Monitor command execution times and optimize.
Poor selector strategies -- Avoid deep nesting and position-dependent selectors.

Common Scenarios

Form Automation

# Registration form
agent-browser navigate --url "https://app.example.com/register"
agent-browser type --selector "#firstName" --text "John"
agent-browser type --selector "#lastName" --text "Doe"
agent-browser type --selector "#email" --text "john@example.com"
agent-browser type --selector "#password" --text "SecurePass123!"
agent-browser click --selector "#terms-checkbox"
agent-browser click --selector "button[type='submit']"
agent-browser wait --text "Registration successful" --timeout 5000
agent-browser snapshot --output "registration-success.png"

Data Extraction

# Extract product information
agent-browser navigate --url "https://shop.example.com/products"
agent-browser wait --selector ".product-grid" --timeout 5000
agent-browser extract --selector ".product-card" --json \
  --fields "name:.product-name,price:.product-price,rating:.rating" \
  --all > products.json

Multi-Step Workflow

# Complex dashboard interaction
agent-browser navigate --url "https://dashboard.example.com"
agent-browser wait --selector ".dashboard-loaded" --timeout 10000
agent-browser click --selector "a[href='/analytics']"
agent-browser wait --selector ".chart-container" --timeout 5000
agent-browser click --selector "button.date-filter"
agent-browser click --text "Last 30 days"
agent-browser wait --selector ".chart-updated" --timeout 3000
agent-browser snapshot --selector ".analytics-panel" --output "analytics.png"

Debugging and Troubleshooting

Enable Debug Mode

# Verbose output
agent-browser --debug navigate --url "https://example.com"

# Show browser window (non-headless)
agent-browser --headless false navigate --url "https://example.com"

# Save DevTools logs
agent-browser --console-log ./console.log navigate --url "https://example.com"

Error Handling

async function robustBrowserAction(action: () => Promise<BrowserResult>): Promise<BrowserResult> {
  const maxRetries = 3;
  let lastError: any;

  for (let i = 0; i < maxRetries; i++) {
    try {
      const result = await action();
      if (result.success) {
        return result;
      }
      lastError = result.error;
    } catch (error) {
      lastError = error;
    }

    // Exponential backoff
    await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
  }

  throw new Error(`Action failed after ${maxRetries} retries: ${lastError}`);
}

Integration with AI Agents

When AI agents use agent-browser, they should:

Parse JSON responses to understand command results
Analyze snapshots to verify visual state
Adapt selectors based on page structure
Handle errors gracefully with retry logic
Chain commands efficiently to minimize overhead
Validate outcomes by checking both JSON and visual snapshots
Maintain context across multi-step flows
Learn from failures by analyzing error messages and snapshots

This tool empowers AI agents to interact with web applications at high speed with structured, parseable output optimized for autonomous decision-making.

Source

git clone https://github.com/PramodDutta/qaskills/blob/main/seed-skills/agent-browser/SKILL.mdView on GitHub

Overview

Agent Browser Automation is a fast Rust-based CLI designed for AI coding agents to drive headless browsers. It provides navigation, clicking, typing, and snapshots with structured JSON outputs optimized for agent parsing, and falls back to Node.js when Rust is unavailable.

How This Skill Works

The Rust core delivers millisecond-scale responsiveness and emits structured JSON for every action (navigate, click, type, snapshot). When the Rust binary isn't available, agent-browser automatically falls back to a Node.js implementation to keep workflows running. Snapshots are available for debugging and agent analysis at each step.

When to Use It

Automate agent-driven login and checkout flows in web apps
Run end-to-end UI tests for AI agent workflows with consistent JSON outputs
Perform navigation, clicking, and typing via programmatic agent commands
Capture page snapshots at key steps to validate UI states
Validate page load performance and network state using wait-until options

Quick Start

Step 1: Install the CLI: npm install -g agent-browser
Step 2: Navigate to a page: agent-browser navigate --url "https://example.com"
Step 3: Interact and snapshot: agent-browser type --selector "#q" --text "test" && agent-browser snapshot --output "page.png"

Best Practices

Prefer explicit selectors or text-based queries to avoid flaky selectors
Use navigate with wait-until (e.g., networkidle) to ensure pages are ready
Capture snapshots regularly to aid debugging and traceability
Configure deterministic viewport and user-agent to reduce variability
Treat CLI actions as idempotent and log outputs for reliable agent parsing

Example Use Cases

Autonomous login flow: navigate → type credentials → click login → snapshot
Search workflow: navigate to site, type query, click results, snapshot
Checkout automation: fill address/payment, submit, verify success snapshot
Content scraping: navigate, collect selectors, log JSON payloads
UI debugging: reproduce flaky behavior with repeated navigation and snapshots

Frequently Asked Questions

Add this skill to your agents

Related Skills

Amazon Listing Audit Pro

openclaw/skills

8-dimension listing health check scoring title, bullets, images, A+ content, keywords, pricing, reviews, and competitive positioning. Generates priority fix list and keyword gap analysis. Supports single ASIN or bulk audit. Requires APICLAW_API_KEY.

-21risk-automation

ranbot-ai/awesome-skills

Automate 21risk tasks via Rube MCP (Composio). Always search tools first for current schemas.

-2chat-automation

ranbot-ai/awesome-skills

Automate 2chat tasks via Rube MCP (Composio). Always search tools first for current schemas.

homebutler

openclaw/skills

Manage and monitor homelab servers and Docker using a single binary CLI or AI tools. No dependencies, cross-platform, supports SSH, Wake-on-LAN, Docker management, port scanning, network discovery, and resource alerts.

knowledge-locator

athola/claude-night-market

'Consult this skill when searching or navigating stored knowledge. Use

azure-cli

chaterm/terminal-skills

Azure CLI 操作

Agent Browser Automation

Agent Browser Automation Skill

Core Principles

Installation

Project Structure for Agent-Driven Testing

Core Commands

Navigation Commands

Clicking Elements

Typing Input

Taking Snapshots

Extracting Data

Waiting for Elements

Scenario-Based Automation

Login Flow Example

E2E Shopping Flow

Integration with AI Agent Workflows

TypeScript/JavaScript Integration

Python Integration

Configuration Management

Browser Config File

Load Config in Commands

Performance Optimization

Rust vs Node.js Performance

Batch Command Optimization

Best Practices

Anti-Patterns to Avoid

Common Scenarios

Form Automation

Data Extraction

Multi-Step Workflow

Debugging and Troubleshooting

Enable Debug Mode

Error Handling

Integration with AI Agents

Source

Overview

How This Skill Works

When to Use It

Quick Start

Best Practices

Example Use Cases

Frequently Asked Questions

What is Agent Browser Automation?

What outputs does it emit?

Does it require Rust to run?

Related Skills

Amazon Listing Audit Pro

-21risk-automation

-2chat-automation

homebutler

knowledge-locator

azure-cli