How do I get started with agent-browser?

Install the CLI (npm install -g agent-browser), install Chromium with agent-browser install, then run agent-browser open <url> followed by agent-browser snapshot to get element refs.

What does deterministic element selection mean here?

Elements are referenced by stable IDs from the accessibility tree snapshot (e.g., @e1, @e2). These refs remain valid across interactions until the page changes.

What should I do if elements change or refs become stale?

Rerun agent-browser open and snapshot to refresh the element references, and update your automation script accordingly.

browser-automation-agent

Flagged

{"isSafe":false,"isSuspicious":true,"riskLevel":"medium","findings":[{"category":"shell_command","severity":"medium","description":"Dynamic shell invocation using execSync with string interpolation (execSync(`agent-browser ${cmd}`, { encoding: 'utf-8' }));) creates a potential command-injection risk if 'cmd' is derived from untrusted input. If an attacker supplies a crafted 'cmd', they could execute arbitrary commands in the host environment through the shell.","evidence":"const { execSync } = require('child_process');\nfunction browserCommand(cmd) {\n return execSync(`agent-browser ${cmd}`, { encoding: 'utf-8' });\n}"}],"summary":"The skill content is largely safe for browser automation, with no malicious commands or data exfiltration patterns evident. A potential security concern is the dynamic shell invocation pattern used in the Node.js wrapper (execSync) that could allow command injection if 'cmd' comes from untrusted input. Mitigations: avoid passing untrusted input to shell commands, prefer parameterized interfaces or white-list allowed commands, and consider using spawn with shell: false or a restricted command executor."}

npx machina-cli add skill besoeasy/open-skills/browser-automation-agent --openclaw

Browser Automation with Agent-Browser

Agent-browser is a headless browser automation CLI designed specifically for AI agents. It provides fast browser control with deterministic element selection through accessibility tree snapshots, making it ideal for agent-driven web automation workflows.

When to use

Use case 1: When the user asks to automate web interactions (fill forms, click buttons, navigate sites)
Use case 2: When you need to capture screenshots or generate PDFs of web pages
Use case 3: For web scraping tasks that require JavaScript rendering or complex interactions
Use case 4: When building automation workflows that need deterministic element references
Use case 5: For testing web applications with agent-driven scenarios

Required tools / APIs

No external API required (runs locally)
agent-browser: Headless browser CLI with Rust/Node.js implementation
Chromium: Downloaded by running agent-browser install after the CLI is installed

Install options:

# via npm (global)
npm install -g agent-browser
agent-browser install  # Downloads Chromium

# via Homebrew (macOS/Linux)
brew install agent-browser

# Verify installation
agent-browser --version

Skills

browser_open_and_snapshot

Open a URL and capture the accessibility tree to identify interactive elements.

# Open a webpage
agent-browser open https://example.com

# Get snapshot with element references
agent-browser snapshot

# The snapshot shows elements with @e1, @e2 references
# Example output:
# @e1 button "Sign In"
# @e2 input "Email" (email)
# @e3 input "Password" (password)

Node.js:

const { execSync } = require('child_process');

// Run one agent-browser subcommand and return its stdout as a string.
// cmd is interpolated into a shell command line, so pass only trusted,
// shell-safe values.
function browserCommand(cmd) {
  return execSync(`agent-browser ${cmd}`, { encoding: 'utf-8' });
}

async function openAndSnapshot(url) {
  browserCommand(`open ${url}`);
  await new Promise(r => setTimeout(r, 2000)); // Fixed 2 s wait for page load
  return browserCommand('snapshot'); // Element tree with @eN references
}

// Usage
// openAndSnapshot('https://example.com').then(console.log);
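
The snapshot comes back as plain text, so an agent usually has to turn it into structured data before choosing a ref. The following is a minimal sketch that assumes each element line looks like the example output above (`@e2 input "Email" (email)`); the real output format may differ. `parseSnapshot` and `findRef` are hypothetical helper names, not part of agent-browser.

```javascript
// Assumed line format, taken from the example output in this document:
//   @e2 input "Email" (email)
const LINE_PATTERN = /^(@e\d+)\s+(\S+)\s+"([^"]*)"(?:\s+\(([^)]*)\))?\s*$/;

// Convert snapshot text into an array of { ref, role, name, detail } objects.
function parseSnapshot(snapshotText) {
  const elements = [];
  for (const rawLine of snapshotText.split('\n')) {
    const match = LINE_PATTERN.exec(rawLine.trim());
    if (!match) continue; // Skip lines that do not look like element entries
    const [, ref, role, name, detail] = match;
    elements.push({ ref, role, name, detail: detail ?? null });
  }
  return elements;
}

// Hypothetical helper: return the ref of the first element with this role and name.
function findRef(elements, role, name) {
  const hit = elements.find(el => el.role === role && el.name === name);
  return hit ? hit.ref : null;
}
```

Looking refs up by role and name, rather than hard-coding `@e1`, keeps a script readable when the numbering shifts between snapshots.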

browser_interact

Interact with page elements using deterministic references from snapshots.

# Fill a form field
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"

# Click a button
agent-browser click @e1

# Type text into active element
agent-browser type "search query" --enter

# Navigate
agent-browser back
agent-browser forward
agent-browser reload

Node.js:

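// Requires: const { execSync } = require('child_process');
// formData maps element refs (e.g. '@e2') to the values to enter.
function fillForm(formData) {
  for (const [ref, value] of Object.entries(formData)) {
    // value is wrapped in double quotes for the shell; values containing
    // quotes, $ or backticks will break this command line.
    execSync(`agent-browser fill ${ref} "${value}"`, { encoding: 'utf-8' });
  }
}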

function clickElement(ref) {
  return execSync(`agent-browser click ${ref}`, { encoding: 'utf-8' });
}

// Usage
// fillForm({ '@e2': 'user@example.com', '@e3': 'password123' });
// clickElement('@e1');
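
The helpers above build a shell command string, which is the pattern the security finding on this page warns about. A possible alternative, sketched here under assumptions, passes arguments as an array through `execFileSync` (no shell by default) and checks subcommands against an allow-list. The names `buildArgs`, `assertRef`, `runAgentBrowser`, and `safeFill` are hypothetical, and the allow-list contains only subcommands that appear in this document.

```javascript
const { execFileSync } = require('child_process');

// Subcommands taken from this document; extend the set as needed.
const ALLOWED_COMMANDS = new Set(['open', 'snapshot', 'fill', 'click', 'type']);
const REF_PATTERN = /^@e\d+$/;

// Build an argument array; nothing here is ever parsed by a shell.
function buildArgs(command, ...params) {
  if (!ALLOWED_COMMANDS.has(command)) {
    throw new Error(`Command not allowed: ${command}`);
  }
  return [command, ...params.map(String)];
}

// Reject anything that is not shaped like an element ref (@e1, @e2, ...).
function assertRef(ref) {
  if (!REF_PATTERN.test(ref)) throw new Error(`Invalid element ref: ${ref}`);
  return ref;
}

// execFileSync starts the binary directly instead of going through a shell.
function runAgentBrowser(args) {
  return execFileSync('agent-browser', args, { encoding: 'utf-8' });
}

// The run parameter exists so the function can be exercised without a browser.
function safeFill(ref, value, run = runAgentBrowser) {
  return run(buildArgs('fill', assertRef(ref), value));
}

// Usage
// safeFill('@e2', 'user@example.com');
```

Because the value travels as a single argument, quotes or semicolons inside it reach agent-browser unchanged. On Windows, an npm-installed CLI is usually a .cmd shim, which may not start without a shell, so this sketch would need adjusting there.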

browser_capture

Capture screenshots, PDFs, or extract page content.

# Take a screenshot
agent-browser screenshot output.png

# Generate PDF
agent-browser pdf document.pdf

# Get page text content
agent-browser text

# Get HTML source
agent-browser html

# Get specific element attribute
agent-browser attribute @e5 href

Node.js:

// Requires: const { execSync } = require('child_process');
function captureScreenshot(filename) {
  return execSync(`agent-browser screenshot ${filename}`, { encoding: 'utf-8' });
}

function generatePDF(filename) {
  return execSync(`agent-browser pdf ${filename}`, { encoding: 'utf-8' });
}

function getPageText() {
  return execSync('agent-browser text', { encoding: 'utf-8' });
}

function getElementAttribute(ref, attr) {
  return execSync(`agent-browser attribute ${ref} ${attr}`, { encoding: 'utf-8' }).trim();
}

// Usage
// captureScreenshot('page.png');
// const text = getPageText();
// const link = getElementAttribute('@e10', 'href');
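
When file names come from user input or page titles, it can help to normalize them before handing them to `screenshot` or `pdf`. The sketch below is one way to do that; `safeStem` and `captureEvidence` are hypothetical helpers, and only the `screenshot` and `pdf` subcommands shown above are assumed to exist.

```javascript
const path = require('path');
const { execFileSync } = require('child_process');

// Call the CLI with an argument array (no shell involved).
function runAgentBrowser(args) {
  return execFileSync('agent-browser', args, { encoding: 'utf-8' });
}

// Reduce a free-form label to a conservative file-name stem.
function safeStem(label) {
  const stem = label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // Collapse anything unusual into a dash
    .replace(/^-+|-+$/g, '');    // Trim leading and trailing dashes
  return stem || 'capture';
}

// Save a screenshot and a PDF of the current page under outputDir.
function captureEvidence(label, outputDir = '.', run = runAgentBrowser) {
  const stem = safeStem(label);
  const screenshotPath = path.join(outputDir, `${stem}.png`);
  const pdfPath = path.join(outputDir, `${stem}.pdf`);
  run(['screenshot', screenshotPath]);
  run(['pdf', pdfPath]);
  return { screenshotPath, pdfPath };
}

// Usage
// const files = captureEvidence('Login Page #1', 'out');
```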

browser_session_management

Manage browser sessions, tabs, and persistent state.

# Session management
agent-browser open https://example.com --session myapp
agent-browser close --session myapp

# Tab management
agent-browser open https://example.com --new-tab
agent-browser tabs list
agent-browser tabs switch 0

# Cookie and storage
agent-browser cookies get example.com
agent-browser storage set mykey "myvalue"
agent-browser storage get mykey

# Close browser
agent-browser close

Node.js:

// Requires: const { execSync } = require('child_process');
function openSession(url, sessionName) {
  return execSync(`agent-browser open ${url} --session ${sessionName}`, { encoding: 'utf-8' });
}

function closeSession(sessionName) {
  return execSync(`agent-browser close --session ${sessionName}`, { encoding: 'utf-8' });
}

function manageStorage(action, key, value = null) {
  // Compare against null so falsy values such as '' or 0 are still written.
  const cmd = value !== null
    ? `agent-browser storage ${action} ${key} "${value}"`
    : `agent-browser storage ${action} ${key}`;
  return execSync(cmd, { encoding: 'utf-8' }).trim();
}

// Usage
// openSession('https://app.example.com', 'shopping-session');
// manageStorage('set', 'cart-id', '12345');
// const cartId = manageStorage('get', 'cart-id');
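
Sessions are easy to leak when a step throws halfway through a workflow. One way to guard against that, sketched here, is a wrapper that always closes the session in a `finally` block. `withSession` is a hypothetical helper, and the sketch assumes `--session <name>` can be appended to any subcommand, which this document only shows for `open` and `close`.

```javascript
const { execFileSync } = require('child_process');

// Call the CLI with an argument array (no shell involved).
function runAgentBrowser(args) {
  return execFileSync('agent-browser', args, { encoding: 'utf-8' });
}

// Open a named session, run the task, and always close the session,
// even when the task throws.
function withSession(sessionName, url, task, run = runAgentBrowser) {
  // Assumption: --session can be appended to any subcommand.
  const inSession = args => run([...args, '--session', sessionName]);
  inSession(['open', url]);
  try {
    return task(inSession);
  } finally {
    inSession(['close']);
  }
}

// Usage
// const snapshot = withSession('myapp', 'https://example.com',
//   session => session(['snapshot']));
```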

Rate limits / Best practices

Add delays between interactions (1-2 seconds) to allow page rendering
Use --wait flag for actions that trigger navigation or async updates
Close browser sessions when done to free system resources
Use --session flags to isolate different automation workflows
Cache snapshots when repeatedly interacting with the same page structure
Prefer element references (@e1) over selectors for deterministic behavior
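
Fixed delays are simple but either waste time or fall short on slow pages. As an alternative, a script can poll the snapshot until an expected element shows up. The following is a sketch only: `waitForSnapshot` and `sleepSync` are hypothetical helpers, and the caller supplies both the snapshot function and the readiness check.

```javascript
// Block the current thread for ms milliseconds without busy-waiting.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Poll takeSnapshot() until isReady(snapshot) returns true or the timeout passes.
function waitForSnapshot(takeSnapshot, isReady, { timeoutMs = 10000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const snapshot = takeSnapshot();
    if (isReady(snapshot)) return snapshot;
    if (Date.now() >= deadline) {
      throw new Error(`Page not ready after ${timeoutMs} ms`);
    }
    sleepSync(intervalMs);
  }
}

// Usage (assumes execFileSync from 'child_process' is in scope)
// const snapshot = waitForSnapshot(
//   () => execFileSync('agent-browser', ['snapshot'], { encoding: 'utf-8' }),
//   text => text.includes('"Sign In"')
// );
```

The synchronous sleep matches the synchronous style of the other helpers in this document; an async version would use `setTimeout` with a promise instead.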

Agent prompt

You have browser automation capability through agent-browser. When a user asks to automate web interactions:

1. Open the URL with `agent-browser open <url>`
2. Get the accessibility snapshot with `agent-browser snapshot` to identify interactive elements
3. Parse the snapshot output to find element references (like @e1, @e2)
4. Use `fill`, `click`, or `type` commands with element references to interact
5. Use `screenshot` or `pdf` to capture results when requested
6. Always close the browser session with `agent-browser close` when done

For multi-step workflows:
- Wait 1-2 seconds between actions for page updates
- Take snapshots after navigation to get updated element references
- Use sessions (`--session name`) to maintain state across multiple operations
- Extract page text or HTML to verify successful interactions

Always prefer agent-browser over other scraping tools when:
- JavaScript rendering is required
- User interactions (clicks, form fills) are needed
- You need screenshots or visual verification
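
The numbered steps in the prompt above can be sketched as a single function. This is an illustration under assumptions, not the skill's own implementation: `signIn` and `refForLabel` are hypothetical names, the field labels ("Email", "Password", "Sign In") come from the example snapshot earlier in this document, and the delays recommended above are left out for brevity.

```javascript
const { execFileSync } = require('child_process');

// Call the CLI with an argument array (no shell involved).
function runAgentBrowser(args) {
  return execFileSync('agent-browser', args, { encoding: 'utf-8' });
}

// Find the ref on the snapshot line that mentions a quoted label,
// e.g. '@e2 input "Email" (email)' for the label 'Email'.
function refForLabel(snapshot, label) {
  const line = snapshot.split('\n').find(l => l.includes(`"${label}"`));
  const match = line && /@e\d+/.exec(line);
  if (!match) throw new Error(`No element labelled "${label}" in snapshot`);
  return match[0];
}

// Open, snapshot, fill, click, verify, and always close.
function signIn({ url, email, password, expectedText }, run = runAgentBrowser) {
  try {
    run(['open', url]);
    const snapshot = run(['snapshot']);
    run(['fill', refForLabel(snapshot, 'Email'), email]);
    run(['fill', refForLabel(snapshot, 'Password'), password]);
    run(['click', refForLabel(snapshot, 'Sign In')]);
    // Verify the result from the page text rather than assuming success.
    return run(['text']).includes(expectedText);
  } finally {
    run(['close']);
  }
}
```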

Troubleshooting

Error: Chromium not installed:

Symptom: "Browser binary not found" error
Solution: Run agent-browser install to download Chromium

Error: Element reference not found (@e5):

Symptom: "Element not found" when using a reference
Solution: Take a fresh snapshot after page navigation; element references change between pages

Error: Timeout waiting for element:

Symptom: Commands hang or timeout
Solution: Add explicit wait time with --wait 5000 flag or use delays between commands

Page not fully loaded:

Symptom: Snapshot shows incomplete page elements
Solution: Add sleep/delay after opening URL before taking snapshot

Session conflicts:

Symptom: "Session already exists" or unexpected state
Solution: Close existing sessions with agent-browser close --session <name> before starting new ones
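
The stale-reference case above lends itself to an automatic retry: take a fresh snapshot, resolve the ref again, and repeat the action once. The sketch below assumes a stale ref surfaces as an error whose message or captured output mentions "Element not found", as in the symptom listed above. `actWithFreshRef` is a hypothetical helper; `run` is any function that takes an argument array and returns the CLI output.

```javascript
// Run an action against an element; if the ref turns out to be stale,
// take a fresh snapshot, resolve the ref again, and retry once.
function actWithFreshRef(run, resolveRef, action, extraArgs = []) {
  let ref = resolveRef(run(['snapshot']));
  try {
    return run([action, ref, ...extraArgs]);
  } catch (err) {
    // Assumption: the CLI reports stale refs with the text "Element not found".
    const details = `${err.message}\n${err.stderr ?? ''}\n${err.stdout ?? ''}`;
    if (!details.includes('Element not found')) throw err;
    ref = resolveRef(run(['snapshot']));
    return run([action, ref, ...extraArgs]);
  }
}

// Usage (assumes execFileSync from 'child_process' is in scope)
// const run = args => execFileSync('agent-browser', args, { encoding: 'utf-8' });
// const buyButton = text => /(@e\d+) button "Buy"/.exec(text)[1];
// actWithFreshRef(run, buyButton, 'click');
```

Retrying only once keeps a genuinely missing element from turning into an endless loop.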

Additional Notes

Advantages over traditional scraping

Handles JavaScript-rendered content automatically
Deterministic element selection through accessibility tree
Screenshot and PDF generation built-in
Persistent sessions and state management
Designed for agent workflows with clear CLI interface

Cloud integration (optional)

Agent-browser supports cloud browser providers:

Browserbase: agent-browser --provider browserbase
Browser Use: Enterprise browser automation
Kernel: Distributed browser sessions

For most use cases, local installation is sufficient and avoids external dependencies.

Source

https://github.com/besoeasy/open-skills/blob/main/skills/browser-automation-agent/SKILL.md

Overview

Agent-browser is a headless browser automation CLI that lets AI agents control a browser. It produces deterministic element references by capturing accessibility-tree snapshots, which enables reliable form filling, clicking, navigation, and automated workflows.

How This Skill Works

Open a page with agent-browser open, then run agent-browser snapshot to generate stable element references (e.g., @e1, @e2). Pass those refs to commands such as fill, click, or type. The tool runs locally (no external API) and can capture screenshots and PDFs and extract text, HTML, or attributes for verification.

When to Use It

Automate web interactions (fill forms, click buttons, navigate) for AI-driven tasks
Capture screenshots or generate PDFs of web pages
Web scraping tasks that require JavaScript rendering or complex interactions
Automation workflows that rely on deterministic element references
Testing web applications with agent-driven scenarios

Quick Start

Step 1: Install and verify installation: npm install -g agent-browser; agent-browser install; agent-browser --version
Step 2: Open a URL and snapshot: agent-browser open https://example.com; agent-browser snapshot
Step 3: Interact and capture outputs: agent-browser fill @e2 "user@example.com"; agent-browser click @e1; agent-browser screenshot page.png

Best Practices

Always run a fresh snapshot after page load to get up-to-date element references
Update snapshots when page structure changes to keep refs accurate
Prefer deterministic refs (@eN) over brittle text-based selectors
Implement timeouts and wait for page readiness before actions
Validate outputs (screenshots, text, HTML, or attributes) as part of the test plan

Example Use Cases

Open https://example.com and run snapshot to obtain @e references
Fill @e2 with user@example.com, fill @e3 with password, then click @e1 to sign in
Capture a screenshot: agent-browser screenshot login.png
Generate a PDF: agent-browser pdf report.pdf
Get page text and retrieve a link href: agent-browser text followed by agent-browser attribute @e5 href
