What is agent-browser?

A CLI tool for browser automation designed for AI agents, from Vercel Labs.

How should I select elements reliably?

Prefer refs (@e1, @e2) taken from a snapshot; fall back to CSS, XPath, or semantic locators when refs are not available.
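Here is a minimal Python sketch of the refs workflow. It looks up a ref in snapshot text by role and accessible name. It assumes snapshot lines look like `- button "Submit" [ref=e2]`, as in the agent-browser README examples. `find_ref` and the sample snapshot are hypothetical and are not part of agent-browser.

```python
import re
from typing import Optional

# Assumed snapshot line format: - <role> "<name>" [ref=eN] [optional attributes]
LINE_RE = re.compile(r'^\s*-\s+(\w+)\s+"([^"]*)"\s+\[ref=(e\d+)\]')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return "@eN" for the first element matching role and name, else None."""
    for line in snapshot.splitlines():
        match = LINE_RE.match(line)  # leading whitespace allows nested nodes
        if match and match.group(1) == role and match.group(2) == name:
            return "@" + match.group(3)
    return None


SAMPLE = """\
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
"""

print(find_ref(SAMPLE, "button", "Submit"))  # prints @e2
```

If `find_ref` returns None, the element is not in the current snapshot. Take a fresh snapshot or fall back to a CSS, XPath, or semantic locator.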

Can I run multiple sessions and connect to existing browsers?

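Yes. agent-browser supports multiple isolated sessions (--session) and can connect to an already-running browser over the Chrome DevTools Protocol (CDP). Agent mode (--json output) helps integrate either setup into an agent workflow.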
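The following Python sketch only assembles command lines and does not run them. It shows where the `--session` and `--cdp` options would go: one session name per isolated browser, and a DevTools port for attaching to an existing Chrome. The `cli` helper is hypothetical and 9222 is an example port. Placing these flags before the subcommand is an assumption based on the `agent-browser --session <name> <command>` pattern.

```python
import shlex
from typing import List, Optional


def cli(command: List[str], session: Optional[str] = None,
        cdp_port: Optional[int] = None) -> List[str]:
    """Build an agent-browser argv list with optional session and CDP flags."""
    argv = ["agent-browser"]
    if session is not None:
        argv += ["--session", session]  # separate browser state per session name
    if cdp_port is not None:
        argv += ["--cdp", str(cdp_port)]  # attach to a browser listening on this port
    return argv + command


# Two independent sessions, for example one per user account.
for name in ("account-a", "account-b"):
    print(shlex.join(cli(["open", "example.com"], session=name)))

# Drive a Chrome that was started with --remote-debugging-port=9222.
print(shlex.join(cli(["snapshot"], cdp_port=9222)))
```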

agent-browser

npx machina-cli add skill partme-ai/full-stack-skills/agent-browser --openclaw

When to use this skill

Use this skill whenever the user wants to:

Automate browser interactions via CLI commands
Use browser automation for AI agents
Navigate websites and interact with pages using command-line tools
Use refs-based element selection for deterministic automation
Integrate browser automation into AI agent workflows
Capture snapshots of web pages with accessibility trees
Fill forms, click elements, and extract content via CLI
Use semantic locators for more reliable element selection
Work with browser automation in agent mode with JSON output
Manage multiple browser sessions
Debug browser automation with headed mode
Use authenticated sessions with custom headers
Connect to existing browsers via CDP
Stream browser viewport for live preview

How to use this skill

This skill is organized to match the agent-browser official documentation structure (https://github.com/vercel-labs/agent-browser/blob/main/README.md). When working with agent-browser:

Install agent-browser:
- Load examples/getting-started/installation.md for installation instructions
Quick Start:
- Load examples/quick-start/quick-start.md for basic workflow examples
Learn core commands:
- Load examples/commands/basic-commands.md for basic commands (open, click, fill, etc.)
- Load examples/commands/advanced-commands.md for advanced commands (snapshot, eval, etc.)
- Load examples/commands/get-info/ for information retrieval commands
- Load examples/commands/check-state/ for state checking commands
- Load examples/commands/find-elements/ for semantic locator commands
- Load examples/commands/wait/ for wait commands
- Load examples/commands/mouse-control/ for mouse control commands
- Load examples/commands/browser-settings/ for browser configuration
- Load examples/commands/cookies-storage/ for cookies and storage management
- Load examples/commands/network/ for network interception
- Load examples/commands/tabs-windows/ for tab and window management
- Load examples/commands/frames/ for iframe handling
- Load examples/commands/dialogs/ for dialog handling
- Load examples/commands/debug/ for debugging commands
- Load examples/commands/navigation/ for navigation commands
- Load examples/commands/setup/ for setup commands
Understand selectors:
- Load examples/selectors/refs.md for refs-based selection (@e1, @e2, etc.)
- Load examples/selectors/traditional-selectors.md for CSS, XPath, and semantic locators
Use agent mode:
- Load examples/agent-mode/introduction.md for agent mode overview
- Load examples/agent-mode/optimal-workflow.md for optimal AI workflow
- Load examples/agent-mode/integration.md for integrating with AI agents
Advanced features:
- Load examples/advanced/sessions.md for session management
- Load examples/advanced/headed-mode.md for debugging with visible browser
- Load examples/advanced/authenticated-sessions.md for authentication via headers
- Load examples/advanced/custom-executable.md for custom browser executable
- Load examples/advanced/cdp-mode.md for Chrome DevTools Protocol integration
- Load examples/advanced/streaming.md for browser viewport streaming
- Load examples/advanced/architecture.md for architecture overview
- Load examples/advanced/platforms.md for platform support
- Load examples/advanced/usage-with-agents.md for AI agent integration patterns
Configure options:
- Load examples/options/global-options.md for global CLI options
- Load examples/options/snapshot-options.md for snapshot-specific options
- Load examples/options/session-options.md for session management options
Reference API documentation when needed:
- api/commands.md - Complete command reference
- api/selectors.md - Selector reference
- api/options.md - Options reference
Use templates for quick start:
- templates/basic-automation.md - Basic automation workflow
- templates/ai-agent-workflow.md - AI agent workflow template

Doc mapping (one-to-one with official documentation)

The examples/ and api/ files above map one-to-one to the official documentation at https://github.com/vercel-labs/agent-browser.

Examples and Templates

This skill includes detailed examples organized to match the official documentation structure. All examples are in the examples/ directory (see mapping above).

To use examples:

Identify the topic from the user's request
Load the appropriate example file from the mapping above
Follow the instructions, syntax, and best practices in that file
Adapt the code examples to your specific use case

To use templates:

Start from the templates in the templates/ directory for common scaffolding
Adapt templates to your specific needs and coding style

API Reference

Commands API: api/commands.md - Complete command reference with syntax and examples
Selectors API: api/selectors.md - Selector types and usage reference
Options API: api/options.md - All options reference

Best Practices

Use Refs: Prefer refs (@e1, @e2) over traditional selectors for deterministic automation
Snapshot First: Always snapshot before interacting with elements to get refs
Agent Mode: Use --json flag for machine-readable output in agent mode
Session Management: Use --session to maintain state across commands
Interactive Snapshot: Use -i to limit the snapshot to interactive elements (buttons, links, inputs)
Semantic Locators: Use semantic locators (role/name) when refs are not available
Error Handling: Check command exit codes and error messages
Wait for Navigation: Commands automatically wait for navigation to complete
Headed Mode: Use --headed for debugging, headless for production
CDP Integration: Use --cdp for Chrome DevTools Protocol integration
Streaming: Set the AGENT_BROWSER_STREAM_PORT environment variable to stream the browser viewport for live preview
Authenticated Sessions: Use --headers for authentication without login flows
Custom Executable: Use --executable-path for serverless deployments or custom browsers
Snapshot Options: Combine -i (interactive only), -c (compact), -d (depth limit), and -s (scope to a CSS selector) to keep snapshot output small
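This Python helper sketches how the snapshot flags combine. It composes a `snapshot` command from the -i, -c, -d, and -s options plus --json. `snapshot_args` is hypothetical. The comments on each flag summarize the agent-browser README, so check them against `agent-browser --help` for your installed version.

```python
from typing import List, Optional


def snapshot_args(interactive: bool = False, compact: bool = False,
                  depth: Optional[int] = None, selector: Optional[str] = None,
                  as_json: bool = False) -> List[str]:
    """Compose `agent-browser snapshot` arguments from the short flags."""
    args = ["agent-browser", "snapshot"]
    if interactive:
        args.append("-i")  # only interactive elements
    if compact:
        args.append("-c")  # drop empty structural nodes
    if depth is not None:
        args += ["-d", str(depth)]  # limit accessibility-tree depth
    if selector is not None:
        args += ["-s", selector]  # scope the snapshot to a CSS selector
    if as_json:
        args.append("--json")  # machine-readable output for agents
    return args


print(snapshot_args(interactive=True, compact=True, depth=3, selector="#main", as_json=True))
```

Narrowing the snapshot this way keeps the text an agent has to read short, which matters when the output goes into a model's context.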

Resources

GitHub Repository: https://github.com/vercel-labs/agent-browser
Official README: https://github.com/vercel-labs/agent-browser/blob/main/README.md
Agent Mode Documentation: https://agent-browser.dev/agent-mode
Issues: https://github.com/vercel-labs/agent-browser/issues

Keywords

agent-browser, CLI browser automation, AI agents, browser automation CLI, refs, snapshot, agent mode, semantic locators, browser automation tool, command-line browser, AI agent browser, deterministic selectors, accessibility tree, browser commands, web automation CLI, sessions, headed mode, authenticated sessions, CDP mode, streaming, Chrome DevTools Protocol, Playwright, browser automation for AI

Source

git clone https://github.com/partme-ai/full-stack-skills (skill file: skills/agent-browser/SKILL.md)

Overview

This skill covers installing agent-browser, core commands, selectors (refs, CSS, XPath, semantic), agent mode, sessions, and options. It is designed for AI agents that must automate browser tasks, navigate pages, fill forms, and extract content. It also explains testing and debugging workflows to ensure reliable web automation.

How This Skill Works

agent-browser runs from the CLI to control a browser instance, offering commands such as open, click, fill, and snapshot. It supports refs-based selectors as well as CSS, XPath, and semantic locators, plus sessions, agent mode, and CDP connections to existing browsers. With the --json flag, command output is returned as JSON for downstream AI processing.
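Here is a minimal Python sketch of consuming agent-mode output from a script. It assumes the JSON reply has the shape `{"success": ..., "data": ..., "error": ...}` and that `--json` can follow the command's own arguments. `run_json` is a hypothetical helper. The `runner` parameter exists only so the function can be exercised without launching a browser.

```python
import json
import subprocess
from typing import Any, Callable, List


def run_json(args: List[str],
             runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Any:
    """Run `agent-browser <args> --json` and return the `data` field of the reply.

    Assumed reply shape: {"success": bool, "data": ..., "error": str}.
    """
    proc = runner(["agent-browser", *args, "--json"], capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = {}  # empty or non-JSON output is treated as a failure below
    if not isinstance(payload, dict):
        payload = {}
    # A non-zero exit code or a reply without success=true counts as an error.
    if proc.returncode != 0 or payload.get("success") is not True:
        message = payload.get("error") or proc.stderr.strip() or "unknown error"
        raise RuntimeError(f"agent-browser {' '.join(args)} failed: {message}")
    return payload.get("data")


# Example (requires agent-browser on PATH and an open page):
# title = run_json(["get", "title"])
```

Checking both the exit code and the `success` field means a failed command surfaces as an exception instead of being passed to the agent as if it were valid data.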

When to Use It

Automate browser interactions directly from scripts or prompts via the CLI.
Enable AI agents to navigate pages, fill forms, and extract content autonomously.
Manage multiple browser sessions (headless or headed) for parallel tasks.
Use refs-based selection and semantic locators for reliable element targeting.
Capture page snapshots, debug with headed mode, and customize headers or CDP connections.

Quick Start

Step 1: Install agent-browser (npm install -g agent-browser), then run agent-browser install to download Chromium.
Step 2: Use open to navigate to a URL, run snapshot to get element refs, then click and fill using those refs.
Step 3: Run snapshot or get commands (for example, get text or get url) to verify results; add --json for machine-readable output.
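The quick-start steps can be scripted as a sequence of commands that stops at the first non-zero exit code. This is a sketch. The URL, the refs `@e3` and `@e2`, and the email address are placeholders. In a real agent loop the refs would be read from the snapshot output instead of being hardcoded. `run_steps` and its injectable `runner` are hypothetical.

```python
import shlex
import subprocess
from typing import Callable, List

# Placeholder quick-start flow: URL, refs, and text are examples only.
STEPS: List[List[str]] = [
    ["open", "example.com"],              # navigate to the page
    ["snapshot", "-i"],                   # list interactive elements with refs
    ["fill", "@e3", "test@example.com"],  # refs would normally come from the snapshot
    ["click", "@e2"],
    ["get", "url"],                       # verify where the flow ended up
]


def run_steps(steps: List[List[str]],
              runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """Run each step as `agent-browser <step> --json`; stop at the first failure."""
    outputs: List[str] = []
    for step in steps:
        cmd = ["agent-browser", *step, "--json"]
        proc = runner(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            raise RuntimeError(f"step failed: {shlex.join(cmd)}\n{proc.stderr.strip()}")
        outputs.append(proc.stdout)  # raw JSON text returned by each step
    return outputs
```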

Best Practices

Prefer refs-based selectors (@e1, @e2) for deterministic automation.
Combine semantic locators with CSS/XPath to improve stability across page changes.
Use agent-mode with JSON output to integrate results into downstream workflows.
Isolate sessions per task and reuse authentication headers when needed.
Test in headed mode during development and switch to headless for production.
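The Python sketch below shows one way to apply the last two practices in a script. It passes authentication headers to `--headers` as a JSON string and adds `--headed` only while debugging. `global_flags`, the `CI` environment check, and the `API_TOKEN` variable are hypothetical conventions, not agent-browser features.

```python
import json
import os
from typing import Dict, List, Optional


def global_flags(headers: Optional[Dict[str, str]] = None,
                 headed: Optional[bool] = None) -> List[str]:
    """Return flags for auth headers and headed/headless mode."""
    if headed is None:
        # Hypothetical convention: visible browser locally, headless when CI is set.
        headed = "CI" not in os.environ
    flags: List[str] = []
    if headed:
        flags.append("--headed")  # show the browser window while debugging
    if headers:
        flags += ["--headers", json.dumps(headers)]  # headers passed as a JSON string
    return flags


token = os.environ.get("API_TOKEN", "<token>")  # placeholder secret source
argv = ["agent-browser", "open", "api.example.com",
        *global_flags({"Authorization": f"Bearer {token}"}, headed=False)]
print(argv)
```

Serializing with `json.dumps` avoids hand-quoting the header JSON. Keeping the token in an environment variable avoids writing it into scripts or logs.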

Example Use Cases

Open a product page, locate the price with a semantic locator, and extract the price value.
Fill a contact form, submit, and confirm a success message via CLI feedback.
Navigate a site, take a snapshot of the accessibility tree for QA, and log results.
Intercept network requests while scrolling and log response statuses for monitoring.
Manage multiple sessions to compare page states across different user accounts.
