A collection of 25 Playwright-based browser automation tools for AI agents via agent-browser.

How do I install and start?

Install with npm install -g @anthropic-ai/agent-browser, then run agent-browser start. For Claude Code, copy this skill folder to .claude/skills/browser-ops/; for Codex CLI, append this SKILL.md to your project's root AGENTS.md. See references/installation-guide.md for full steps.

How do I stay updated?

Check UPDATES.md in the browser-ops skill for new changes, then follow UPDATE-GUIDE.md to apply them. The guide diffs customized local files before anything is overwritten.

browser-ops

npx machina-cli add skill buildoak/fieldwork-skills/browser-ops --openclaw


Browser Ops

Browser automation via agent-browser. 25 tools wrapping Playwright for navigation, interaction, observation, and session management. Validated on two benchmark suites: 12/15 pass on a 15-task suite (100% excluding external blockers), 9/10 on a 10-task progressive suite. Standout: Notion end-to-end signup with AgentMail OTP verification.

Terminology used in this file:

Playwright: A browser automation framework that lets tools control Chromium/Chrome.
a11y tree: The accessibility tree (screen-reader-friendly page structure) used by browser_snapshot.
DOM: Document Object Model, the browser's structured representation of page elements.
CSS selector: A rule for targeting specific DOM elements (for example .price or #submit).
OAuth: A standard login/authorization flow that redirects through an identity provider (for example, "Sign in with GitHub").

Setup

npm install -g @anthropic-ai/agent-browser
agent-browser start

Claude Code: copy this skill folder into .claude/skills/browser-ops/
Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the browser-ops skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.

Quick Start

The simplest possible browser flow: navigate, inspect, capture.

browser_navigate(url="https://example.com")
browser_snapshot(mode="interactive")
browser_screenshot(path="/tmp/example.png")
browser_close()

Decision Tree: Browser vs Other Tools

Ask this FIRST. Getting it wrong wastes significant token budget.

Need data from the web?
  |
  +-- Is it static content? (prices, articles, search results, public data)
  |     YES --> Use WebSearch / WebFetch (built-in tools)
  |             ~100 tokens. No browser overhead.
  |
  +-- Does it require interaction? (login, form fill, click sequences, session state)
  |     YES --> Use browser tools
  |
  +-- Does it require email verification?
  |     YES --> Use browser + AgentMail (see Email Verification section)
  |
  +-- Is the target known to block bots? (Cloudflare-protected, etc.)
        YES --> Check references/failure-log.md before starting.
              May need stealth config or alternative approach.

Rule of thumb: If you can get the data with curl, you don't need a browser.
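
The decision tree above can be sketched as a small routing function. This is an illustrative sketch only: `choose_approach` and its boolean parameters are hypothetical names, not part of agent-browser, and checking the bot-blocker branch first is a choice made here rather than something the tree prescribes.

```python
def choose_approach(needs_interaction: bool = False,
                    needs_email_verification: bool = False,
                    known_bot_blocker: bool = False) -> str:
    """Hypothetical helper mirroring the decision tree; not an agent-browser API."""
    if known_bot_blocker:
        # Look at prior findings before spending tokens on a blocked site.
        return "check references/failure-log.md first"
    if needs_email_verification:
        return "browser + AgentMail"
    if needs_interaction:
        return "browser tools"
    # Static content: the built-in fetch tools avoid browser overhead.
    return "WebSearch / WebFetch"


print(choose_approach())                               # static page
print(choose_approach(needs_interaction=True))         # login or form flow
print(choose_approach(needs_email_verification=True))  # signup with OTP
```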

Core Workflow

Every browser task follows this loop:

1. browser_navigate(url)                       -- go to the page
2. browser_snapshot(mode='interactive')        -- get refs (@e1, @e2...)
3. Identify target ref from snapshot           -- find the button/input/link
4. browser_click(@ref) / browser_fill(@ref, text) -- act
5. browser_snapshot(mode='interactive')        -- verify result
6. Repeat 3-5 until done
7. browser_close()                             -- ALWAYS close when done

The ref system: Snapshot returns element references like @e1, @e2. Use these refs with click/fill/type. Refs are stable within a page state but reset after navigation.
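
The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `tools` is a hypothetical object exposing the `browser_*` calls as methods, the snapshot is assumed to come back as text, and each step is a hypothetical callback that reads the latest snapshot and picks a ref. The real tool signatures are in references/tool-inventory.md.

```python
from typing import Callable, Sequence, Tuple

# A step inspects the latest snapshot and returns (action, ref, text).
# For clicks the text element is ignored.
Step = Callable[[str], Tuple[str, str, str]]


def run_flow(tools, url: str, steps: Sequence[Step]) -> str:
    """Navigate, then snapshot -> pick ref -> act for each step; always close."""
    try:
        tools.browser_navigate(url=url)
        for pick in steps:
            # Take a fresh snapshot before every action so refs are current.
            snapshot = tools.browser_snapshot(mode="interactive")
            action, ref, text = pick(snapshot)
            if action == "click":
                tools.browser_click(ref)
            elif action == "fill":
                tools.browser_fill(ref, text)
            else:
                raise ValueError(f"unsupported action: {action}")
        # A final snapshot verifies the result of the last action.
        return tools.browser_snapshot(mode="interactive")
    finally:
        tools.browser_close()  # runs even if a step raises
```

Putting `browser_close()` in a `finally` block means the session is released even when a step fails midway.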

Token Efficiency: Snapshot Modes

Mode	Tokens/page	Shows	Use when
`interactive`	~1,400	Buttons, links, inputs only	Default for everything
`compact`	~3,000-5,000	Condensed full tree	Need text content + interactive
`full`	~15,000	Complete a11y tree	Last resort, known need

Default to interactive. It is 10x cheaper than full and sufficient for 90% of tasks.
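
To make the cost difference concrete, here is a small arithmetic sketch using the approximate per-page figures from the table. The 30,000-token budget is an arbitrary illustration, and `compact` uses the upper end of its range.

```python
# Approximate tokens per page, taken from the table above.
SNAPSHOT_TOKENS = {"interactive": 1_400, "compact": 5_000, "full": 15_000}


def pages_within_budget(budget_tokens: int, mode: str) -> int:
    """How many snapshots of the given mode fit in a token budget."""
    return budget_tokens // SNAPSHOT_TOKENS[mode]


budget = 30_000  # hypothetical budget, for illustration only
for mode in SNAPSHOT_TOKENS:
    print(mode, pages_within_budget(budget, mode))
# interactive 21
# compact 6
# full 2
```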

Tiered Access Model

Tier 1: A11y Tree Snapshot (~1,400 tokens/page)
  browser_snapshot(mode='interactive') --> get refs --> click/fill
  For: navigation, form filling, structured page interaction
  This is your DEFAULT.

Tier 2: Screenshot + VLM (0 API tokens) [EXPERIMENTAL]
  browser_screenshot() --> local VLM (Qwen3-VL-2B / UI-TARS-1.5-7B)
  For: visual-only content, CAPTCHAs, pages where a11y tree misses data

Tier 3: Targeted DOM Extraction (variable tokens)
  browser_evaluate('document.querySelector(sel).textContent')
  For: known pages with known CSS selectors, JSON-LD extraction
  Use when you know EXACTLY what element contains the data.

Escalation path: Start at Tier 1. If snapshot doesn't show the data you need, try Tier 3 with a targeted selector. Only use Tier 2 when visual understanding is required.

Token Optimization for Data-Heavy Pages

For content-rich pages (HN, Reddit, forums, dashboards), the interactive snapshot balloons from ~1,400 tokens (simple pages) to ~47K tokens (dense pages). This wrecks budgets.

Pattern: Snapshot first to understand page structure, then browser_evaluate with targeted JS for bulk extraction.

1. browser_navigate(url)
2. browser_snapshot(mode='interactive')   -- understand structure (pay cost once)
3. browser_evaluate('                     -- extract data surgically
     JSON.stringify(
       [...document.querySelectorAll(".titleline a")]
         .map(a => ({title: a.textContent, href: a.href}))
     )
   ')
4. Parse JSON result -- structured data at ~200 tokens vs 47K snapshot

When to use: Any page where you need to extract 10+ items of the same type. Snapshot gives you the selector knowledge; eval gives you the data cheaply.
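
On the calling side, the extraction step can be sketched like this. The helper names are hypothetical, and the sketch assumes `browser_evaluate` hands back the stringified JSON as text. `json.dumps` is used only to quote the selector safely as a JavaScript string literal.

```python
import json


def build_extract_expression(selector: str) -> str:
    """Build a JS expression (no `return`) that serializes matching links."""
    quoted = json.dumps(selector)  # e.g. ".titleline a" -> "\".titleline a\""
    return (
        "JSON.stringify("
        f"[...document.querySelectorAll({quoted})]"
        ".map(a => ({title: a.textContent, href: a.href})))"
    )


def parse_links(raw: str) -> list:
    """Parse the JSON string assumed to come back from browser_evaluate."""
    return json.loads(raw)


expr = build_extract_expression(".titleline a")
# The expression would be passed to browser_evaluate; here the result is simulated.
simulated = '[{"title": "Example", "href": "https://example.com/"}]'
print(parse_links(simulated)[0]["title"])
```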

Email Verification (AgentMail)

For tasks requiring email verification (account signup, OTP flows).

Setup

AgentMail Python wrapper: ./scripts/mailbox.py (self-contained)
CLI wrapper: ./scripts/agentmail.sh
Dependencies: ./scripts/requirements.txt
First-time setup: ./scripts/agentmail.sh setup
Create your own mailbox (see pattern below)

AgentMail provides disposable email inboxes for AI agents. You create a mailbox, use the address in signup forms, then poll for incoming verification emails and extract OTP codes or links.

The Pattern (Validated on Notion Signup)

1. Create mailbox:     ./scripts/agentmail.sh create <username>
2. Fill signup form:   browser_fill(ref, "username@agentmail.to")
3. Submit form:        browser_click(ref)
4. Poll for email:     ./scripts/agentmail.sh poll username@agentmail.to --timeout 120
5. Extract OTP/link:   ./scripts/agentmail.sh extract <inbox_id> <msg_id>
6. Enter OTP:          browser_fill(ref, "123456")
7. Submit:             browser_click(ref)

Gotchas

Emails take 5-30 seconds to arrive. Always poll with timeout.
Some services detect the agentmail.to domain and reject it -- have a backup strategy.
OTP codes expire. Extract and submit promptly after polling.
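
The polling and extraction gotchas can be sketched generically. This is not the logic inside `agentmail.sh` or `mailbox.py`: `fetch` is a hypothetical callable that returns the email body or `None`, and the six-digit OTP format is an assumption based on the `123456` example above.

```python
import re
import time
from typing import Callable, Optional


def poll_until(fetch: Callable[[], Optional[str]],
               timeout: float = 120.0,
               interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch() until it returns a message or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        message = fetch()
        if message is not None:
            return message
        if time.monotonic() + interval > deadline:
            return None  # give up rather than poll forever
        sleep(interval)


def extract_otp(body: str) -> Optional[str]:
    """Return the first standalone 6-digit code in the body (assumed OTP format)."""
    match = re.search(r"\b(\d{6})\b", body)
    return match.group(1) if match else None
```

Submitting the code immediately after `extract_otp` returns keeps the flow inside the expiry window.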

Validated Flows

Notion signup: Full end-to-end -- signup, OTP poll, extract, submit, onboarding, page creation.
PKP forum: Email verification worked. Blocked by moderator approval gate (external).

Session Rules

CRITICAL: No parallel browser sessions.

All tools share one browser daemon per session
Parallel usage causes state collisions (one action navigates, another loses its page)
Run browser tasks SEQUENTIALLY. Always.
AGENT_BROWSER_SESSION env var controls session name (default: "mcp")
Per-session isolation is NOT yet implemented

Always close the browser when done:

browser_close()  -- releases the session for the next task

Forgetting to close leaves an orphaned Chromium process.

Stealth Configuration

Layer 1 provides basic stealth via environment variables: any browser session can run in headed mode with a custom UA, a persistent profile, and the automation flag disabled.

For stricter sites, escalate to Layer 2+. Full guide: ./references/stealth-config.md.

Quick Setup (5 min, $0)

export AGENT_BROWSER_HEADED=1
export AGENT_BROWSER_USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"
export AGENT_BROWSER_PROFILE="$HOME/.agent-browser/profiles/stealth"
export AGENT_BROWSER_ARGS="--disable-blink-features=AutomationControlled"
mkdir -p ~/.agent-browser/profiles/stealth
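
If the daemon is launched from a script rather than an interactive shell, the same Layer 1 settings can be assembled as an environment mapping. This is a sketch: `stealth_env` is a hypothetical helper, and only the variable names and values come from the exports above.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def stealth_env(base_env: Optional[Dict[str, str]] = None,
                home: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the environment with the Layer 1 variables set."""
    env = dict(os.environ if base_env is None else base_env)
    home_dir = Path(home) if home is not None else Path.home()
    profile = home_dir / ".agent-browser" / "profiles" / "stealth"
    env.update({
        "AGENT_BROWSER_HEADED": "1",
        "AGENT_BROWSER_USER_AGENT": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/132.0.0.0 Safari/537.36"
        ),
        "AGENT_BROWSER_PROFILE": str(profile),
        "AGENT_BROWSER_ARGS": "--disable-blink-features=AutomationControlled",
    })
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run`. Creating the profile directory first with `Path(env["AGENT_BROWSER_PROFILE"]).mkdir(parents=True, exist_ok=True)` mirrors the `mkdir -p` line.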

Escalation Path

Layer 1 (env vars above) -- beats Cloudflare free tier
Layer 2 (rebrowser-patches: npx rebrowser-patches@latest patch) -- beats Cloudflare Pro
Layer 3 (Kernel cloud: AGENT_BROWSER_PROVIDER=kernel) -- beats most anti-bot
Layer 4 (residential proxy: AGENT_BROWSER_PROXY=...) -- beats IP-based blocking

Key Env Vars

Env Var	Purpose	Default
`AGENT_BROWSER_SESSION`	Session name (per-session isolation is not yet implemented)	`mcp`
`AGENT_BROWSER_HEADED`	`"1"` = headed mode	off
`AGENT_BROWSER_USER_AGENT`	Custom UA string	Chromium default
`AGENT_BROWSER_ARGS`	Chromium launch args	none
`AGENT_BROWSER_PROFILE`	Persistent browser profile path	none
`AGENT_BROWSER_PROXY`	Proxy server URL	none
`AGENT_BROWSER_PROVIDER`	Cloud provider (kernel, browserbase)	none

Benchmark Results (Feb 2026)

15-task browser autonomy benchmark. 12/15 pass (100% excluding external blockers).

Capability	Tasks	Evidence
Login + session cookies	1, 6, 9	Sauce Demo, HN, quotes.toscrape
Multi-field registration	2, 7	11-step account lifecycle
Complex form widgets	3	Date pickers, React Select, file upload
Drag-drop, alerts, iframes	5, 14	Multiple interaction types
Paginated scraping with session	9	50 quotes across 5 pages
SaaS signup with email OTP	12	Notion end-to-end
OAuth redirect flow	13	GitHub OAuth chain
Google Flights SPA	11	Dynamic JS search + filter
Multi-site autonomous flow	15	Two sites, single session
Error recovery	14	Form validation, alerts, iframes

3 failures (all external): SSL outage, Cloudflare transparent challenge, moderator gate.

Test Suite v2 (10-Task Progressive)

#	Tier	Task	Result	Calls	Time
1	Medium	Reddit scraping (old.reddit.com)	PASS	14	25s
2	Medium	HN thread extraction	PASS	13	35s
3	Medium	SauceDemo e-commerce flow	PASS	38	61s
4	Hard	GitHub repo data extraction	PASS	37	83s
5	Hard	Google Flights search + filter	PASS	21	48s
6	Hard	HN account lifecycle	PASS	22	39s
7	Brutal	Stripe iframe checkout	PASS	59	168s
8	Brutal	Wikipedia multi-language	PASS	11	63s
9	Brutal	Cloudflare stealth gauntlet	PASS	12	36s
10	Final Boss	Linear E2E + AgentMail	PARTIAL	~40	~180s

Test 10 blocked by Cloudflare Turnstile CAPTCHA -- requires Layer 2+ stealth. Not an agent or skill gap.

Quick Tool Reference

25 tools in 5 categories. Full details in ./references/tool-inventory.md.

Category	Tools
Navigation	`navigate`, `back`, `forward`, `reload`
Observation	`snapshot`, `screenshot`, `get_url`, `get_title`, `get_text`, `get_html`
Interaction	`click`, `dblclick`, `fill`, `type`, `press`, `select`, `hover`, `focus`, `clear`, `check`, `uncheck`
Page	`scroll`, `wait`, `evaluate`
Session	`close`

All tool names are prefixed with browser_ (e.g., browser_click, browser_snapshot).

fill vs type

Method	Behavior	Use when
`browser_fill`	Clears field, sets value instantly	Standard form fields (95% of cases)
`browser_type`	Types character by character, triggers keystrokes	Autocomplete, search-as-you-type, custom widgets

Common Workflow Patterns

See ./references/battle-tested-patterns.md for 12 complete patterns with examples.

Pattern	Complexity	Key Technique
Standard login	Low	fill + click + wait + snapshot
Multi-field registration	Medium	fill + select + check + click
SaaS signup with OTP	High	AgentMail create + fill + poll + extract + fill
Paginated scraping	Medium	snapshot(compact) + click(Next) loop
OAuth redirect	Medium	click(OAuth button) + wait + follow redirects
Error recovery	Medium	submit + snapshot(check errors) + fix + resubmit
SPA navigation	Medium	type(not fill) + wait + snapshot for dynamic content
Targeted extraction	Low	browser_evaluate(JS selector)
Multi-site flow	High	Multiple navigates, single session, screenshot evidence
Targeted DOM extraction	Low	browser_evaluate(JS selector) for JSON-LD and specific elements
Post-search verification	Medium	snapshot results + verify params + recovery loop
Calendar widget protocol	Medium	click date field + navigate months + click date cells

Health Check

Before starting browser work, verify the stack:

./scripts/browser-check.sh           # full check (CLI + daemon + stealth + agentmail)
./scripts/browser-check.sh quick     # just CLI + daemon
./scripts/browser-check.sh stealth   # stealth config status

URL Pre-Population Pattern

For complex SPAs with autocomplete widgets, geo-defaults, or custom form components that resist browser_type:

Skip the form. Navigate directly to a URL with parameters pre-encoded.
Google Flights example: https://www.google.com/travel/flights?q=Flights+from+SFO+to+NRT+on+2026-04-17+return+2026-05-01
Why: Custom React/Material autocomplete widgets often ignore browser_type input or revert to geo-defaults. URL params bypass the widget layer entirely.
When to use: After 2-3 failed attempts to interact with a complex form widget. Don't fight the DOM -- go around it.
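
Building such a URL is a one-liner with the standard library. The sketch below reproduces the Google Flights example above. `flights_url` is a hypothetical helper, and the natural-language `q=` phrasing is taken from that example rather than from any documented Google API.

```python
from urllib.parse import urlencode


def flights_url(origin: str, destination: str, depart: str, return_date: str) -> str:
    """Encode a natural-language flight query into the ?q= parameter."""
    query = f"Flights from {origin} to {destination} on {depart} return {return_date}"
    # urlencode turns spaces into '+' and leaves letters, digits and '-' intact.
    return "https://www.google.com/travel/flights?" + urlencode({"q": query})


print(flights_url("SFO", "NRT", "2026-04-17", "2026-05-01"))
# https://www.google.com/travel/flights?q=Flights+from+SFO+to+NRT+on+2026-04-17+return+2026-05-01
```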

iframe Bypass Pattern

When cross-origin iframes block browser_fill/browser_type (e.g., Stripe payment forms):

Snapshot the page and identify the iframe element
Use browser_evaluate to extract the iframe's src URL: document.querySelector('iframe').src
Navigate directly to that URL -- this renders the iframe content as a regular page
Interact with all fields normally using browser_fill/browser_type
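
The four steps above can be sketched as a helper. As with the other sketches, `tools` is a hypothetical object exposing the `browser_*` calls as methods, and `browser_evaluate` is assumed to return the expression's value as a string.

```python
import json


def open_iframe_directly(tools, selector: str = "iframe") -> str:
    """Read an iframe's src via evaluate, then navigate to it as a normal page."""
    # json.dumps quotes the selector as a JS string literal.
    expression = f"document.querySelector({json.dumps(selector)}).src"
    src = tools.browser_evaluate(expression)
    if not src or not src.startswith(("http://", "https://")):
        raise ValueError(f"iframe has no navigable src: {src!r}")
    tools.browser_navigate(url=src)
    # Fresh snapshot: refs from the parent page are no longer valid.
    tools.browser_snapshot(mode="interactive")
    return src
```

After this returns, the fields inside the former iframe appear in the snapshot like any other inputs.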

Evaluate-Only Mode for Heavy Pages

For content-heavy pages (Wikipedia, documentation sites, long articles):

Skip snapshots entirely. The a11y tree will be massive and blow your token budget.
Use browser_evaluate with targeted CSS selectors for all data extraction
Common selectors: document.querySelector('p').textContent, document.querySelectorAll('.reference').length, Array.from(document.querySelectorAll('h2')).map(e => e.textContent)

Playbooks

Per-site recipes with validated approaches. Load the relevant playbook before starting a task against a tested site.

Playbook	Site	Status	Key Pattern
`references/playbooks/booking-com.md`	Booking.com	PASS (workaround)	Landmark search + hotel calendar pricing
`references/playbooks/google-flights.md`	Google Flights	PASS	URL pre-population (`?q=`) bypasses autocomplete
`references/playbooks/linear-signup.md`	Linear	PARTIAL	Blocked by Cloudflare Turnstile; requires Layer 3
`references/playbooks/notion-signup.md`	Notion	PASS	Full E2E signup with AgentMail OTP verification
`references/playbooks/reddit-scraping.md`	Reddit	PASS	old.reddit.com + `?sort=hot` retry + evaluate extraction
`references/playbooks/stripe-iframe.md`	Stripe (iframe)	PASS	Extract iframe `src`, navigate directly, fill normally
`references/playbooks/cloudflare-sites.md`	Cloudflare (general)	Mixed	Decision tree: free tier (L1) vs Turnstile (L3)
`references/playbooks/wikipedia-extraction.md`	Wikipedia	PASS	Evaluate-only mode, zero snapshots, CSS selectors
`references/playbooks/headed-browser-setup.md`	(general)	Reference	Headed mode + persistent profile setup

Anti-Patterns

Do NOT	Do instead
Use browser for static content (prices, articles)	`WebSearch` or `WebFetch` (built-in tools)
Use `snapshot(mode='full')` by default	Use `interactive` mode (10x cheaper)
Run parallel browser sessions	Run sequentially, one at a time
Forget `browser_close()` at end	Always close when done
Retry failed anti-bot sites blindly	Check `references/failure-log.md` first
Load browser tools for non-browser tasks	Only use browser when interaction is needed
Use `browser_type` when `browser_fill` works	`fill` is faster; `type` is for keystroke-sensitive inputs
Skip screenshot evidence	Screenshot at key milestones for verification
Use `browser_fill` for autocomplete fields	`browser_type` triggers keystroke events for suggestions
Attempt Cloudflare Turnstile sites at Layer 1	Interactive CAPTCHA requires Layer 2+ stealth

Error Handling

Common browser automation errors and recovery strategies.

Error	Symptoms	Recovery
Playwright timeout	`TimeoutError: waiting for selector` or navigation timeout	Retry with longer `browser_wait` (double the timeout). Check if page is still loading. If persistent, the element may not exist -- re-snapshot to verify page state.
Stale element ref	Action fails on a previously valid `@eN` ref	Refs reset after any navigation or major DOM change. Re-run `browser_snapshot()` to get fresh refs, then retry the action with the new ref.
Element not found	`browser_click`/`browser_fill` fails -- ref not in snapshot	1) Verify the page fully loaded (`browser_wait` or check URL). 2) Try a CSS selector fallback. 3) The element may be below the fold -- `browser_scroll(direction="down")` then re-snapshot.
Network error	Navigation fails, page doesn't load	Retry `browser_navigate` to the same URL. If persistent, check if site is down or blocking (see `references/failure-log.md`).
Session collision	Random failures, wrong page content, unexpected state	Another task is using the browser. Browser tasks must run SEQUENTIALLY. Close any orphaned sessions with `browser_close()` and retry.
Anti-bot block	Blank page, CAPTCHA, access denied, redirect to challenge page	Check `references/stealth-config.md` for escalation layers. Do not retry blindly -- escalate stealth level first.
`browser_evaluate` syntax error	`SyntaxError: Unexpected token` in eval expression	Do NOT use `return` keyword in `browser_evaluate` expressions -- eval expects a JS expression, not a statement. Use `document.title` not `return document.title`.

General principle: When an action fails, always re-snapshot before retrying. The page state may have changed since your last observation.
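
The re-snapshot principle can be sketched as a small retry wrapper. All three callables are hypothetical stand-ins: `snapshot` for `browser_snapshot`, `find_ref` for whatever logic locates the target in the snapshot, and `act` for the click or fill.

```python
def act_with_resnapshot(snapshot, find_ref, act, attempts: int = 3):
    """Re-observe the page before every attempt so stale refs are never reused."""
    last_error = None
    for _ in range(attempts):
        refs = snapshot()       # always observe before (re)trying
        ref = find_ref(refs)
        if ref is None:
            last_error = LookupError("target not present in snapshot")
            continue
        try:
            return act(ref)
        except Exception as exc:  # broad on purpose: any failure triggers a re-snapshot
            last_error = exc
    raise RuntimeError(f"action failed after {attempts} attempts") from last_error
```

For anti-bot blocks this wrapper is the wrong tool: escalate the stealth layer instead of retrying.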

Bundled Resources Index

Path	What	When to load
`./UPDATES.md`	Structured changelog for AI agents	When checking for new features or updates
`./UPDATE-GUIDE.md`	Instructions for AI agents performing updates	When updating this skill
`./references/installation-guide.md`	Detailed install walkthrough for Claude Code and Codex CLI	First-time setup or environment repair
`./references/tool-inventory.md`	Full 25-tool API reference with params and examples	When you need exact tool syntax
`./references/battle-tested-patterns.md`	12 validated workflow patterns from benchmark	When building a new browser workflow
`./references/failure-log.md`	Benchmark results, anti-bot findings, AgentMail details	Before targeting a new site
`./references/stealth-config.md`	Anti-detection layered configuration guide	When hitting bot detection
`./references/test-results.md`	Full benchmark test cases (v1 + v2) with detailed logs	When reviewing what has been tested and what works
`./references/anti-detection-guide.md`	4-tier stealth escalation with decision tree	When planning stealth strategy for a new target
`./references/playbooks/`	Per-site recipes with validated approaches	Before automating a tested site
`./references/playbooks/headed-browser-setup.md`	Profile setup, trust building, headed mode guide	When setting up headed browser for high-detection sites
`./scripts/agentmail.sh`	AgentMail CLI wrapper (setup/create/poll/extract)	For email verification flows
`./scripts/mailbox.py`	AgentMail Python SDK wrapper	Called by agentmail.sh (self-contained)
`./scripts/requirements.txt`	Python dependencies for AgentMail	Used by agentmail.sh setup
`./scripts/browser-check.sh`	Browser stack health check	Before first browser task in a session

Source

View on GitHub: https://github.com/buildoak/fieldwork-skills/blob/main/skills/browser-ops/SKILL.md

Overview

Browser-ops provides 25 Playwright-based tools to drive navigation, interaction, observation, and session management for AI coding agents via agent-browser. It enables end-to-end browser flows and has demonstrated strong results, including Notion signup with AgentMail OTP verification.

How This Skill Works

The skill interfaces with agent-browser to control Chromium-based browsers, exposing functions like browser_navigate, browser_snapshot, browser_click, browser_fill, browser_screenshot, and browser_close. Operations rely on element refs returned by snapshots (e.g., @e1), which are stable within a page state but reset after navigation, and follow a core workflow: navigate, snapshot, identify, act, snapshot, repeat, then close.

When to Use It

Need data from the web that requires interaction or dynamic content
Requires login or session state (OAuth, multi-step forms)
Needs email verification via AgentMail as part of automation
Pages may block bots or require stealth considerations
You want to capture UI structure and element references for automation testing or data extraction

Quick Start

Step 1: browser_navigate(url='https://example.com')
Step 2: browser_snapshot(mode='interactive')
Step 3: browser_screenshot(path='/tmp/example.png')
Step 4: browser_close()

Best Practices

Use the ref system from browser_snapshot to identify and reuse elements (@e1, @e2, ...)
Always close the browser when the task is complete to free resources
Start with the core workflow: navigate, snapshot, identify, act, snapshot, repeat
Lean on interactive snapshots (the cheapest mode) to discover elements; switch to compact only when you also need the page's text content
If data is static, prefer WebSearch/WebFetch to minimize browser overhead

Example Use Cases

Notion signup flow automated with AgentMail OTP verification
OAuth login flow to test redirects and session establishment
Automated multi-step form submission with field fills and button clicks
Accessibility snapshot to capture a11y tree for QA purposes
Dynamic product page pricing retrieval after option selections
