browser-ops

npx machina-cli add skill buildoak/fieldwork-skills/browser-ops --openclaw

Browser Ops

Browser automation via agent-browser. 25 tools wrapping Playwright for navigation, interaction, observation, and session management. Validated on two benchmark suites: 12/15 pass on a 15-task suite (100% excluding external blockers), 9/10 on a 10-task progressive suite. Standout: Notion end-to-end signup with AgentMail OTP verification.

Terminology used in this file:

  • Playwright: A browser automation framework that can drive Chromium, Firefox, and WebKit; this skill uses it to control Chromium/Chrome.
  • a11y tree: The accessibility tree (screen-reader-friendly page structure) used by browser_snapshot.
  • DOM: Document Object Model, the browser's structured representation of page elements.
  • CSS selector: A rule for targeting specific DOM elements (for example .price or #submit).
  • OAuth: A standard login/authorization flow that redirects through an identity provider (for example, "Sign in with GitHub").

Setup

npm install -g @anthropic-ai/agent-browser
agent-browser start
  • Claude Code: copy this skill folder into .claude/skills/browser-ops/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the browser-ops skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.


Quick Start

The simplest possible browser flow: navigate, inspect, capture, close.

browser_navigate(url="https://example.com")
browser_snapshot(mode="interactive")
browser_screenshot(path="/tmp/example.png")
browser_close()

Decision Tree: Browser vs Other Tools

Ask this FIRST. Getting it wrong wastes significant token budget.

Need data from the web?
  |
  +-- Is it static content? (prices, articles, search results, public data)
  |     YES --> Use WebSearch / WebFetch (built-in tools)
  |             ~100 tokens. No browser overhead.
  |
  +-- Does it require interaction? (login, form fill, click sequences, session state)
  |     YES --> Use browser tools
  |
  +-- Does it require email verification?
  |     YES --> Use browser + AgentMail (see Email Verification section)
  |
  +-- Is the target known to block bots? (Cloudflare-protected, etc.)
        YES --> Check references/failure-log.md before starting.
              May need stealth config or alternative approach.

Rule of thumb: If you can get the data with curl, you don't need a browser.
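The tree can be read as a small function. This is a sketch only. The WebTask fields and the returned labels are hypothetical, and the real decision stays a per-task judgment call.

The one ordering choice it encodes is that the bot-blocking check is a precondition, not an alternative. It is placed in front of whichever tool the rest of the tree selects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WebTask:
    """Hypothetical task description; the field names are illustrative only."""
    needs_interaction: bool = False         # login, form fill, click sequences, session state
    needs_email_verification: bool = False  # signup OTPs, verification links
    known_bot_blocking: bool = False        # Cloudflare-protected, etc.


def choose_approach(task: WebTask) -> List[str]:
    """Return the recommended steps for a task, following the decision tree."""
    steps: List[str] = []
    if task.known_bot_blocking:
        # Read the failure log before spending any browser calls on the site.
        steps.append("check references/failure-log.md")
    if task.needs_email_verification:
        steps.append("browser + AgentMail")
    elif task.needs_interaction:
        steps.append("browser tools")
    else:
        # Static content: if curl could fetch it, skip the browser entirely.
        steps.append("WebSearch / WebFetch")
    return steps
```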


Core Workflow

Every browser task follows this loop:

1. browser_navigate(url)                       -- go to the page
2. browser_snapshot(mode='interactive')        -- get refs (@e1, @e2...)
3. Identify target ref from snapshot           -- find the button/input/link
4. browser_click(@ref) / browser_fill(@ref, text) -- act
5. browser_snapshot(mode='interactive')        -- verify result
6. Repeat 3-5 until done
7. browser_close()                             -- ALWAYS close when done

The ref system: Snapshot returns element references like @e1, @e2. Use these refs with click/fill/type. Refs are stable within a page state but reset after navigation.
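Step 3 of the loop is usually a lookup by role and accessible name. The sketch below assumes snapshot lines shaped like this:

- button "Sign in" [ref=e3]

That line format is an assumption for illustration, not a documented contract. Adapt the regular expression to what your agent-browser version actually prints.

```python
import re
from typing import Optional

# Assumed line shape (illustrative, not a documented format):
#   - button "Sign in" [ref=e3]
SNAPSHOT_LINE = re.compile(r'^\s*-\s*(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>e\d+)\]')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return '@eN' for the first element with this role whose accessible name
    contains `name` (case-insensitive), or None if nothing matches."""
    for line in snapshot.splitlines():
        m = SNAPSHOT_LINE.match(line)
        if m and m["role"] == role and name.lower() in m["name"].lower():
            return "@" + m["ref"]
    return None
```

Because refs reset after navigation, run find_ref against a fresh snapshot after every browser_navigate. Do not cache a ref like "@e3" across pages.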


Token Efficiency: Snapshot Modes

Mode | Tokens/page | Shows | Use when
interactive | ~1,400 | Buttons, links, inputs only | Default for everything
compact | ~3,000-5,000 | Condensed full tree | Need text content + interactive
full | ~15,000 | Complete a11y tree | Last resort, known need

Default to interactive. It is 10x cheaper than full and sufficient for 90% of tasks.


Tiered Access Model

Tier 1: A11y Tree Snapshot (~1,400 tokens/page)
  browser_snapshot(mode='interactive') --> get refs --> click/fill
  For: navigation, form filling, structured page interaction
  This is your DEFAULT.

Tier 2: Screenshot + VLM (0 API tokens) [EXPERIMENTAL]
  browser_screenshot() --> local VLM (Qwen3-VL-2B / UI-TARS-1.5-7B)
  For: visual-only content, CAPTCHAs, pages where a11y tree misses data

Tier 3: Targeted DOM Extraction (variable tokens)
  browser_evaluate('document.querySelector(sel).textContent')
  For: known pages with known CSS selectors, JSON-LD extraction
  Use when you know EXACTLY what element contains the data.

Escalation path: Start at Tier 1. If snapshot doesn't show the data you need, try Tier 3 with a targeted selector. Only use Tier 2 when visual understanding is required.
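The escalation path is an ordering rule: 1, then 3, then 2 only for visual content. The sketch below writes it out as a function. The enum and parameter names are hypothetical, and deciding whether a snapshot "has the data" remains the agent's call.

```python
from enum import Enum


class Tier(Enum):
    A11Y_SNAPSHOT = 1   # browser_snapshot(mode='interactive'), the default
    SCREENSHOT_VLM = 2  # screenshot + local VLM, experimental
    DOM_EXTRACTION = 3  # browser_evaluate with a targeted CSS selector


def choose_tier(snapshot_has_data: bool, needs_visual: bool) -> Tier:
    """Apply the escalation path: Tier 1 first, then Tier 3, and Tier 2 only
    when the data can only be understood visually."""
    if snapshot_has_data:
        return Tier.A11Y_SNAPSHOT
    if needs_visual:
        return Tier.SCREENSHOT_VLM
    return Tier.DOM_EXTRACTION
```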

Token Optimization for Data-Heavy Pages

For content-rich pages (HN, Reddit, forums, dashboards), the interactive snapshot balloons from ~1,400 tokens (simple pages) to ~47K tokens (dense pages). This wrecks budgets.

Pattern: Snapshot first to understand page structure, then browser_evaluate with targeted JS for bulk extraction.

1. browser_navigate(url)
2. browser_snapshot(mode='interactive')   -- understand structure (pay cost once)
3. browser_evaluate('                     -- extract data surgically
     JSON.stringify(
       [...document.querySelectorAll(".titleline a")]
         .map(a => ({title: a.textContent, href: a.href}))
     )
   ')
4. Parse JSON result -- structured data at ~200 tokens vs 47K snapshot

When to use: Any page where you need to extract 10+ items of the same type. Snapshot gives you the selector knowledge; eval gives you the data cheaply.
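Step 4, "Parse JSON result", might look like the sketch below on the caller's side. It assumes browser_evaluate hands back the JSON.stringify output as a string. The Link type and parse_links name are hypothetical.

The second decode is a defensive guess for tool layers that JSON-encode the result again. It is not documented agent-browser behavior.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    title: str
    href: str


def parse_links(raw: str) -> List[Link]:
    """Turn the JSON string produced by the browser_evaluate call above into Links.

    If decoding yields a str (the payload was encoded twice), decode once more.
    Entries without a title or href are skipped instead of raising."""
    data = json.loads(raw)
    if isinstance(data, str):
        data = json.loads(data)
    links: List[Link] = []
    for item in data:
        title = (item.get("title") or "").strip()
        href = item.get("href") or ""
        if title and href:
            links.append(Link(title=title, href=href))
    return links
```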


Email Verification (AgentMail)

For tasks requiring email verification (account signup, OTP flows).

Setup

  • AgentMail Python wrapper: ./scripts/mailbox.py (self-contained)
  • CLI wrapper: ./scripts/agentmail.sh
  • Dependencies: ./scripts/requirements.txt
  • First-time setup: ./scripts/agentmail.sh setup
  • Create your own mailbox (see pattern below)

AgentMail provides disposable email inboxes for AI agents. You create a mailbox, use the address in signup forms, then poll for incoming verification emails and extract OTP codes or links.

The Pattern (Validated on Notion Signup)

1. Create mailbox:     ./scripts/agentmail.sh create <username>
2. Fill signup form:   browser_fill(@ref, "<username>@agentmail.to")
3. Submit form:        browser_click(@ref)
4. Poll for email:     ./scripts/agentmail.sh poll <username>@agentmail.to --timeout 120
5. Extract OTP/link:   ./scripts/agentmail.sh extract <inbox_id> <msg_id>
6. Enter OTP:          browser_fill(@ref, "<otp>")
7. Submit:             browser_click(@ref)
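Step 5 is handled by agentmail.sh extract. If you ever need to pull a code out of a message body yourself, a simple heuristic can do it.

The sketch below is that kind of heuristic. It is not the logic inside agentmail.sh, and the keyword list and digit lengths are assumptions.

```python
import re
from typing import Optional

# Illustrative heuristic (not the logic inside agentmail.sh extract):
# 1) a 4-8 digit number shortly after "code", "OTP" or "verification";
# 2) otherwise the first standalone 6-digit number in the message.
KEYWORD_CODE = re.compile(r"(?:code|otp|verification)\D{0,40}?(\d{4,8})\b", re.IGNORECASE)
SIX_DIGITS = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def extract_otp(body: str) -> Optional[str]:
    """Return the most likely OTP in a plain-text email body, or None."""
    match = KEYWORD_CODE.search(body) or SIX_DIGITS.search(body)
    return match.group(1) if match else None
```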

Gotchas

  • Emails take 5-30 seconds to arrive. Always poll with timeout.
  • Some services detect the agentmail.to domain -- have a backup strategy.
  • OTP codes expire. Extract and submit promptly after polling.
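"Always poll with timeout" is already built into agentmail.sh poll --timeout. The sketch below only applies if you check an inbox from your own code. There, fetch is a hypothetical callable that returns the newest matching message or None.

The wait doubles after each empty check, capped at 10 seconds, so a 5-30 second delivery window costs only a few requests.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll(fetch: Callable[[], Optional[T]], timeout: float = 120.0,
         interval: float = 2.0, max_interval: float = 10.0,
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], float] = time.monotonic) -> Optional[T]:
    """Call fetch() until it returns something other than None or `timeout`
    seconds elapse; return None on timeout.

    sleep and clock are injectable so the loop can be tested without waiting."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        sleep(min(interval, remaining))            # never sleep past the deadline
        interval = min(interval * 2, max_interval)  # back off between empty checks
```

Because OTP codes expire, submit the code as soon as poll returns rather than batching other work first.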

Validated Flows

  • Notion signup: Full end-to-end -- signup, OTP poll, extract, submit, onboarding, page creation.
  • PKP forum: Email verification worked. Blocked by moderator approval gate (external).

Session Rules

CRITICAL: No parallel browser sessions.

  • All tools share one browser daemon per session
  • Parallel usage causes state collisions (one action navigates, another loses its page)
  • Run browser tasks SEQUENTIALLY. Always.
  • AGENT_BROWSER_SESSION env var controls session name (default: "mcp")
  • Per-session isolation is NOT yet implemented

Always close the browser when done:

browser_close()  -- releases the session for the next task

Forgetting to close leaves an orphaned Chromium process.


Stealth Configuration

Layer 1 provides basic stealth through environment variables alone. With the variables below exported, browser sessions run in headed mode with a custom user agent, a persistent profile, and the AutomationControlled Blink feature disabled.

For stricter sites, escalate to Layer 2+. Full guide: ./references/stealth-config.md.

Quick Setup (5 min, $0)

export AGENT_BROWSER_HEADED=1
export AGENT_BROWSER_USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"
export AGENT_BROWSER_PROFILE="$HOME/.agent-browser/profiles/stealth"
export AGENT_BROWSER_ARGS="--disable-blink-features=AutomationControlled"
mkdir -p ~/.agent-browser/profiles/stealth

Escalation Path

  1. Layer 1 (env vars above) -- beats Cloudflare free tier
  2. Layer 2 (rebrowser-patches: npx rebrowser-patches@latest patch) -- beats Cloudflare Pro
  3. Layer 3 (Kernel cloud: AGENT_BROWSER_PROVIDER=kernel) -- beats most anti-bot
  4. Layer 4 (residential proxy: AGENT_BROWSER_PROXY=...) -- beats IP-based blocking

Key Env Vars

Env Var | Purpose | Default
AGENT_BROWSER_SESSION | Session name for isolation | mcp
AGENT_BROWSER_HEADED | "1" = headed mode | off
AGENT_BROWSER_USER_AGENT | Custom UA string | Chromium default
AGENT_BROWSER_ARGS | Chromium launch args | none
AGENT_BROWSER_PROFILE | Persistent browser profile path | none
AGENT_BROWSER_PROXY | Proxy server URL | none
AGENT_BROWSER_PROVIDER | Cloud provider (kernel, browserbase) | none
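A harness may want to read the same variables to log which stealth settings are active before a run. The sketch below applies the defaults from the table, with these caveats:

  • It is not agent-browser's own parser.
  • Treating only the exact value "1" as headed is an assumption based on the table's wording.
  • BrowserConfig and load_config are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserConfig:
    session: str
    headed: bool
    user_agent: Optional[str]  # None = Chromium default
    args: Optional[str]
    profile: Optional[str]
    proxy: Optional[str]
    provider: Optional[str]    # e.g. "kernel" or "browserbase"


def load_config(env: Mapping[str, str] = os.environ) -> BrowserConfig:
    """Read the AGENT_BROWSER_* variables, applying the defaults from the table."""
    def opt(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None  # unset or blank means "use the default"

    return BrowserConfig(
        session=opt("AGENT_BROWSER_SESSION") or "mcp",
        headed=env.get("AGENT_BROWSER_HEADED") == "1",
        user_agent=opt("AGENT_BROWSER_USER_AGENT"),
        args=opt("AGENT_BROWSER_ARGS"),
        profile=opt("AGENT_BROWSER_PROFILE"),
        proxy=opt("AGENT_BROWSER_PROXY"),
        provider=opt("AGENT_BROWSER_PROVIDER"),
    )
```

For example, checking that "--disable-blink-features=AutomationControlled" appears in (cfg.args or "") is a quick way to confirm Layer 1 is in place.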

Benchmark Results (Feb 2026)

15-task browser autonomy benchmark. 12/15 pass (100% excluding external blockers).

Capability | Tasks | Evidence
Login + session cookies | 1, 6, 9 | Sauce Demo, HN, quotes.toscrape
Multi-field registration | 2, 7 | 11-step account lifecycle
Complex form widgets | 3 | Date pickers, React Select, file upload
Drag-drop, alerts, iframes | 5, 14 | Multiple interaction types
Paginated scraping with session | 9 | 50 quotes across 5 pages
SaaS signup with email OTP | 12 | Notion end-to-end
OAuth redirect flow | 13 | GitHub OAuth chain
Google Flights SPA | 11 | Dynamic JS search + filter
Multi-site autonomous flow | 15 | Two sites, single session
Error recovery | 14 | Form validation, alerts, iframes

3 failures (all external): SSL outage, Cloudflare transparent challenge, moderator gate.

Test Suite v2 (10-Task Progressive)

# | Tier | Task | Result | Calls | Time
1 | Medium | Reddit scraping (old.reddit.com) | PASS | 14 | 25s
2 | Medium | HN thread extraction | PASS | 13 | 35s
3 | Medium | SauceDemo e-commerce flow | PASS | 38 | 61s
4 | Hard | GitHub repo data extraction | PASS | 37 | 83s
5 | Hard | Google Flights search + filter | PASS | 21 | 48s
6 | Hard | HN account lifecycle | PASS | 22 | 39s
7 | Brutal | Stripe iframe checkout | PASS | 59 | 168s
8 | Brutal | Wikipedia multi-language | PASS | 11 | 63s
9 | Brutal | Cloudflare stealth gauntlet | PASS | 12 | 36s
10 | Final Boss | Linear E2E + AgentMail | PARTIAL | ~40 | ~180s

Test 10 blocked by Cloudflare Turnstile CAPTCHA -- requires Layer 2+ stealth. Not an agent or skill gap.


Quick Tool Reference

25 tools in 5 categories. Full details in ./references/tool-inventory.md.

Category | Tools
Navigation | navigate, back, forward, reload
Observation | snapshot, screenshot, get_url, get_title, get_text, get_html
Interaction | click, dblclick, fill, type, press, select, hover, focus, clear, check, uncheck
Page | scroll, wait, evaluate
Session | close

All tool names are prefixed with browser_ (e.g., browser_click, browser_snapshot).
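The category table also works as a data structure. A harness that validates tool calls before sending them could keep the inventory as a mapping.

The sketch below (TOOL_CATEGORIES and TOOL_NAMES are hypothetical names) rebuilds the prefixed names from the table. It also lets you confirm that the five categories add up to 25 tools.

```python
# Tool inventory from the table above, grouped by category (names without prefix).
TOOL_CATEGORIES = {
    "Navigation": ["navigate", "back", "forward", "reload"],
    "Observation": ["snapshot", "screenshot", "get_url", "get_title", "get_text", "get_html"],
    "Interaction": ["click", "dblclick", "fill", "type", "press", "select",
                    "hover", "focus", "clear", "check", "uncheck"],
    "Page": ["scroll", "wait", "evaluate"],
    "Session": ["close"],
}

# Full tool names carry the browser_ prefix, e.g. browser_click.
TOOL_NAMES = [f"browser_{name}" for names in TOOL_CATEGORIES.values() for name in names]
```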

fill vs type

Method | Behavior | Use when
browser_fill | Clears field, sets value instantly | Standard form fields (95% of cases)
browser_type | Types character by character, triggers keystrokes | Autocomplete, search-as-you-type, custom widgets

Common Workflow Patterns

See ./references/battle-tested-patterns.md for 12 complete patterns with examples.

Pattern | Complexity | Key Technique
Standard login | Low | fill + click + wait + snapshot
Multi-field registration | Medium | fill + select + check + click
SaaS signup with OTP | High | AgentMail create + fill + poll + extract + fill
Paginated scraping | Medium | snapshot(compact) + click(Next) loop
OAuth redirect | Medium | click(OAuth button) + wait + follow redirects
Error recovery | Medium | submit + snapshot(check errors) + fix + resubmit
SPA navigation | Medium | type(not fill) + wait + snapshot for dynamic content
Targeted extraction | Low | browser_evaluate(JS selector)
Multi-site flow | High | Multiple navigates, single session, screenshot evidence
Targeted DOM extraction | Low | browser_evaluate(JS selector) for JSON-LD and specific elements
Post-search verification | Medium | snapshot results + verify params + recovery loop
Calendar widget protocol | Medium | click date field + navigate months + click date cells

Health Check

Before starting browser work, verify the stack:

./scripts/browser-check.sh           # full check (CLI + daemon + stealth + agentmail)
./scripts/browser-check.sh quick     # just CLI + daemon
./scripts/browser-check.sh stealth   # stealth config status

URL Pre-Population Pattern

For complex SPAs with autocomplete widgets, geo-defaults, or custom form components that resist browser_type:

  • Skip the form. Navigate directly to a URL with parameters pre-encoded.
  • Google Flights example: https://www.google.com/travel/flights?q=Flights+from+SFO+to+NRT+on+2026-04-17+return+2026-05-01
  • Why: Custom React/Material autocomplete widgets often ignore browser_type input or revert to geo-defaults. URL params bypass the widget layer entirely.
  • When to use: After 2-3 failed attempts to interact with a complex form widget. Don't fight the DOM -- go around it.
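The Google Flights URL above can be built rather than hand-typed. The sketch below mirrors the wording of that example query.

flights_url is a hypothetical helper. Whether Google honors a one-way query phrased this way is an assumption; the source only shows the round-trip form.

```python
from typing import Optional
from urllib.parse import urlencode

FLIGHTS_BASE = "https://www.google.com/travel/flights"


def flights_url(origin: str, dest: str, depart: str, ret: Optional[str] = None) -> str:
    """Encode a whole flight search into the q= parameter, mirroring the
    wording of the example URL above (dates as YYYY-MM-DD)."""
    query = f"Flights from {origin} to {dest} on {depart}"
    if ret:
        query += f" return {ret}"
    # urlencode() uses quote_plus(), so spaces become '+' as in the example.
    return f"{FLIGHTS_BASE}?{urlencode({'q': query})}"
```

Pass the result to browser_navigate, then snapshot to confirm that the airports and dates were parsed as intended before reading results.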

iframe Bypass Pattern

When cross-origin iframes block browser_fill/browser_type (e.g., Stripe payment forms):

  1. Snapshot the page and identify the iframe element
  2. Use browser_evaluate to extract the iframe's src URL: document.querySelector('iframe').src (if the page has several iframes, pick the one that hosts the fields)
  3. Navigate directly to that URL -- this renders the iframe content as a regular page
  4. Interact with all fields normally using browser_fill/browser_type
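When a checkout page carries several iframes, step 2 can list every src and let the caller choose one by host. The sketch below assumes three things:

  • The JS expression is passed to browser_evaluate unchanged.
  • The tool returns the JSON string as-is.
  • Stripe's fields are served from a stripe.com subdomain. The example path in the test is illustrative.

```python
import json
from typing import Optional
from urllib.parse import urlparse

# JS expression (no `return`) for browser_evaluate: list every iframe's src.
IFRAME_SRCS_JS = "JSON.stringify([...document.querySelectorAll('iframe')].map(f => f.src))"


def pick_iframe(raw: str, host_suffix: str) -> Optional[str]:
    """From browser_evaluate's JSON output, return the first iframe URL whose
    host is host_suffix or one of its subdomains; None if there is none."""
    for src in json.loads(raw):
        host = urlparse(src).hostname or ""  # about:blank and "" have no host
        if host == host_suffix or host.endswith("." + host_suffix):
            return src
    return None
```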

Evaluate-Only Mode for Heavy Pages

For content-heavy pages (Wikipedia, documentation sites, long articles):

  • Skip snapshots entirely. The a11y tree will be massive and blow your token budget.
  • Use browser_evaluate with targeted CSS selectors for all data extraction
  • Common selectors: document.querySelector('p').textContent, document.querySelectorAll('.reference').length, Array.from(document.querySelectorAll('h2')).map(e => e.textContent)
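In evaluate-only mode, the expressions themselves are easy to get wrong: quoting, and the "no return keyword" rule from Error Handling. A small builder helps with both.

The sketch below (text_of_all is a hypothetical helper) quotes the selector with json.dumps, which always yields a valid JavaScript string literal. It emits an expression rather than a statement.

```python
import json


def text_of_all(selector: str) -> str:
    """Build a browser_evaluate expression returning the trimmed text of every
    element matching `selector`, serialized as a JSON string.

    json.dumps quotes the selector, so quotes or backslashes inside it cannot
    break the expression. The result contains no `return` keyword."""
    sel = json.dumps(selector)
    return (f"JSON.stringify(Array.from(document.querySelectorAll({sel}), "
            f"e => e.textContent.trim()))")
```

For example, text_of_all("h2") yields an expression that collects every section heading in one call. Parse the result with json.loads on the caller's side.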

Playbooks

Per-site recipes with validated approaches. Load the relevant playbook before starting a task against a tested site.

Playbook | Site | Status | Key Pattern
references/playbooks/booking-com.md | Booking.com | PASS (workaround) | Landmark search + hotel calendar pricing
references/playbooks/google-flights.md | Google Flights | PASS | URL pre-population (?q=) bypasses autocomplete
references/playbooks/linear-signup.md | Linear | PARTIAL | Blocked by Cloudflare Turnstile; requires Layer 3
references/playbooks/notion-signup.md | Notion | PASS | Full E2E signup with AgentMail OTP verification
references/playbooks/reddit-scraping.md | Reddit | PASS | old.reddit.com + ?sort=hot retry + evaluate extraction
references/playbooks/stripe-iframe.md | Stripe (iframe) | PASS | Extract iframe src, navigate directly, fill normally
references/playbooks/cloudflare-sites.md | Cloudflare (general) | Mixed | Decision tree: free tier (L1) vs Turnstile (L3)
references/playbooks/wikipedia-extraction.md | Wikipedia | PASS | Evaluate-only mode, zero snapshots, CSS selectors
references/playbooks/headed-browser-setup.md | (general) | Reference | Headed mode + persistent profile setup

Anti-Patterns

Do NOT | Do instead
Use browser for static content (prices, articles) | WebSearch or WebFetch (built-in tools)
Use snapshot(mode='full') by default | Use interactive mode (10x cheaper)
Run parallel browser sessions | Run sequentially, one at a time
Forget browser_close() at end | Always close when done
Retry failed anti-bot sites blindly | Check references/failure-log.md first
Load browser tools for non-browser tasks | Only use browser when interaction is needed
Use browser_type when browser_fill works | fill is faster; type is for keystroke-sensitive inputs
Skip screenshot evidence | Screenshot at key milestones for verification
Use browser_fill for autocomplete fields | browser_type triggers keystroke events for suggestions
Attempt Cloudflare Turnstile sites at Layer 1 | Interactive CAPTCHA requires Layer 2+ stealth

Error Handling

Common browser automation errors and recovery strategies.

Error | Symptoms | Recovery
Playwright timeout | TimeoutError: waiting for selector or navigation timeout | Retry with longer browser_wait (double the timeout). Check if page is still loading. If persistent, the element may not exist -- re-snapshot to verify page state.
Stale element ref | Action fails on a previously valid @eN ref | Refs reset after any navigation or major DOM change. Re-run browser_snapshot() to get fresh refs, then retry the action with the new ref.
Element not found | browser_click/browser_fill fails -- ref not in snapshot | 1) Verify the page fully loaded (browser_wait or check URL). 2) Try a CSS selector fallback. 3) The element may be below the fold -- browser_scroll(direction="down") then re-snapshot.
Network error | Navigation fails, page doesn't load | Retry browser_navigate to the same URL. If persistent, check if site is down or blocking (see references/failure-log.md).
Session collision | Random failures, wrong page content, unexpected state | Another task is using the browser. Browser tasks must run SEQUENTIALLY. Close any orphaned sessions with browser_close() and retry.
Anti-bot block | Blank page, CAPTCHA, access denied, redirect to challenge page | Check references/stealth-config.md for escalation layers. Do not retry blindly -- escalate stealth level first.
browser_evaluate syntax error | SyntaxError: Unexpected token in eval expression | Do NOT use return keyword in browser_evaluate expressions -- eval expects a JS expression, not a statement. Use document.title not return document.title.

General principle: When an action fails, always re-snapshot before retrying. The page state may have changed since your last observation.
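The general principle, combined with the timeout row's "double the timeout", fits a small retry wrapper. In the sketch below, action and resnapshot are hypothetical wrappers:

  • resnapshot would call browser_snapshot(mode='interactive').
  • action should look up a fresh ref from that snapshot before acting.

Anti-bot blocks are the exception the table calls out. Escalate stealth instead of routing them through a loop like this.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def act_with_recovery(action: Callable[[float], T], resnapshot: Callable[[], None],
                      timeout: float = 5.0, attempts: int = 3) -> T:
    """Run action(timeout); after each failure re-snapshot, double the timeout,
    and retry. Re-raises the last error once attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return action(timeout)
        except Exception as exc:  # e.g. a timeout or a stale-ref failure
            last_error = exc
            resnapshot()          # observe the current page before retrying
            timeout *= 2          # "double the timeout", per the error table
    assert last_error is not None
    raise last_error
```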


Bundled Resources Index

Path | What | When to load
./UPDATES.md | Structured changelog for AI agents | When checking for new features or updates
./UPDATE-GUIDE.md | Instructions for AI agents performing updates | When updating this skill
./references/installation-guide.md | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair
./references/tool-inventory.md | Full 25-tool API reference with params and examples | When you need exact tool syntax
./references/battle-tested-patterns.md | 12 validated workflow patterns from benchmark | When building a new browser workflow
./references/failure-log.md | Benchmark results, anti-bot findings, AgentMail details | Before targeting a new site
./references/stealth-config.md | Anti-detection layered configuration guide | When hitting bot detection
./references/test-results.md | Full benchmark test cases (v1 + v2) with detailed logs | When reviewing what has been tested and what works
./references/anti-detection-guide.md | 4-tier stealth escalation with decision tree | When planning stealth strategy for a new target
./references/playbooks/ | Per-site recipes with validated approaches | Before automating a tested site
./references/playbooks/headed-browser-setup.md | Profile setup, trust building, headed mode guide | When setting up headed browser for high-detection sites
./scripts/agentmail.sh | AgentMail CLI wrapper (setup/create/poll/extract) | For email verification flows
./scripts/mailbox.py | AgentMail Python SDK wrapper | Called by agentmail.sh (self-contained)
./scripts/requirements.txt | Python dependencies for AgentMail | Used by agentmail.sh setup
./scripts/browser-check.sh | Browser stack health check | Before first browser task in a session

Source

git clone https://github.com/buildoak/fieldwork-skills
View on GitHub: https://github.com/buildoak/fieldwork-skills/blob/main/skills/browser-ops/SKILL.md

Overview

Browser-ops provides 25 Playwright-based tools, exposed through agent-browser, for navigation, interaction, observation, and session management in AI coding agents. It supports end-to-end browser flows and was validated on two benchmark suites (12/15 and 9/10 passes), including a Notion signup with AgentMail OTP verification.

How This Skill Works

The skill interfaces with agent-browser to control Chromium-based browsers, exposing functions like browser_navigate, browser_snapshot, browser_click, browser_fill, browser_screenshot, and browser_close. Element references from a snapshot (e.g., @e1) are stable within a page state but reset after navigation. Every task follows the core workflow: navigate, snapshot, identify, act, snapshot, repeat, then close.

When to Use It

  • Need data from the web that requires interaction or dynamic content
  • Requires login or session state (OAuth, multi-step forms)
  • Needs email verification via AgentMail as part of automation
  • Pages may block bots or require stealth considerations
  • You want to capture UI structure and element references for automation testing or data extraction

Quick Start

  1. browser_navigate(url='https://example.com')
  2. browser_snapshot(mode='interactive')
  3. browser_screenshot(path='/tmp/example.png')
  4. browser_close()

Best Practices

  • Use the ref system from browser_snapshot to identify and reuse elements (@e1, @e2, ...)
  • Always close the browser when the task is complete to free resources
  • Start with the core workflow: navigate, snapshot, identify, act, snapshot, repeat
  • Default to interactive snapshots to discover elements; switch to compact only when you also need the page's text content, since it costs roughly 2-4x more tokens
  • If data is static, prefer WebSearch/WebFetch to minimize browser overhead

Example Use Cases

  • Notion signup flow automated with AgentMail OTP verification
  • OAuth login flow to test redirects and session establishment
  • Automated multi-step form submission with field fills and button clicks
  • Accessibility snapshot to capture a11y tree for QA purposes
  • Dynamic product page pricing retrieval after option selections
