Get the FREE Ultimate OpenClaw Setup Guide →
Ö

Agent Browser

Use Caution

@okaris

npx machina-cli add skill @okaris/agentic-browser --openclaw
Files (1)
SKILL.md
9.7 KB

Agentic Browser

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Agentic Browser

Quick Start

# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.

Core Workflow

Every browser automation follows this pattern:

  1. Open - Navigate to URL, get @e refs for elements
  2. Interact - Use refs to click, fill, drag, etc.
  3. Re-snapshot - After navigation/changes, get fresh refs
  4. Close - End session (returns video if recording)
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'

Functions

FunctionDescription
openNavigate to URL, configure browser (viewport, proxy, video recording)
snapshotRe-fetch page state with @e refs after DOM changes
interactPerform actions using @e refs (click, fill, drag, upload, etc.)
screenshotTake page screenshot (viewport or full page)
executeRun JavaScript code on the page
closeClose session, returns video if recording was enabled

Interact Actions

ActionDescriptionRequired Fields
clickClick elementref
dblclickDouble-click elementref
fillClear and type textref, text
typeType text (no clear)text
pressPress key (Enter, Tab, etc.)text
selectSelect dropdown optionref, text
hoverHover over elementref
checkCheck checkboxref
uncheckUncheck checkboxref
dragDrag and dropref, target_ref
uploadUpload file(s)ref, file_paths
scrollScroll pagedirection (up/down/left/right), scroll_amount
backGo back in history-
waitWait millisecondswait_ms
gotoNavigate to URLurl

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading

Features

Video Recording

Record browser sessions for debugging or documentation:

# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'

File Upload

Upload files to file inputs:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:

infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}

Deep-Dive Documentation

ReferenceDescription
references/commands.mdFull function reference with all options
references/snapshot-refs.mdRef lifecycle, invalidation rules, troubleshooting
references/session-management.mdSession persistence, parallel sessions
references/authentication.mdLogin flows, OAuth, 2FA handling
references/video-recording.mdRecording workflows for debugging
references/proxy-support.mdProxy configuration, geo-testing

Ready-to-Use Templates

TemplateDescription
templates/form-automation.shForm filling with validation
templates/authenticated-session.shLogin once, reuse session
templates/capture-workflow.shContent extraction with screenshots

Examples

Form Submission

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'

Sessions

Browser state persists within a session. Always:

  1. Start with --session new on first call
  2. Use returned session_id for subsequent calls
  3. Close session when done

Related Skills

# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Documentation

Source

git clone https://clawhub.ai/okaris/agentic-browserView on GitHub

Overview

Agentic Browser enables AI agents to automate web tasks via inference.sh, powered by Playwright. It lets you navigate pages, interact with elements using @e refs, and capture screenshots or video for auditing. Use it for web automation, data extraction, testing, agent browsing, and research.

How This Skill Works

It runs a session via the inference.sh CLI, opens a URL, and exposes page elements as @e refs. You perform actions like click, fill, type, drag, or upload using those refs, then re-snapshot to refresh refs after navigation or DOM changes. When finished, the session is closed and a recording is returned if video recording was enabled.

When to Use It

  • Automated login and form submission on a website
  • Scraping product data or headlines across pages
  • End-to-end UI testing with element refs
  • Agent-driven research: open multiple sources and collect snippets
  • Regression testing with evidence capture (video) of UI changes

Quick Start

  1. Step 1: Install CLI and login: curl -fsSL https://cli.inference.sh | sh && infsh login
  2. Step 2: Open a page and get interactive elements: infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
  3. Step 3: Interact and capture: run interact commands with @e refs, snapshot to refresh refs, then close the session

Best Practices

  • Open pages to establish stable @e refs, then re-snapshot after navigation or DOM changes
  • Use explicit wait and timeout controls to handle dynamic content
  • Leverage @e refs for precise interactions (click, fill, drag, upload)
  • Capture screenshots strategically (viewport vs. full page) to verify layouts
  • Close sessions when done to save resources and associated video

Example Use Cases

  • Login to a dashboard, navigate to metrics, and extract data points
  • Automatic form filling and submission for onboarding flows
  • Navigate product pages, extract prices and availability, log results
  • Automated UI test that clicks through a multi-step checkout
  • Record a browsing session for QA or stakeholder review

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers