agent-browser
npx machina-cli add skill fmschulz/omics-skills/agent-browser --openclaw
Agent Browser
Automate browser interactions through the agent-browser CLI for repeatable, scriptable web tasks.
Instructions
- Install and initialize the CLI.
- Open the target URL and capture a snapshot.
- Interact with elements using snapshot references.
- Re-snapshot after navigation or state changes.
- Export results (screenshots or JSON) for downstream use.
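The steps above can be sketched as two small shell functions. This is a minimal sketch, not the skill's own script: the function names, the `AGENT_BROWSER` override variable, and the reference `@e1` are assumptions made for illustration. Real references have to be read from the snapshot you just captured.

```shell
# Hypothetical wrapper: AGENT_BROWSER can point at another binary or a stub
# for dry runs; it defaults to the real CLI.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# Open the URL ($1) and save a JSON snapshot to the file ($2).
open_and_snapshot() {
  ab open "$1" || return 1
  ab snapshot -i --json > "$2"
}

# Click a reference ($1) taken from the latest snapshot, then re-snapshot
# to the file ($2), because the click may have changed the page state.
click_and_resnapshot() {
  ab click "$1" || return 1
  ab snapshot -i --json > "$2"
}

# Example session (commented out so sourcing this file has no side effects):
#   open_and_snapshot https://example.org step1.json
#   # read step1.json and pick a reference, for example @e1
#   click_and_resnapshot @e1 step2.json
#   ab screenshot step2.png
```

Splitting the flow in two keeps the manual step visible: the reference passed to `click_and_resnapshot` only exists once the first snapshot has been written.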
Quick Reference
| Task | Action |
|---|---|
| Install | npm install -g agent-browser then agent-browser install |
| Open page | agent-browser open <url> |
| Snapshot | agent-browser snapshot -i --json |
| Interact | agent-browser click @eN, agent-browser fill @eN "text" |
| Screenshot | agent-browser screenshot output.png |
| Docs | See references/quick-start.md |
Input Requirements
- Target URL(s)
- CLI installed and Chromium downloaded
- Credentials if login is required
Output
- Screenshots (PNG)
- JSON snapshots of page structure
- Extracted text/attributes
Quality Gates
- Snapshot captured after each major navigation step
- Interactions verified in a follow-up snapshot
- Outputs saved to disk with clear filenames
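The last gate can be checked mechanically. The sketch below only confirms that each expected file exists and is non-empty; it does not validate the contents. The function name `check_outputs` is an assumption for illustration.

```shell
# Return non-zero if any expected output file is missing or empty.
# Prints one line per problem on stderr.
check_outputs() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (commented out): fail a script early when outputs are incomplete.
#   check_outputs page.json output.png || exit 1
```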
Examples
Example 1: Capture a page snapshot
agent-browser open https://example.org
agent-browser snapshot -i --json > page.json
Troubleshooting
Issue: Chromium not installed
Solution: Run agent-browser install (add --with-deps on Linux).
Issue: Element not found
Solution: Re-snapshot and confirm the correct element reference.
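For the Chromium issue, the choice of install flags can be scripted. This is a sketch under the assumption that `uname -s` reports `Linux` on the machines where `--with-deps` is wanted; the function name `install_browser` and the `AGENT_BROWSER` override are hypothetical.

```shell
# Run the install step, adding --with-deps only on Linux.
# $1 optionally overrides the detected kernel name (useful for testing).
install_browser() {
  os=${1:-$(uname -s)}
  if [ "$os" = "Linux" ]; then
    "${AGENT_BROWSER:-agent-browser}" install --with-deps
  else
    "${AGENT_BROWSER:-agent-browser}" install
  fi
}

# Example (commented out):
#   install_browser
```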
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/agent-browser/SKILL.md
Overview
agent-browser enables repeatable, scriptable browser tasks via a CLI. It supports opening target URLs, interacting with elements, taking snapshots, and exporting results as screenshots or JSON for downstream use, making UI testing and scraping more reliable.
How This Skill Works
Install the CLI and ensure Chromium is downloaded. Use commands like open, snapshot, click, and fill to automate web actions, then re-snapshot after navigation or state changes. Export outputs (screenshots or JSON) for downstream processing and verification.
When to Use It
- Automating login flows and verifying post-login state
- Running repeatable UI tests across pages
- Scraping data by capturing page structure and text
- Documenting UI changes by re-snapshotting after navigation
- Generating assets for dashboards by exporting screenshots and JSON
Quick Start
- Step 1: Install and initialize the CLI (npm install -g agent-browser; agent-browser install)
- Step 2: Open a URL and capture a JSON snapshot (agent-browser open <url>; agent-browser snapshot -i --json > page.json)
- Step 3: Interact with elements (click @eN, fill @eN "text"), then re-snapshot or take a screenshot
Best Practices
- Ensure a snapshot is captured after each major navigation step
- Re-snapshot after interactions to verify resulting state
- Use clear, descriptive filenames for outputs (output.png, page.json, etc.)
- Take element references (@eN) from the most recent snapshot to reduce flakiness
- Install Chromium with agent-browser install (add --with-deps on Linux) and handle credentials securely
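One way to combine the snapshot and filename practices is to number each step and derive both output names from it. This is a sketch, not a convention of the CLI: `step_file`, `capture_step`, and the `AGENT_BROWSER` override are hypothetical names, and the step number is assumed to be a plain decimal integer without leading zeros.

```shell
# Hypothetical wrapper around the CLI; AGENT_BROWSER allows a stub for dry runs.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# Build a sortable filename such as "03-login.png".
# $1 = step number, $2 = short label, $3 = extension.
step_file() {
  printf '%02d-%s.%s\n' "$1" "$2" "$3"
}

# Save a JSON snapshot and a screenshot for one step in the current directory.
capture_step() {
  ab snapshot -i --json > "$(step_file "$1" "$2" json)" || return 1
  ab screenshot "$(step_file "$1" "$2" png)"
}

# Example (commented out), after a page is already open:
#   capture_step 1 home
#   capture_step 2 search-results
```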
Example Use Cases
- Example 1: Open https://example.org and export a JSON snapshot to page.json (agent-browser open https://example.org; agent-browser snapshot -i --json > page.json)
- Example 2: Open a login page, fill credentials, submit, and capture a post-login snapshot
- Example 3: Navigate through multiple pages and take a screenshot after each step
- Example 4: Export a JSON snapshot of the page structure to feed into a data pipeline
- Example 5: Re-snapshot after a UI state change to verify updates and differences
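Example 2 could look roughly like the sketch below. It assumes the login page is already open and snapshotted, and that the three references passed in were read from that snapshot. The environment variable names `LOGIN_USER` and `LOGIN_PASS`, the function name `submit_login`, and the `AGENT_BROWSER` override are all hypothetical. Reading credentials from the environment keeps them out of the script file, although they are still passed to the CLI as command-line arguments.

```shell
# Hypothetical wrapper around the CLI; AGENT_BROWSER allows a stub for dry runs.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# Fill and submit a login form, then snapshot the resulting page.
# $1 = user field ref, $2 = password field ref, $3 = submit button ref,
# $4 = output file for the post-login snapshot.
submit_login() {
  if [ -z "${LOGIN_USER:-}" ] || [ -z "${LOGIN_PASS:-}" ]; then
    echo "LOGIN_USER and LOGIN_PASS must be set" >&2
    return 1
  fi
  ab fill "$1" "$LOGIN_USER" || return 1
  ab fill "$2" "$LOGIN_PASS" || return 1
  ab click "$3" || return 1
  # Snapshot again so the post-login state can be verified.
  ab snapshot -i --json > "$4"
}

# Example session (commented out; URL and refs are placeholders):
#   ab open https://example.org/login
#   ab snapshot -i --json > login.json   # read the refs from this file
#   submit_login @e1 @e2 @e3 post-login.json
```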