
browser-automation

npx machina-cli add skill archubbuck/workspace-architect/browser-automation --openclaw

Browser Automation Skill

This skill automates a browser from the command line using Python and Playwright. Everything runs locally.

When to Use This Skill

Use this skill when you need to:

  • Automate interactions with web pages (clicking, typing, navigating)
  • Test web application functionality
  • Extract content or data from web pages
  • Take screenshots of web pages
  • Execute custom JavaScript in browser context
  • Hover over elements to trigger UI states

Prerequisites

Before using this skill, ensure Playwright is installed:

pip install playwright
playwright install chromium

Available Tools

All tools are implemented as subcommands in assets/skills/browser-automation/scripts/browser_tools.py. Each command is stateless: it launches a new browser instance, performs the action, and closes the browser.
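Because every tool is a subcommand of one script, the calls are easy to wrap from Python. The sketch below assumes the script path shown above and that it is run from the repository root. `build_command` and `run_tool` are hypothetical helper names introduced here for illustration; they are not part of the skill.

```python
import subprocess
import sys

# Path taken from the usage lines in this document; adjust it if your checkout differs.
SCRIPT = "assets/skills/browser-automation/scripts/browser_tools.py"


def build_command(tool, *args):
    """Return the argv list for one stateless tool invocation."""
    return [sys.executable, SCRIPT, tool, *args]


def run_tool(tool, *args):
    """Run one tool and return its stdout.

    check=True raises subprocess.CalledProcessError when the command
    exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(tool, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Example call (needs Playwright and network access):
#   run_tool("browser_navigate", "https://example.com")
```

Passing arguments as a list rather than a shell string avoids quoting problems with selectors such as "#submit-button".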

browser_navigate

Navigate to a URL and wait for the page to load.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_navigate <url>

Example:

python assets/skills/browser-automation/scripts/browser_tools.py browser_navigate https://example.com

browser_click

Click an element on a page using a CSS selector or text match.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_click <url> <selector> [--text TEXT]

Parameters:

  • url: URL to navigate to
  • selector: CSS selector for the element (optional if using --text)
  • --text: (Optional) Text to match instead of using selector

Examples:

# Click by selector
python assets/skills/browser-automation/scripts/browser_tools.py browser_click https://example.com "#submit-button"

# Click by text
python assets/skills/browser-automation/scripts/browser_tools.py browser_click https://example.com "button" --text "Submit"
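For readers who want to see how a selector and a text match can combine, here is a minimal sketch using Playwright's Python API. It is not the code in browser_tools.py, which this page does not show. The function name `click_target` and the choice to click the first match are assumptions made for illustration.

```python
def click_target(page, selector=None, text=None):
    """Click by selector, by visible text, or by a selector narrowed by text.

    `page` is expected to be a Playwright Page object.
    """
    if selector and text:
        # e.g. selector "button" with text "Submit": a button containing "Submit"
        page.locator(selector, has_text=text).first.click()
    elif text:
        page.get_by_text(text).first.click()
    elif selector:
        page.locator(selector).first.click()
    else:
        raise ValueError("provide a selector, a text match, or both")
```

Using `.first` picks the first matching element when several match. Whether the skill does the same is unknown.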

browser_type

Type text into an input field, with optional form submission.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_type <url> <selector> <text> [--submit]

Parameters:

  • url: URL to navigate to
  • selector: CSS selector for the input field
  • text: Text to type
  • --submit: (Optional) Press Enter after typing

Examples:

# Type into field
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com "#email" "user@example.com"

# Type and submit
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com "#search" "query" --submit

browser_screenshot

Capture a screenshot of the page at the given URL.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot <url> <path> [--full_page]

Parameters:

  • url: URL to navigate to
  • path: Output file path for the screenshot
  • --full_page: (Optional) Capture the entire scrollable page

Examples:

# Viewport screenshot
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/screenshot.png

# Full page screenshot
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/full.png --full_page

browser_get_content

Extract text or HTML content from the page or a specific element.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content <url> [--selector SELECTOR] [--html]

Parameters:

  • url: URL to navigate to
  • --selector: (Optional) CSS selector, defaults to 'body'
  • --html: (Optional) Return HTML instead of text

Examples:

# Get all page text
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com

# Get specific element text
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com --selector "#main-content"

# Get HTML
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com --selector "article" --html

browser_hover

Hover over an element to trigger hover states or tooltips.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_hover <url> <selector>

Parameters:

  • url: URL to navigate to
  • selector: CSS selector for the element

Example:

python assets/skills/browser-automation/scripts/browser_tools.py browser_hover https://example.com ".menu-item"

browser_evaluate

Execute custom JavaScript code in the browser context.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate <url> <script>

Parameters:

  • url: URL to navigate to
  • script: JavaScript code to execute

Examples:

# Get page title
python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate https://example.com "document.title"

# Get element count
python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate https://example.com "document.querySelectorAll('button').length"

# Manipulate DOM
python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate https://example.com "document.body.style.backgroundColor = 'red'"

Best Practices

  1. Always use full URLs: Include the protocol (http:// or https://)
  2. Wait for content: The tool automatically waits for 'networkidle' state before actions
  3. Use robust selectors: Prefer ID selectors (#id) or specific CSS classes over generic tags
  4. Error handling: All commands exit with non-zero status on failure and print errors to stderr
  5. Headless mode: All operations run in headless Chromium by default for efficiency
  6. Stateless design: Each command runs independently with its own browser instance
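Practices 1 and 4 can be checked from a calling script. The sketch below validates that a URL carries a protocol and turns a command's exit status and stderr into a simple result tuple. `is_full_url` and `run_checked` are hypothetical helper names, not part of the skill.

```python
import subprocess
import sys
from urllib.parse import urlparse


def is_full_url(url):
    """True when the URL has an http or https scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def run_checked(argv):
    """Run a command and return (ok, stdout, stderr) instead of raising.

    ok is True only when the exit status is zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0, result.stdout, result.stderr
```

A caller might reject a URL early with `is_full_url` and log the stderr text whenever `ok` is False.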

Common Patterns

Form Automation

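# Fill out a multi-field form
# Note: each command loads the page in a fresh browser, so earlier input is not kept between commands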
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com/form "#name" "John Doe"
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com/form "#email" "john@example.com"
python assets/skills/browser-automation/scripts/browser_tools.py browser_click https://example.com/form "#submit"
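When the fields and the submit click must share one page, a single Playwright script is one option, since separate CLI calls each start a fresh browser. The following is a minimal sketch under that assumption. The selectors and URL are the placeholder values from the commands above, and `fill_form` is a hypothetical helper name.

```python
def fill_form(page, fields, submit_selector):
    """Fill each selector with its text, then click submit, all on one page.

    `page` is expected to be a Playwright Page object; `fields` maps
    CSS selectors to the text to enter.
    """
    for selector, text in fields.items():
        page.fill(selector, text)
    page.click(submit_selector)


def main():
    # Imported lazily so fill_form can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com/form", wait_until="networkidle")
        fill_form(
            page,
            {"#name": "John Doe", "#email": "john@example.com"},
            "#submit",
        )
        browser.close()
```

Calling `main()` runs the whole flow in one browser session, so the typed values are still present when the submit button is clicked.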

Content Extraction

# Extract and save page content
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com --selector "article" > article.txt

Visual Verification

# Capture page state
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/page.png

# Capture full scrollable page
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/full.png --full_page

Testing Interactive UI

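# Test hover states
# Note: the screenshot command loads the page in a fresh browser, so it does not show the hover triggered by the previous command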
python assets/skills/browser-automation/scripts/browser_tools.py browser_hover https://example.com ".dropdown-trigger"
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/hover-state.png
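To capture an element while it is actually hovered, the hover and the screenshot need to happen on the same page. A minimal sketch with Playwright used directly follows. `capture_hover` is a hypothetical helper name, and the selector and output path are the placeholders from the commands above.

```python
def capture_hover(page, selector, path):
    """Hover over an element and screenshot the page while the hover is active.

    `page` is expected to be a Playwright Page object.
    """
    page.hover(selector)
    page.screenshot(path=path)


def main():
    # Imported lazily so capture_hover can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com", wait_until="networkidle")
        capture_hover(page, ".dropdown-trigger", "/tmp/hover-state.png")
        browser.close()
```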

Architecture

  • Stateless design: Each command launches a new browser instance
  • No persistent sessions: Browser closes after each operation
  • Local execution: All automation runs locally, no remote servers required
  • Simple I/O: Results printed to stdout, errors to stderr
  • Timeout handling: Configurable timeouts for navigation and element operations

Troubleshooting

If you encounter issues:

  1. Install Playwright browsers: Run playwright install chromium
  2. Check Python version: Requires Python 3.8+
  3. Verify URL accessibility: Ensure the target URL is reachable
  4. Inspect selectors: Use browser DevTools to verify CSS selectors
  5. Check for JavaScript errors: Use browser_evaluate to check console logs
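The first two checks can be scripted. The sketch below tests the interpreter version against the 3.8 minimum stated above and checks whether the playwright package is importable. It does not verify that the Chromium binary has been downloaded. The function names are hypothetical.

```python
import importlib.util
import sys


def python_ok(version_info=None, minimum=(3, 8)):
    """True when the interpreter meets the minimum version from this document."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


def playwright_installed():
    """True when the playwright package can be found on the import path."""
    return importlib.util.find_spec("playwright") is not None


def report():
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not python_ok():
        problems.append("Python 3.8+ is required")
    if not playwright_installed():
        problems.append("playwright is missing: run `pip install playwright`")
    return problems
```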

Advanced Usage

For more complex automation scenarios that require maintaining state across multiple actions, see the examples directory or consider using Playwright directly in a Python script.

Related Skills

  • webapp-testing: For testing local web applications with server management
  • web-artifacts-builder: For creating web-based UI artifacts

Reference

Source

https://github.com/archubbuck/workspace-architect/blob/main/assets/skills/browser-automation/SKILL.md

Overview

This skill provides local Python-based browser automation using Playwright, accessible entirely via CLI commands. It supports navigating, clicking, typing, hovering, taking screenshots, extracting content, and executing JavaScript in the browser context to test and automate web pages.

How This Skill Works

Each command launches a new browser instance, performs the requested action, and then closes it. The tools are implemented as subcommands in assets/skills/browser-automation/scripts/browser_tools.py, making them stateless and predictable for automation tasks.

When to Use It

  • Automate interactions with web pages (clicking, typing, navigating)
  • Test web application functionality and UI behavior
  • Extract text or HTML content from pages or specific elements
  • Capture screenshots for verification, reporting, or QA
  • Execute custom JavaScript in the browser context and handle hover states

Quick Start

  1. Install Playwright and the required browser: pip install playwright, then playwright install chromium.
  2. Navigate to a URL using the CLI tool: python assets/skills/browser-automation/scripts/browser_tools.py browser_navigate https://example.com
  3. Interact with the page (for example, click a button or type into a field) and capture the results or a screenshot.

Best Practices

  • Install Playwright and the required browser binaries before use (pip install playwright; playwright install chromium).
  • Use precise CSS selectors, or rely on --text for robust element matching when selectors are unstable.
  • Remember that each CLI call launches a new browser instance, so page state from one call is not available to the next; keep flows short.
  • Combine browser_get_content with targeted selectors to verify content extraction results.
  • Use browser_screenshot with --full_page for long pages or to capture dynamic UI changes

Example Use Cases

  • Navigate to a login page, type credentials into the username and password fields, and click the login button.
  • Open a product page, hover over a menu to reveal a dropdown, and click a sub-item.
  • Capture a full-page screenshot of a dashboard for QA or reporting.
  • Extract the page title and meta description to verify SEO metadata.
  • Execute a small JavaScript snippet in the page context to read a value (e.g., document.querySelector('#id').textContent)
