What is browser-automation used for?

A local Python-based toolkit that uses Playwright to automate browser interactions via CLI, including navigation, clicking, typing, hovering, taking screenshots, extracting content, and running JavaScript.

How do I run a simple navigation or click action?

Use browser_navigate to open a URL and browser_click to click an element by CSS selector or by text. For example: python assets/.../browser_tools.py browser_navigate https://example.com, or python assets/.../browser_tools.py browser_click https://example.com '#submit-button' --text 'Submit'.

Does each command run in isolation or can I reuse a session?

Each invocation runs in isolation: it launches a new browser instance, performs the action, and closes the browser. Sessions are not reused between commands, which keeps actions stateless and easy to script and parallelize.

browser-automation

npx machina-cli add skill archubbuck/workspace-architect/browser-automation --openclaw


Browser Automation Skill

This skill provides local browser automation capabilities using Python and Playwright. All browser automation is performed locally via CLI commands.

When to Use This Skill

Use this skill when you need to:

Automate interactions with web pages (clicking, typing, navigating)
Test web application functionality
Extract content or data from web pages
Take screenshots of web pages
Execute custom JavaScript in browser context
Hover over elements to trigger UI states

Prerequisites

Before using this skill, ensure Playwright is installed:

pip install playwright
playwright install chromium

Available Tools

All tools are implemented as subcommands in assets/skills/browser-automation/scripts/browser_tools.py. Each command is stateless - it launches a new browser instance, performs the action, and closes the browser.
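Because every tool is a subcommand of one script, the CLI can be wrapped in a few lines of Python. The following is a minimal sketch, not part of the skill itself: build_command and run_tool are hypothetical helper names, and sys.executable is assumed to be the Python interpreter that has Playwright installed.

```python
import subprocess
import sys

# Script path as given in the text above; adjust it if your checkout differs.
SCRIPT = "assets/skills/browser-automation/scripts/browser_tools.py"


def build_command(tool, *args):
    """Return the argv list for one browser_tools.py subcommand (hypothetical helper)."""
    return [sys.executable, SCRIPT, tool, *args]


def run_tool(tool, *args):
    """Run one subcommand in its own process and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero status.
    """
    result = subprocess.run(
        build_command(tool, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For instance, run_tool("browser_navigate", "https://example.com") would start one browser, load the page, and return whatever the command prints.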

browser_navigate

Navigate to a URL and wait for the page to load.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_navigate <url>

Example:

python assets/skills/browser-automation/scripts/browser_tools.py browser_navigate https://example.com

browser_click

Click an element on a page using a CSS selector or text match.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_click <url> <selector> [--text TEXT]

Parameters:

url: URL to navigate to
selector: CSS selector for the element (optional if using --text)
--text: (Optional) Text to match instead of using selector

Examples:

# Click by selector
python assets/skills/browser-automation/scripts/browser_tools.py browser_click https://example.com "#submit-button"

# Click by text
python assets/skills/browser-automation/scripts/browser_tools.py browser_click https://example.com "button" --text "Submit"

browser_type

Type text into an input field, with optional form submission.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_type <url> <selector> <text> [--submit]

Parameters:

url: URL to navigate to
selector: CSS selector for the input field
text: Text to type
--submit: (Optional) Press Enter after typing

Examples:

# Type into field
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com "#email" "user@example.com"

# Type and submit
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com "#search" "query" --submit

browser_screenshot

Navigate to a URL and capture a screenshot of the page.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot <url> <path> [--full_page]

Parameters:

url: URL to navigate to
path: Output file path for the screenshot
--full_page: (Optional) Capture the entire scrollable page

Examples:

# Viewport screenshot
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/screenshot.png

# Full page screenshot
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/full.png --full_page

browser_get_content

Extract text or HTML content from the page or a specific element.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content <url> [--selector SELECTOR] [--html]

Parameters:

url: URL to navigate to
--selector: (Optional) CSS selector, defaults to 'body'
--html: (Optional) Return HTML instead of text

Examples:

# Get all page text
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com

# Get specific element text
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com --selector "#main-content"

# Get HTML
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com --selector "article" --html

browser_hover

Hover over an element to trigger hover states or tooltips.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_hover <url> <selector>

Parameters:

url: URL to navigate to
selector: CSS selector for the element

Example:

python assets/skills/browser-automation/scripts/browser_tools.py browser_hover https://example.com ".menu-item"

browser_evaluate

Execute custom JavaScript code in the browser context.

Usage:

python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate <url> <script>

Parameters:

url: URL to navigate to
script: JavaScript code to execute

Examples:

# Get page title
python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate https://example.com "document.title"

# Get element count
python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate https://example.com "document.querySelectorAll('button').length"

# Manipulate DOM
python assets/skills/browser-automation/scripts/browser_tools.py browser_evaluate https://example.com "document.body.style.backgroundColor = 'red'"
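JavaScript snippets often contain quotes and parentheses, which makes them awkward to pass through a shell. When the tool is called from Python, passing the arguments as a list keeps the script as a single argument with no shell quoting involved. This is a sketch under stated assumptions: evaluate_command and as_shell_line are hypothetical helper names, and sys.executable is assumed to be the interpreter with Playwright installed.

```python
import shlex
import sys

# Script path as given in the usage lines above.
SCRIPT = "assets/skills/browser-automation/scripts/browser_tools.py"


def evaluate_command(url, script):
    """Build the argv list for browser_evaluate; the script stays one argument."""
    return [sys.executable, SCRIPT, "browser_evaluate", url, script]


def as_shell_line(argv):
    """Render an argv list as one shell-quoted line, e.g. for logging."""
    return shlex.join(argv)
```

The list from evaluate_command can be handed to subprocess.run directly; as_shell_line is only needed when the command has to be printed or pasted into a terminal.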

Best Practices

Always use full URLs: Include the protocol (http:// or https://)
Wait for content: The tool automatically waits for 'networkidle' state before actions
Use robust selectors: Prefer ID selectors (#id) or specific CSS classes over generic tags
Error handling: All commands exit with non-zero status on failure and print errors to stderr
Headless mode: All operations run in headless Chromium by default for efficiency
Stateless design: Each command runs independently with its own browser instance
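The first practice above (always use full URLs) is easy to check before launching a browser. A small sketch using only the standard library; is_full_url is a hypothetical helper name, not something the skill provides.

```python
from urllib.parse import urlparse


def is_full_url(url):
    """Return True if url has an http or https scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Rejecting a bare value such as example.com up front avoids paying for a browser launch that is likely to fail at navigation.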

Common Patterns

Form Automation

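# Type into individual form fields and click the submit button.
# Note: each command opens a fresh browser, so text typed by one command
# is gone when the next command runs. These calls do not submit one filled-in form.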
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com/form "#name" "John Doe"
python assets/skills/browser-automation/scripts/browser_tools.py browser_type https://example.com/form "#email" "john@example.com"
python assets/skills/browser-automation/scripts/browser_tools.py browser_click https://example.com/form "#submit"
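Because each CLI call starts a new browser, filling several fields and submitting them together needs a single session. The sketch below uses Playwright's synchronous Python API directly, which the Advanced Usage section recommends for stateful flows. The function names fill_and_submit and submit_form are hypothetical, and the selectors are the illustrative ones from the commands above.

```python
def fill_and_submit(page, fields, submit_selector):
    """Fill each selector/value pair on one page, then click the submit element."""
    for selector, value in fields.items():
        page.fill(selector, value)
    page.click(submit_selector)


def submit_form(url, fields, submit_selector):
    """Open url in one headless Chromium session and submit the form there."""
    # Imported here so fill_and_submit can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Mirror the tool's documented behaviour of waiting for network idle.
        page.goto(url, wait_until="networkidle")
        fill_and_submit(page, fields, submit_selector)
        browser.close()
```

A call such as submit_form("https://example.com/form", {"#name": "John Doe", "#email": "john@example.com"}, "#submit") keeps both typed values on the page until the click happens.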

Content Extraction

# Extract and save page content
python assets/skills/browser-automation/scripts/browser_tools.py browser_get_content https://example.com --selector "article" > article.txt

Visual Verification

# Capture page state
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/page.png

# Capture full scrollable page
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/full.png --full_page

Testing Interactive UI

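# Trigger a hover state, then take a screenshot.
# Note: the screenshot command opens its own browser, so the image will not
# show the hover state triggered by the previous command.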
python assets/skills/browser-automation/scripts/browser_tools.py browser_hover https://example.com ".dropdown-trigger"
python assets/skills/browser-automation/scripts/browser_tools.py browser_screenshot https://example.com /tmp/hover-state.png
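To capture a hover state in an image, the hover and the screenshot have to happen in the same browser session. A minimal sketch with Playwright's synchronous API; hover_and_capture and capture_hover_state are hypothetical names, and the selector and output path are taken from the commands above.

```python
def hover_and_capture(page, selector, path):
    """Hover over selector, then save a screenshot of the same page to path."""
    page.hover(selector)
    page.screenshot(path=path)


def capture_hover_state(url, selector, path):
    """Load url in one headless Chromium session and screenshot the hover state."""
    # Imported here so hover_and_capture can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        hover_and_capture(page, selector, path)
        browser.close()
```

For example, capture_hover_state("https://example.com", ".dropdown-trigger", "/tmp/hover-state.png") would write an image taken while the pointer is still over the element.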

Architecture

Stateless design: Each command launches a new browser instance
No persistent sessions: Browser closes after each operation
Local execution: All automation runs locally, no remote servers required
Simple I/O: Results printed to stdout, errors to stderr
Timeout handling: Configurable timeouts for navigation and element operations
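Since results go to stdout, errors go to stderr, and failures are signalled through the exit status, a caller can treat each command as a simple three-part result. A sketch under stated assumptions: run_command is a hypothetical helper, and its timeout parameter is a client-side limit enforced by subprocess, not one of the tool's own timeout settings.

```python
import subprocess
import sys


def run_command(argv, timeout=60):
    """Run argv and return (ok, stdout, stderr); ok is True on exit status 0."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return False, "", f"timed out after {timeout} seconds"
    return result.returncode == 0, result.stdout, result.stderr
```

The caller can then branch on ok and log stderr on failure, without parsing the output to detect errors.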

Troubleshooting

If you encounter issues:

Install Playwright browsers: Run playwright install chromium
Check Python version: Requires Python 3.8+
Verify URL accessibility: Ensure the target URL is reachable
Inspect selectors: Use browser DevTools to verify CSS selectors
Check for JavaScript errors: Use browser_evaluate to check console logs

Advanced Usage

For more complex automation scenarios that require maintaining state across multiple actions, see the examples directory or consider using Playwright directly in a Python script.

Related Skills

webapp-testing: For testing local web applications with server management
web-artifacts-builder: For creating web-based UI artifacts

Reference

Browser tools source: scripts/browser_tools.py
Playwright Documentation: https://playwright.dev/python/
Examples: examples/

Source

https://github.com/archubbuck/workspace-architect/blob/main/assets/skills/browser-automation/SKILL.md

Overview

This skill provides local Python-based browser automation using Playwright, accessible entirely via CLI commands. It supports navigating, clicking, typing, hovering, taking screenshots, extracting content, and executing JavaScript in the browser context to test and automate web pages.

How This Skill Works

Each command launches a new browser instance, performs the requested action, and then closes it. The tools are implemented as subcommands in assets/skills/browser-automation/scripts/browser_tools.py, making them stateless and predictable for automation tasks.
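Since no command depends on another's browser, independent commands can run side by side. The sketch below fans out several invocations with a thread pool. The names screenshot_command and run_all are hypothetical, the max_workers default is an arbitrary choice, and sys.executable is assumed to be the interpreter with Playwright installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Script path as given in the paragraph above.
SCRIPT = "assets/skills/browser-automation/scripts/browser_tools.py"


def screenshot_command(url, path):
    """Build the argv list for one browser_screenshot call."""
    return [sys.executable, SCRIPT, "browser_screenshot", url, path]


def _exit_code(argv):
    """Run one command to completion and return its exit status."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def run_all(commands, max_workers=4, runner=_exit_code):
    """Run argv lists concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))
```

Each worker starts its own Chromium process, so a modest max_workers value keeps memory use in check on a typical workstation.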

When to Use It

Automate interactions with web pages (clicking, typing, navigating)
Test web application functionality and UI behavior
Extract text or HTML content from pages or specific elements
Capture screenshots for verification, reporting, or QA
Execute custom JavaScript in the browser context and handle hover states

Quick Start

Step 1: Install Playwright and the required browsers (pip install playwright; playwright install chromium).
Step 2: Navigate to a URL using the CLI tool: python assets/skills/browser-automation/scripts/browser_tools.py browser_navigate https://example.com
Step 3: Interact with the page (e.g., click a button or type into a field) and capture results or a screenshot

Best Practices

Install Playwright and the required browser binaries before use (pip install playwright; playwright install chromium).
Use precise CSS selectors, or rely on --text for robust element matching when selectors are unstable.
Remember that each CLI call launches a new browser instance, so page state such as typed text or hover effects does not carry over between calls; keep each call self-contained.
Combine browser_get_content with targeted selectors to verify content extraction results.
Use browser_screenshot with --full_page to capture long, scrollable pages in their entirety.

Example Use Cases

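Navigate to a login page, type credentials into the username and password fields, and click the login button. Because each CLI call starts a fresh browser, a multi-step flow like this needs one session; use Playwright directly as described under Advanced Usage.
Open a product page, hover over a menu to reveal a dropdown, and click a sub-item. This also needs a single session, for the same reason.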
Capture a full-page screenshot of a dashboard for QA or reporting.
Extract the page title and meta description to verify SEO metadata.
Execute a small JavaScript snippet in the page context to read a value (e.g., document.querySelector('#id').textContent)
