What is AIPex Browser Control?

AIPex Browser Control lets your agent drive Chrome using the MCP bridge and the AIPex extension, enabling navigation, interaction, data capture, and downloads.

What do I need to use this skill?

Install the AIPex Chrome extension, have Node.js >= 18 available to run the MCP bridge, and register the bridge in your agent's MCP configuration as described in Step 1.

How do I verify the connection is working?

After Step 2, check that the full set of browser tools (for example search_elements and click) is listed. If only check_aipex_connection is visible, the extension has not connected yet: repeat Step 2 and reload the MCP server in your agent settings.

aipex-browser

npx machina-cli add skill AIPexStudio/AIPex/skill --openclaw

AIPex Browser Control

AIPex is a Chrome extension that exposes 30+ browser automation tools over the Model Context Protocol (MCP). Once connected, the agent can control any Chrome tab using natural language — clicking, typing, navigating, capturing screenshots, downloading content, and more.

Architecture:

Agent (MCP client) ──stdio──▶ aipex-mcp-bridge ──WebSocket──▶ AIPex Chrome Extension ──▶ Browser APIs

When to Use This Skill

Use this skill when the user wants to:

Navigate to URLs, click links, fill forms, or interact with any web page
Automate multi-step browser workflows
Extract or download data from web pages
Capture screenshots of browser tabs
Manage multiple tabs across browser windows
Perform browser-assisted testing (accessibility, UX, regression)

Prerequisites

AIPex Chrome extension installed (available on the Chrome Web Store or via developer build)
Node.js >= 18 installed on the local machine

The user is assumed to have AIPex installed. The agent only needs to complete the two connection steps below.

Step 1: Register the MCP Server

Add the following to the agent's MCP configuration. No manual installation is needed — npx downloads and runs aipex-mcp-bridge automatically.

Cursor (`.cursor/mcp.json`)

{
  "mcpServers": {
    "aipex-browser": {
      "command": "npx",
      "args": ["-y", "aipex-mcp-bridge"]
    }
  }
}

Claude Desktop (`claude_desktop_config.json`)

{
  "mcpServers": {
    "aipex-browser": {
      "command": "npx",
      "args": ["-y", "aipex-mcp-bridge"]
    }
  }
}

Claude Code (CLI)

claude mcp add aipex-browser -- npx -y aipex-mcp-bridge

VS Code Copilot (`.vscode/mcp.json`)

{
  "servers": {
    "aipex-browser": {
      "command": "npx",
      "args": ["-y", "aipex-mcp-bridge"]
    }
  }
}

Windsurf (`mcp_config.json`)

{
  "mcpServers": {
    "aipex-browser": {
      "command": "npx",
      "args": ["-y", "aipex-mcp-bridge"]
    }
  }
}

Custom port (optional)

The bridge listens on localhost:9223 by default. To use a different port:

{
  "mcpServers": {
    "aipex-browser": {
      "command": "npx",
      "args": ["-y", "aipex-mcp-bridge", "--port", "9224"]
    }
  }
}

Then use ws://localhost:9224 in Step 2.
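The per-client snippets above differ only in the top-level key and the optional port argument. As a rough sketch of that pattern, the helper below builds the same JSON in Python. The function names and the client labels are hypothetical and exist only for illustration; the command, arguments, default port, and key names come from the snippets above.

```python
import json

# Top-level key per client, as shown in the Step 1 snippets.
# The client labels themselves are made up for this sketch.
CONFIG_KEY = {
    "cursor": "mcpServers",
    "claude-desktop": "mcpServers",
    "windsurf": "mcpServers",
    "vscode": "servers",
}

DEFAULT_PORT = 9223  # default bridge port stated above


def build_mcp_config(client, port=DEFAULT_PORT):
    """Return the MCP config dict for `client` (hypothetical helper)."""
    args = ["-y", "aipex-mcp-bridge"]
    if port != DEFAULT_PORT:
        # Only pass --port when overriding the default.
        args += ["--port", str(port)]
    server = {"command": "npx", "args": args}
    return {CONFIG_KEY[client]: {"aipex-browser": server}}


def extension_url(port=DEFAULT_PORT):
    """WebSocket URL to enter in the extension's Options page (Step 2)."""
    return f"ws://localhost:{port}"


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("vscode", port=9224), indent=2))
    print(extension_url(9224))
```

Keeping the port in one place makes it harder for the MCP config and the extension URL to drift apart.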

Step 2: Connect the AIPex Extension to the Bridge

After the MCP server is registered and running:

Open Chrome and click the AIPex extension icon
Go to Options (or right-click the icon → "Extension options")
Find the WebSocket Connection section
Enter: ws://localhost:9223
Click Connect

The bridge and extension will handshake, and all browser tools will become available to the agent.

Verifying the connection: If only a single tool called check_aipex_connection is visible, the extension has not yet connected. Follow Step 2 again, then reload the MCP server in agent settings.
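The verification rule above can be expressed as a small check over the tool names the agent currently sees. This is only a sketch: how the tool list is obtained depends on the MCP client, and the status labels and the "no-tools" case are assumptions added for illustration.

```python
SENTINEL_TOOL = "check_aipex_connection"


def connection_status(tool_names):
    """Classify the connection from the visible tool names.

    The returned labels are hypothetical; only the sentinel rule
    (a lone check_aipex_connection means "not connected") is from the text.
    """
    names = set(tool_names)
    if not names:
        # Assumption: no tools at all suggests the MCP server is not loaded.
        return "no-tools"
    if names == {SENTINEL_TOOL}:
        # Bridge is running, but the extension has not connected (redo Step 2).
        return "extension-not-connected"
    return "ready"


if __name__ == "__main__":
    print(connection_status(["check_aipex_connection"]))
    print(connection_status(["check_aipex_connection", "search_elements", "click"]))
```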

Tool Usage Strategy (IMPORTANT)

Always follow this priority order to minimize token cost and latency:

Priority 1 — `search_elements` (always try first)

Query the page's accessibility tree to find elements and get their UIDs. Fast, cheap, requires no screenshot.

search_elements(tabId, "{button,input,textarea,select,a}*")

Priority 2 — UID-based interaction (preferred)

Use UIDs returned by search_elements to interact directly:

click(tabId, uid) — click any element
fill_element_by_uid(tabId, uid, value) — type into inputs
hover_element_by_uid(tabId, uid) — reveal menus or tooltips

Priority 3 — `capture_screenshot` + `computer` (high-cost fallback only)

Use only when search_elements fails after two different query attempts, or when pixel-level interaction is required (canvas, drag-and-drop, sliders).

capture_screenshot(sendToLLM=true) — see the page
computer(action, coordinate) — click/type at pixel coordinates
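One way to picture the priority order is as a small control loop: try search_elements with up to two different queries, and only then pay for a screenshot. The sketch below assumes a hypothetical `call_tool(name, **params)` callable supplied by the MCP client, a `query` parameter name, and a result shape of a list of dicts with a `uid` key; none of these are confirmed by the tool reference.

```python
MAX_QUERIES = 2  # the text above allows two different query attempts


def locate(call_tool, tab_id, queries):
    """Follow the priority order: search_elements first, screenshot last.

    `call_tool`, the `query` parameter, and the result shape are assumptions.
    """
    tried = []
    for query in queries:
        if query in tried:
            continue  # attempts must use different queries
        if len(tried) == MAX_QUERIES:
            break
        tried.append(query)
        matches = call_tool("search_elements", tabId=tab_id, query=query)
        if matches:
            # Priority 2: hand the UID to click / fill_element_by_uid.
            return {"strategy": "uid", "uid": matches[0]["uid"], "queries": tried}
    # Priority 3: high-cost fallback, then act by coordinates with `computer`.
    call_tool("capture_screenshot", sendToLLM=True)
    return {"strategy": "screenshot", "uid": None, "queries": tried}
```

Capping the attempts keeps a badly chosen pattern from looping indefinitely before the fallback is used.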

Standard Workflow

get_all_tabs()
  → search_elements(tabId, "<pattern>")
  → click(tabId, uid)  OR  fill_element_by_uid(tabId, uid, value)
  → [capture_screenshot(sendToLLM=true) to verify if needed]
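The workflow above could be driven by a short function like the following. It is a sketch under stated assumptions: `call_tool` is a hypothetical callable provided by the MCP client, tabs are assumed to be dicts with `id` and `active` keys, and search results are assumed to be dicts with a `uid` key.

```python
def run_standard_workflow(call_tool, pattern, value=None, verify=False):
    """get_all_tabs -> search_elements -> click or fill -> optional screenshot.

    Tab and element shapes are assumptions, not the documented schema.
    """
    tabs = call_tool("get_all_tabs")
    # Prefer the active tab; fall back to the first one listed.
    tab = next((t for t in tabs if t.get("active")), tabs[0])
    tab_id = tab["id"]

    matches = call_tool("search_elements", tabId=tab_id, query=pattern)
    if not matches:
        raise LookupError(f"no element matched {pattern!r}")
    uid = matches[0]["uid"]

    if value is None:
        call_tool("click", tabId=tab_id, uid=uid)
    else:
        call_tool("fill_element_by_uid", tabId=tab_id, uid=uid, value=value)

    if verify:
        # Only spend tokens on a screenshot when verification is requested.
        call_tool("capture_screenshot", sendToLLM=True)
    return uid
```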

Available Tool Categories

Category	Tools	Description
Tab Management	8 tools	Open, close, switch, pin, group tabs
UI Interaction	7 tools	Click, fill, hover, keyboard, coordinate-based
Page Content	4 tools	Metadata, scroll, highlight elements/text
Screenshots	2 tools	Capture visible tab or specific tab
Downloads	3 tools	Save text as markdown, download images
Human Intervention	4 tools	Request user input mid-automation

Key tools by category:

Category	Key Tools
Tab	`get_all_tabs`, `switch_to_tab`, `create_new_tab`, `close_tab`
UI	`search_elements`, `click`, `fill_element_by_uid`, `computer`
Page	`get_page_metadata`, `scroll_to_element`, `highlight_element`
Screenshot	`capture_screenshot`, `capture_tab_screenshot`
Download	`download_text_as_markdown`, `download_image`
Intervention	`request_intervention`, `list_interventions`

To load complete parameter schemas and examples for every tool:

read_skill_reference("aipex-browser", "references/tools-reference.md")

Common Patterns

Navigate to a URL and click a button

create_new_tab("https://example.com")
→ search_elements(tabId, "*[Ss]ubmit*")
→ click(tabId, uid)

Fill a login form

get_all_tabs()
→ search_elements(tabId, "{input,textbox}*")
→ fill_element_by_uid(tabId, emailUid, "user@example.com")
→ fill_element_by_uid(tabId, passwordUid, "secret")
→ search_elements(tabId, "*[Ll]ogin*")
→ click(tabId, uid)
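The login pattern glosses over how emailUid and passwordUid are chosen from the search results. A minimal sketch of that selection step follows, assuming each result is a dict with `uid` and `name` keys (the actual result schema may differ) and using hypothetical helper names.

```python
def pick_uid(elements, keyword):
    """Return the uid of the first element whose name contains `keyword`."""
    keyword = keyword.lower()
    for element in elements:
        if keyword in element.get("name", "").lower():
            return element["uid"]
    raise LookupError(f"no element with {keyword!r} in its name")


def login_fill_calls(tab_id, elements, email, password):
    """Build the two fill_element_by_uid calls from the login pattern.

    Returns (tool_name, params) pairs instead of calling anything, so the
    plan can be inspected before it is sent to the browser.
    """
    return [
        ("fill_element_by_uid",
         {"tabId": tab_id, "uid": pick_uid(elements, "email"), "value": email}),
        ("fill_element_by_uid",
         {"tabId": tab_id, "uid": pick_uid(elements, "password"), "value": password}),
    ]
```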

Extract page content to markdown

get_page_metadata()
→ download_text_as_markdown(content, "page-extract")

Visual verification

capture_screenshot(sendToLLM=true)

Troubleshooting

Symptom	Likely Cause	Fix
Only `check_aipex_connection` visible	Extension not connected to bridge	Open AIPex Options → set WebSocket URL → Connect
Port 9223 already in use	Port conflict on machine	Use `--port 9224` in MCP config and `ws://localhost:9224` in extension
`search_elements` returns 0 results	Page uses canvas or non-semantic HTML	Fall back to `capture_screenshot(sendToLLM=true)` + `computer` tool
Connection drops frequently	Service worker sleep cycle	AIPex uses keepalive pings; reconnect extension from Options if needed
Tools appear but calls time out	Bridge not receiving WebSocket messages	Restart bridge: reload MCP server in agent settings
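For the port-conflict row, a quick local check can confirm whether 9223 is actually taken before editing the config. The sketch below uses only Python's standard socket module; the helper names are hypothetical, and it assumes the bridge binds on the loopback interface.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


def first_free_port(start=9223, attempts=10):
    """Scan upward from `start` for a port to pass via --port."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    print("9223 free:", port_is_free(9223))
```

The result is only a snapshot: another process can still take the port between this check and the bridge starting.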

Source

https://github.com/AIPexStudio/AIPex/blob/main/skill/SKILL.md

Overview

AIPex Browser Control lets an agent drive a Chrome browser through the AIPex Chrome extension using the MCP bridge. It supports navigating pages, clicking, filling forms, taking screenshots, managing tabs, and downloading content.

How This Skill Works

The agent acts as an MCP client and communicates with the aipex-mcp-bridge over stdio. The bridge talks to the AIPex Chrome Extension over WebSocket, which then invokes Chrome browser APIs to perform actions like navigation, element interaction, and data capture. The system exposes 30+ browser automation tools and uses a priority workflow (search_elements first, then UID-based interactions) to minimize cost and latency.

When to Use It

Navigate to URLs, click links, fill forms, or interact with any web page
Automate multi-step browser workflows
Extract or download data from web pages
Capture screenshots of browser tabs
Manage multiple tabs across browser windows and perform browser-assisted testing

Quick Start

Step 1: Register aipex-mcp-bridge in the agent's MCP configuration; npx downloads and runs it, so no separate install is needed
Step 2: Connect the AIPex extension to the bridge with ws://localhost:9223 (or your port)
Step 3: Start automating with tools like search_elements, click, fill_element_by_uid, create_new_tab, and capture_screenshot

Best Practices

Verify AIPex Chrome extension is installed and accessible before starting
Prefer search_elements to find elements and obtain UIDs first
Use UID-based actions (click, fill_element_by_uid) once you have element UIDs
Batch actions to minimize browser tab switching and latency
If only check_aipex_connection is listed, reconnect the extension (Step 2) and reload the MCP server

Example Use Cases

Navigate to a product page, click add-to-cart, fill checkout details, and download a confirmation receipt
Run accessibility or regression tests by navigating pages and capturing screenshots for comparisons
Scrape data from a catalog by locating items, extracting text, and downloading a CSV
Open multiple research pages in different tabs and compare information side-by-side
Fill and submit a form on a site, then verify the success message in the browser
