What is required to start using this skill?

You need Rube MCP connected (RUBE_SEARCH_TOOLS must respond) and an ACTIVE webscraping_ai connection established through RUBE_MANAGE_CONNECTIONS.

Do I need API keys for MCP?

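No API keys are required to add the MCP endpoint (https://rube.app/mcp) to your client. The webscraping_ai connection is separate: if RUBE_MANAGE_CONNECTIONS reports that it is not ACTIVE, follow the returned auth link to complete setup.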

How do I handle changes in tool schemas?

Always call RUBE_SEARCH_TOOLS first to fetch current schemas, and avoid hardcoding slugs or arguments. Also include memory in every RUBE_MULTI_EXECUTE_TOOL call and reuse session IDs within a workflow.

webscraping-ai-automation

npx machina-cli add skill ComposioHQ/awesome-claude-skills/webscraping-ai-automation --openclaw

Webscraping AI Automation via Rube MCP

Automate Webscraping AI operations through Composio's Webscraping AI toolkit via Rube MCP.

Toolkit docs: composio.dev/toolkits/webscraping_ai

Prerequisites

Rube MCP must be connected (RUBE_SEARCH_TOOLS available)
Active Webscraping AI connection via RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
Always call RUBE_SEARCH_TOOLS first to get current tool schemas

Setup

Get Rube MCP: Add https://rube.app/mcp as an MCP server in your client configuration. No API keys are needed; adding the endpoint is enough.
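As an illustration of the step above, here is a minimal sketch that prints a server entry you could merge into a client configuration. The `mcpServers` key and `url` field are an assumption based on a layout used by several MCP clients, and the `rube` server name is arbitrary; neither comes from this skill, so check your own client's documentation for the exact format.

```python
import json

# From the skill: the Rube MCP endpoint. No API key is part of the entry.
RUBE_MCP_URL = "https://rube.app/mcp"


def rube_server_entry(name: str = "rube") -> dict:
    """Return a config fragment that registers Rube as a remote MCP server.

    Assumption: the "mcpServers"/"url" layout is a common client convention,
    not something this skill specifies.
    """
    return {"mcpServers": {name: {"url": RUBE_MCP_URL}}}


if __name__ == "__main__":
    print(json.dumps(rube_server_entry(), indent=2))
```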

Verify Rube MCP is available by confirming RUBE_SEARCH_TOOLS responds
Call RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
If connection is not ACTIVE, follow the returned auth link to complete setup
Confirm connection status shows ACTIVE before running any workflows
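The last two setup steps amount to a gate: nothing runs until the connection is ACTIVE. A minimal sketch of that gate follows. The response shape is hypothetical: the `status` and `auth_link` field names are placeholders, so read the real names from what RUBE_MANAGE_CONNECTIONS actually returns.

```python
def require_active(connection: dict) -> None:
    """Raise unless the connection is ACTIVE.

    `connection` stands in for the RUBE_MANAGE_CONNECTIONS result for the
    webscraping_ai toolkit. The "status" and "auth_link" keys are
    hypothetical field names used only for illustration.
    """
    status = connection.get("status")
    if status == "ACTIVE":
        return
    link = connection.get("auth_link", "<no auth link returned>")
    raise RuntimeError(
        f"webscraping_ai connection is {status!r}; complete setup at {link}"
    )
```

Calling `require_active(...)` before every workflow keeps the failure at the start, where the auth link is still at hand, instead of partway through an execution.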

Tool Discovery

Always discover available tools before executing workflows:

RUBE_SEARCH_TOOLS
queries: [{use_case: "Webscraping AI operations", known_fields: ""}]
session: {generate_id: true}

This returns available tool slugs, input schemas, recommended execution plans, and known pitfalls.
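The discovery call takes the same shape whether you start a new session or continue one. The sketch below builds that payload as a plain dictionary, mirroring the fields shown in this document (`queries`, `use_case`, `known_fields`, `session`). The helper name `build_search_request` is hypothetical, and how the payload is sent depends on your MCP client.

```python
from typing import Optional


def build_search_request(
    use_case: str,
    session_id: Optional[str] = None,
    known_fields: str = "",
) -> dict:
    """Build arguments for a RUBE_SEARCH_TOOLS call.

    With no session_id, ask Rube to generate one; otherwise reuse the
    existing session, as the workflow pattern recommends.
    """
    session = {"id": session_id} if session_id else {"generate_id": True}
    return {
        "queries": [{"use_case": use_case, "known_fields": known_fields}],
        "session": session,
    }
```

For example, `build_search_request("Webscraping AI operations")` starts a new session, while passing `session_id="existing_session_id"` continues one.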

Core Workflow Pattern

Step 1: Discover Available Tools

RUBE_SEARCH_TOOLS
queries: [{use_case: "your specific Webscraping AI task"}]
session: {id: "existing_session_id"}

Step 2: Check Connection

RUBE_MANAGE_CONNECTIONS
toolkits: ["webscraping_ai"]
session_id: "your_session_id"

Step 3: Execute Tools

RUBE_MULTI_EXECUTE_TOOL
tools: [{
  tool_slug: "TOOL_SLUG_FROM_SEARCH",
  arguments: {/* schema-compliant args from search results */}
}]
memory: {}
session_id: "your_session_id"
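Because `memory` must be present even when empty, it helps to build the execute payload in one place so the field cannot be forgotten. A minimal sketch, using only the field names shown in Step 3; the helper name `build_execute_request` is hypothetical.

```python
from typing import Any, Dict, Optional


def build_execute_request(
    tool_slug: str,
    arguments: Dict[str, Any],
    session_id: str,
    memory: Optional[Dict[str, Any]] = None,
) -> dict:
    """Build arguments for a RUBE_MULTI_EXECUTE_TOOL call.

    `tool_slug` and `arguments` should come from RUBE_SEARCH_TOOLS results,
    not from hardcoded values. `memory` defaults to an empty dict so it is
    always included.
    """
    return {
        "tools": [{"tool_slug": tool_slug, "arguments": arguments}],
        "memory": memory if memory is not None else {},
        "session_id": session_id,
    }
```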

Known Pitfalls

Always search first: Tool schemas change. Never hardcode tool slugs or arguments without calling RUBE_SEARCH_TOOLS
Check connection: Verify RUBE_MANAGE_CONNECTIONS shows ACTIVE status before executing tools
Schema compliance: Use exact field names and types from the search results
Memory parameter: Always include memory in RUBE_MULTI_EXECUTE_TOOL calls, even if empty ({})
Session reuse: Reuse session IDs within a workflow. Generate new ones for new workflows
Pagination: Check responses for pagination tokens and continue fetching until complete
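The pagination pitfall can be sketched as a loop that keeps fetching until no token comes back. The field names `next_page_token` and `items` are hypothetical defaults, since the actual names depend on the tool's response, and `fetch_page` stands in for whatever function performs one tool execution.

```python
from typing import Callable, List, Optional


def fetch_all_pages(
    fetch_page: Callable[[Optional[str]], dict],
    token_field: str = "next_page_token",  # hypothetical field name
    items_field: str = "items",  # hypothetical field name
    max_pages: int = 100,
) -> List:
    """Collect items from every page, following pagination tokens.

    `fetch_page(token)` performs one request; token is None for the first
    page. `max_pages` guards against a token that never runs out.
    """
    items: List = []
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        items.extend(page.get(items_field, []))
        token = page.get(token_field)
        if not token:
            return items
    raise RuntimeError(f"stopped after {max_pages} pages without reaching the end")
```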

Quick Reference

| Operation | Approach |
| --- | --- |
| Find tools | `RUBE_SEARCH_TOOLS` with Webscraping AI-specific use case |
| Connect | `RUBE_MANAGE_CONNECTIONS` with toolkit `webscraping_ai` |
| Execute | `RUBE_MULTI_EXECUTE_TOOL` with discovered tool slugs |
| Bulk ops | `RUBE_REMOTE_WORKBENCH` with `run_composio_tool()` |
| Full schema | `RUBE_GET_TOOL_SCHEMAS` for tools with `schemaRef` |
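For the "Full schema" row, one way to decide which tools need a follow-up RUBE_GET_TOOL_SCHEMAS call is to pick out the search results that carry a `schemaRef` but no inline schema. This is a sketch under assumptions: the `tool_slug` and `input_schema` keys describe a hypothetical search-result shape, and only `schemaRef` is named in this document.

```python
from typing import Dict, List


def slugs_needing_full_schema(tools: List[Dict]) -> List[str]:
    """Return slugs of tools that only reference their schema.

    Assumes each search result is a dict with a "tool_slug" key and either
    an inline "input_schema" or a "schemaRef" pointer (hypothetical shape).
    """
    return [
        tool["tool_slug"]
        for tool in tools
        if "schemaRef" in tool and "input_schema" not in tool
    ]
```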

Powered by Composio

Source

git clone https://github.com/ComposioHQ/awesome-claude-skills.git

The skill file is at composio-skills/webscraping-ai-automation/SKILL.md in that repository.

Overview

Automate Webscraping AI operations using Composio’s toolkit through Rube MCP. This workflow relies on discovering current tool schemas, managing connections, and executing tools, with a strict rule to search tools first to stay synchronized with updates.

How This Skill Works

Connect to Rube MCP and verify RUBE_SEARCH_TOOLS. Manage the webscraping_ai toolkit connection via RUBE_MANAGE_CONNECTIONS, then discover available tools and execution plans. Execute chosen tools using RUBE_MULTI_EXECUTE_TOOL with the required memory field and a session, reusing sessions when appropriate.
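The sequence described above can be sketched end to end with the transport left abstract. Here `call_tool` is a hypothetical stand-in for whatever your MCP client uses to invoke a tool by name. The response fields read below (`session_id`, `tools`, `tool_slug`, `status`) are assumptions for illustration, not a documented Rube response format.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: (tool name, arguments) -> parsed response.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_workflow(call_tool: CallTool, use_case: str, arguments: Dict[str, Any]) -> dict:
    """Search, check the connection, then execute, all in one session."""
    # 1. Discover tools first; never hardcode the slug.
    search = call_tool(
        "RUBE_SEARCH_TOOLS",
        {"queries": [{"use_case": use_case}], "session": {"generate_id": True}},
    )
    session_id = search["session_id"]  # assumed response field
    tool_slug = search["tools"][0]["tool_slug"]  # assumed response shape

    # 2. Confirm the webscraping_ai connection is ACTIVE.
    connection = call_tool(
        "RUBE_MANAGE_CONNECTIONS",
        {"toolkits": ["webscraping_ai"], "session_id": session_id},
    )
    if connection.get("status") != "ACTIVE":  # assumed response field
        raise RuntimeError("webscraping_ai connection is not ACTIVE")

    # 3. Execute with the required memory field, reusing the session.
    return call_tool(
        "RUBE_MULTI_EXECUTE_TOOL",
        {
            "tools": [{"tool_slug": tool_slug, "arguments": arguments}],
            "memory": {},
            "session_id": session_id,
        },
    )
```

Passing `call_tool` in as a parameter keeps the ordering logic testable with a fake client before it is wired to a real one.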

When to Use It

Starting a new webscraping automation project and needing up-to-date tool schemas from RUBE_SEARCH_TOOLS
Integrating webscraping AI tasks into an existing workflow that requires an ACTIVE webscraping_ai connection
Selecting and planning execution after discovering tools and recommended execution plans via RUBE_SEARCH_TOOLS
Running a sequence of tool executions within a single session using RUBE_MULTI_EXECUTE_TOOL
Handling changes in tool schemas by re-discovering tools instead of hard-coding slugs, and handling paginated responses by following pagination tokens

Quick Start

Step 1: Add the MCP endpoint https://rube.app/mcp as an MCP server in your client configuration
Step 2: Run RUBE_SEARCH_TOOLS to confirm availability and get current tool schemas for webscraping_ai
Step 3: Run RUBE_MANAGE_CONNECTIONS for toolkit webscraping_ai and confirm the status is ACTIVE, then call RUBE_MULTI_EXECUTE_TOOL with a discovered tool slug, schema-compliant arguments, and memory

Best Practices

Always call RUBE_SEARCH_TOOLS before selecting or executing tools; schemas can change
Verify RUBE_MANAGE_CONNECTIONS shows ACTIVE before any execution
Use exact field names and types from discovery results; include memory in RUBE_MULTI_EXECUTE_TOOL calls
Reuse session IDs within a workflow; generate new ones only for separate workflows
If results are paginated, handle tokens and fetch all pages to complete the operation
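The "exact field names and types" practice can be checked before a call is sent. The sketch below assumes the discovered input schema is JSON Schema style, with `properties`, `required`, and a `type` per property. It covers only flat top-level fields and is not a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look compliant."""
    problems: List[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field: {name}")
    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"unknown field: {name}")
            continue
        declared = properties[name].get("type")
        expected = _JSON_TYPES.get(declared) if isinstance(declared, str) else None
        if expected is None:
            continue  # no simple type declared; skip the type check
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {declared}")
    return problems
```

Field names are case-sensitive here, so an argument named `URL` against a schema that declares `url` is reported both as an unknown field and as a missing required one.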

Example Use Cases

Example 1: Start a project by running RUBE_SEARCH_TOOLS with use_case 'Webscraping AI operations' to fetch current tool slugs and schemas
Example 2: Establish the webscraping_ai connection via RUBE_MANAGE_CONNECTIONS and ensure the connection status is ACTIVE
Example 3: Discover a specific tool slug from search results and execute it with RUBE_MULTI_EXECUTE_TOOL using the provided input schema and memory
Example 4: Perform bulk tool executions through RUBE_REMOTE_WORKBENCH and run_composio_tool() for multiple webscraping tasks
Example 5: Retrieve full tool schemas with RUBE_GET_TOOL_SCHEMAS for tools whose search results include a schemaRef, before integrating them

Frequently Asked Questions
