
webscraping-ai-automation

npx machina-cli add skill ComposioHQ/awesome-claude-skills/webscraping-ai-automation --openclaw

Webscraping AI Automation via Rube MCP

Automate Webscraping AI operations through Composio's Webscraping AI toolkit via Rube MCP.

Toolkit docs: composio.dev/toolkits/webscraping_ai

Prerequisites

  • Rube MCP must be connected (RUBE_SEARCH_TOOLS available)
  • Active Webscraping AI connection via RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
  • Always call RUBE_SEARCH_TOOLS first to get current tool schemas

Setup

Get Rube MCP: Add https://rube.app/mcp as an MCP server in your client configuration. No API keys are needed; adding the endpoint is enough.

  1. Verify Rube MCP is available by confirming RUBE_SEARCH_TOOLS responds
  2. Call RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
  3. If the connection is not ACTIVE, follow the returned auth link to complete setup
  4. Confirm connection status shows ACTIVE before running any workflows
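These setup steps amount to a gate: no workflow runs until the connection reports ACTIVE. Below is a minimal Python sketch of that gate. It assumes the RUBE_MANAGE_CONNECTIONS result has already been parsed into a dict, and the key names `status` and `auth_url` are hypothetical placeholders, not the documented response format. Map them to whatever the real response contains.

```python
from typing import Optional, Tuple


def connection_gate(result: dict) -> Tuple[bool, Optional[str]]:
    """Decide whether workflows may run, given a parsed connection result.

    Assumption: `result` looks like {"status": "...", "auth_url": "..."};
    these key names are hypothetical, not the documented Rube schema.
    Returns (ready, auth_link). ready is True only for an ACTIVE connection.
    Otherwise auth_link is the link the user should follow (may be None).
    """
    # Compare case-insensitively in case the status casing differs.
    status = str(result.get("status", "")).upper()
    if status == "ACTIVE":
        return True, None
    # Not ACTIVE: surface the auth link so the user can finish setup (step 3).
    return False, result.get("auth_url")
```

A caller would stop and show the returned link whenever `ready` is False, then re-check the connection before continuing.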

Tool Discovery

Always discover available tools before executing workflows:

RUBE_SEARCH_TOOLS
queries: [{use_case: "Webscraping AI operations", known_fields: ""}]
session: {generate_id: true}

This returns available tool slugs, input schemas, recommended execution plans, and known pitfalls.
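The discovery call above and the Step 1 call in the Core Workflow Pattern differ only in their `session` field. The first search of a workflow asks for a new ID with `generate_id: true`, and later searches pass the existing ID back. The Python sketch below builds those argument objects as plain dicts. It mirrors the field names shown in this guide (`queries`, `use_case`, `known_fields`, `session`, `generate_id`, `id`). How the dict is sent to the MCP server depends on your client and is not shown.

```python
from typing import Optional


def search_tools_args(use_case: str,
                      session_id: Optional[str] = None,
                      known_fields: Optional[str] = None) -> dict:
    """Build RUBE_SEARCH_TOOLS arguments in the shape shown in this guide."""
    query = {"use_case": use_case}
    # known_fields is optional; include it only when the caller supplies it.
    if known_fields is not None:
        query["known_fields"] = known_fields
    # First search of a workflow asks the server to generate a session ID;
    # later searches in the same workflow pass that ID back.
    session = {"id": session_id} if session_id else {"generate_id": True}
    return {"queries": [query], "session": session}
```

For example, `search_tools_args("Webscraping AI operations", known_fields="")` reproduces the discovery call above.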

Core Workflow Pattern

Step 1: Discover Available Tools

RUBE_SEARCH_TOOLS
queries: [{use_case: "your specific Webscraping AI task"}]
session: {id: "existing_session_id"}

Step 2: Check Connection

RUBE_MANAGE_CONNECTIONS
toolkits: ["webscraping_ai"]
session_id: "your_session_id"

Step 3: Execute Tools

RUBE_MULTI_EXECUTE_TOOL
tools: [{
  tool_slug: "TOOL_SLUG_FROM_SEARCH",
  arguments: {/* schema-compliant args from search results */}
}]
memory: {}
session_id: "your_session_id"
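Step 3 is where the pitfalls listed next matter most. The slug must come from search, `memory` must always be present, and the session ID must be reused. The hedged Python sketch below assembles the RUBE_MULTI_EXECUTE_TOOL arguments and rejects any slug that discovery did not return. `discovered_slugs` is a hypothetical set you would fill from your own RUBE_SEARCH_TOOLS results. `EXAMPLE_SLUG` in the usage note is a placeholder, not a real Webscraping AI tool.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def multi_execute_args(calls: Iterable[Tuple[str, Dict]],
                       session_id: str,
                       discovered_slugs: Set[str],
                       memory: Optional[Dict] = None) -> Dict:
    """Build RUBE_MULTI_EXECUTE_TOOL arguments.

    `calls` is a sequence of (tool_slug, arguments) pairs. Every slug must
    have been returned by RUBE_SEARCH_TOOLS (tracked in `discovered_slugs`),
    so hard-coded slugs are rejected instead of silently sent.
    """
    tools = []
    for slug, arguments in calls:
        if slug not in discovered_slugs:
            raise ValueError(f"{slug!r} was not returned by RUBE_SEARCH_TOOLS")
        tools.append({"tool_slug": slug, "arguments": dict(arguments)})
    return {
        "tools": tools,
        # Always send memory, even when it is empty.
        "memory": {} if memory is None else memory,
        # Reuse the session ID from discovery within one workflow.
        "session_id": session_id,
    }
```

Usage: `multi_execute_args([("EXAMPLE_SLUG", {"url": "https://example.com"})], "s-1", {"EXAMPLE_SLUG"})`. The `url` argument here is only illustrative. Real argument names come from the discovered schema.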

Known Pitfalls

  • Always search first: Tool schemas change. Never hardcode tool slugs or arguments without calling RUBE_SEARCH_TOOLS
  • Check connection: Verify RUBE_MANAGE_CONNECTIONS shows ACTIVE status before executing tools
  • Schema compliance: Use exact field names and types from the search results
  • Memory parameter: Always include memory in RUBE_MULTI_EXECUTE_TOOL calls, even if empty ({})
  • Session reuse: Reuse session IDs within a workflow. Generate new ones for new workflows
  • Pagination: Check responses for pagination tokens and continue fetching until complete
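The pagination rule can be written as a loop that keeps requesting pages until no continuation token comes back. The minimal sketch below assumes a hypothetical `fetch_page(token)` callable that wraps your tool execution and returns `(items, next_token)`. The real token field name varies by tool, so read it from the response schema.

```python
from typing import Callable, List, Optional, Tuple

PageFetcher = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def fetch_all(fetch_page: PageFetcher, max_pages: int = 100) -> List:
    """Collect items from every page, following pagination tokens.

    fetch_page(None) fetches the first page. Each call returns the items
    on that page and the token for the next one; a missing or empty token
    means the last page was reached. max_pages guards against a server
    that never stops returning tokens.
    """
    items: List = []
    token: Optional[str] = None
    for _ in range(max_pages):
        page_items, token = fetch_page(token)
        items.extend(page_items)
        if not token:
            return items
    raise RuntimeError(f"stopped after {max_pages} pages; results incomplete")
```

Raising at the page cap, instead of returning a partial list, keeps an incomplete result from being mistaken for a finished operation.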

Quick Reference

Operation | Approach
--- | ---
Find tools | RUBE_SEARCH_TOOLS with a Webscraping AI-specific use case
Connect | RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
Execute | RUBE_MULTI_EXECUTE_TOOL with discovered tool slugs
Bulk ops | RUBE_REMOTE_WORKBENCH with run_composio_tool()
Full schema | RUBE_GET_TOOL_SCHEMAS for tools with schemaRef

Powered by Composio

Source

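git clone https://github.com/ComposioHQ/awesome-claude-skills

The skill file is composio-skills/webscraping-ai-automation/SKILL.md. View it on GitHub at https://github.com/ComposioHQ/awesome-claude-skills/blob/master/composio-skills/webscraping-ai-automation/SKILL.md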

Overview

Automate Webscraping AI operations using Composio’s toolkit through Rube MCP. The workflow has three parts (discovering current tool schemas, managing the connection, and executing tools) and one strict rule: search for tools first so that every call matches the latest schemas.

How This Skill Works

Connect to Rube MCP and verify that RUBE_SEARCH_TOOLS responds. Use RUBE_MANAGE_CONNECTIONS to bring the webscraping_ai toolkit connection to ACTIVE, and use RUBE_SEARCH_TOOLS to discover the available tools and recommended execution plans. Then execute the chosen tools with RUBE_MULTI_EXECUTE_TOOL, always passing the memory field and a session ID, and reuse that session ID for the rest of the workflow.
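This ordering (search, confirm ACTIVE, then execute within one session) can also be enforced in client code. The hedged Python sketch below takes `call`, a hypothetical function you supply that sends one meta-tool call to Rube MCP and returns the parsed result. The result keys it reads (`session_id`, `tool_slugs`, `status`) are illustrative assumptions, not the documented response format.

```python
from typing import Callable, Optional, Set


class WebscrapingWorkflow:
    """Runs the workflow in order: discover -> confirm ACTIVE -> execute.

    `call(tool_name, args)` is a hypothetical transport you provide; it sends
    one Rube MCP meta-tool call and returns the parsed result as a dict.
    The result keys read here ("session_id", "tool_slugs", "status") are
    illustrative assumptions, not the documented response format.
    """

    def __init__(self, call: Callable[[str, dict], dict]) -> None:
        self.call = call
        self.session_id: Optional[str] = None
        self.slugs: Set[str] = set()
        self.active = False

    def discover(self, use_case: str) -> Set[str]:
        # First search generates a session; later searches reuse it.
        session = {"id": self.session_id} if self.session_id else {"generate_id": True}
        result = self.call("RUBE_SEARCH_TOOLS",
                           {"queries": [{"use_case": use_case}], "session": session})
        self.session_id = result.get("session_id", self.session_id)
        self.slugs.update(result.get("tool_slugs", []))
        return self.slugs

    def connect(self) -> bool:
        if self.session_id is None:
            raise RuntimeError("run discover() first to obtain a session")
        result = self.call("RUBE_MANAGE_CONNECTIONS",
                           {"toolkits": ["webscraping_ai"], "session_id": self.session_id})
        self.active = result.get("status") == "ACTIVE"
        return self.active

    def execute(self, slug: str, arguments: dict) -> dict:
        if slug not in self.slugs:
            raise ValueError(f"{slug!r} was not returned by RUBE_SEARCH_TOOLS")
        if not self.active:
            raise RuntimeError("webscraping_ai connection is not ACTIVE")
        return self.call("RUBE_MULTI_EXECUTE_TOOL", {
            "tools": [{"tool_slug": slug, "arguments": arguments}],
            "memory": {},  # always present, even when empty
            "session_id": self.session_id,
        })
```

Injecting `call` keeps the ordering rules testable with a fake transport, independent of any particular MCP client library.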

When to Use It

  • Starting a new webscraping automation project and needing up-to-date tool schemas from RUBE_SEARCH_TOOLS
  • Integrating webscraping AI tasks into an existing workflow that requires an ACTIVE webscraping_ai connection
  • Planning which tools to run after RUBE_SEARCH_TOOLS returns the available tools and recommended execution plans
  • Running a sequence of tool executions within a single session using RUBE_MULTI_EXECUTE_TOOL
  • Adapting to tool schema changes by re-discovering tools instead of hard-coding slugs, and fetching every page of paginated results

Quick Start

  1. Add the MCP endpoint https://rube.app/mcp as an MCP server in your client configuration
  2. Run RUBE_SEARCH_TOOLS to confirm availability and get the current tool schemas for webscraping_ai
  3. Run RUBE_MANAGE_CONNECTIONS for toolkit webscraping_ai and confirm the connection is ACTIVE, then call RUBE_MULTI_EXECUTE_TOOL with a discovered tool slug and the memory parameter

Best Practices

  • Always call RUBE_SEARCH_TOOLS before selecting or executing tools; schemas can change
  • Verify RUBE_MANAGE_CONNECTIONS shows ACTIVE before any execution
  • Use exact field names and types from discovery results, and include memory in every RUBE_MULTI_EXECUTE_TOOL call, even if empty ({})
  • Reuse session IDs within a workflow; generate new ones only for separate workflows
  • If results are paginated, handle tokens and fetch all pages to complete the operation
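The rule about exact field names and types can be checked mechanically before a call is sent. The sketch below assumes the discovered input schema is JSON-Schema-like (an object with `properties` and `required`). It checks only top-level required fields, unknown field names, and basic JSON types, so treat it as a pre-flight lint rather than full validation. A dedicated validator such as the `jsonschema` package covers the complete specification. Flagging unknown fields is a deliberately conservative choice, since a schema may allow extra properties.

```python
from typing import Dict, List

# Basic JSON Schema type names mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def schema_problems(arguments: Dict, schema: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the lint passed."""
    props = schema.get("properties", {})
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field {name!r}")
    for name, value in arguments.items():
        if name not in props:
            # Catches misspelled or wrongly cased field names.
            problems.append(f"unknown field {name!r}")
            continue
        expected = props[name].get("type")
        py_type = JSON_TYPES.get(expected) if isinstance(expected, str) else None
        if py_type is None:
            continue  # union or missing type: not checked by this sketch
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, py_type):
            problems.append(
                f"field {name!r} should be {expected}, got {type(value).__name__}")
    return problems
```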

Example Use Cases

  • Example 1: Start a project by running RUBE_SEARCH_TOOLS with use_case 'Webscraping AI operations' to fetch current tool slugs and schemas
  • Example 2: Establish the webscraping_ai connection via RUBE_MANAGE_CONNECTIONS and confirm the connection status is ACTIVE
  • Example 3: Pick a tool slug from the search results and execute it with RUBE_MULTI_EXECUTE_TOOL, passing arguments that follow the returned input schema plus the memory parameter
  • Example 4: Perform bulk tool executions through RUBE_REMOTE_WORKBENCH and run_composio_tool() for multiple webscraping tasks
  • Example 5: For tools whose search results include a schemaRef, retrieve the full schema with RUBE_GET_TOOL_SCHEMAS before integration
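As the quick reference notes, tools that come back with a `schemaRef` need RUBE_GET_TOOL_SCHEMAS before their full input schema is known. The small sketch below splits discovered tools into a ready group and a needs-full-schema group. It assumes each discovered tool is a dict with a `tool_slug` key and, when its schema was not returned inline, a `schemaRef` key. That shape is hypothetical, so check it against the real search output.

```python
from typing import Dict, List, Tuple


def split_by_schema_ref(tools: List[Dict]) -> Tuple[List[str], List[str]]:
    """Return (ready_slugs, slugs_needing_full_schema).

    Tools carrying a truthy "schemaRef" (hypothetical key) should be
    passed to RUBE_GET_TOOL_SCHEMAS before building their arguments.
    """
    ready: List[str] = []
    needs_full_schema: List[str] = []
    for tool in tools:
        target = needs_full_schema if tool.get("schemaRef") else ready
        target.append(tool["tool_slug"])
    return ready, needs_full_schema
```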

