webscraping-ai-automation
npx machina-cli add skill ComposioHQ/awesome-claude-skills/webscraping-ai-automation --openclaw
Webscraping AI Automation via Rube MCP
Automate Webscraping AI operations through Composio's Webscraping AI toolkit via Rube MCP.
Toolkit docs: composio.dev/toolkits/webscraping_ai
Prerequisites
- Rube MCP must be connected (RUBE_SEARCH_TOOLS available)
- Active Webscraping AI connection via RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
- Always call RUBE_SEARCH_TOOLS first to get current tool schemas
Setup
Get Rube MCP: Add https://rube.app/mcp as an MCP server in your client configuration. No API keys are needed; adding the endpoint is enough.
- Verify Rube MCP is available by confirming RUBE_SEARCH_TOOLS responds
- Call RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai
- If connection is not ACTIVE, follow the returned auth link to complete setup
- Confirm connection status shows ACTIVE before running any workflows
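As a rough illustration of the first setup step, the snippet below prints a server entry for the Rube endpoint. The `mcpServers` layout, the `url` key, and the server name `rube` are assumptions borrowed from common MCP client configurations, not something this skill specifies; check your client's documentation for the exact format it expects.

```python
import json

# Endpoint taken from the setup notes above.
RUBE_MCP_URL = "https://rube.app/mcp"


def rube_server_entry(name: str = "rube") -> dict:
    """Return a config fragment that registers Rube as an MCP server.

    ASSUMPTION: the client reads an "mcpServers" map keyed by server name,
    with a "url" field per server. Clients differ; adjust to yours.
    """
    return {"mcpServers": {name: {"url": RUBE_MCP_URL}}}


config_text = json.dumps(rube_server_entry(), indent=2)
print(config_text)
```

Paste the printed fragment into your client's configuration only after confirming the key names match what that client expects.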
Tool Discovery
Always discover available tools before executing workflows:
RUBE_SEARCH_TOOLS
queries: [{use_case: "Webscraping AI operations", known_fields: ""}]
session: {generate_id: true}
This returns available tool slugs, input schemas, recommended execution plans, and known pitfalls.
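The discovery call above can be sketched as a small helper that builds its arguments. This is only an illustration of the two session forms shown in this document (`generate_id: true` for a new workflow, `id` for an existing one); `build_search_args` is a hypothetical name, and sending the payload is left to your MCP client.

```python
from typing import Any, Dict, Optional


def build_search_args(use_case: str,
                      session_id: Optional[str] = None,
                      known_fields: str = "") -> Dict[str, Any]:
    """Build the arguments for a RUBE_SEARCH_TOOLS call.

    Without a session_id, ask Rube to generate one (new workflow);
    with one, reuse it (continuing an existing workflow).
    """
    session = {"id": session_id} if session_id else {"generate_id": True}
    return {
        "queries": [{"use_case": use_case, "known_fields": known_fields}],
        "session": session,
    }


# First call in a new workflow, then a follow-up that reuses the session.
first_call = build_search_args("Webscraping AI operations")
follow_up = build_search_args("your specific Webscraping AI task",
                              session_id="existing_session_id")
```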
Core Workflow Pattern
Step 1: Discover Available Tools
RUBE_SEARCH_TOOLS
queries: [{use_case: "your specific Webscraping AI task"}]
session: {id: "existing_session_id"}
Step 2: Check Connection
RUBE_MANAGE_CONNECTIONS
toolkits: ["webscraping_ai"]
session_id: "your_session_id"
Step 3: Execute Tools
RUBE_MULTI_EXECUTE_TOOL
tools: [{
  tool_slug: "TOOL_SLUG_FROM_SEARCH",
  arguments: {/* schema-compliant args from search results */}
}]
memory: {}
session_id: "your_session_id"
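The three steps above can be chained as in the sketch below. It is not Rube's client API: `call_tool` is a hypothetical transport you supply (for example, a wrapper around your MCP client), and the response fields `session_id`, `status`, and `tools` are assumed shapes used only to make the example runnable. Read the real field names from the actual responses.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: (tool_name, arguments) -> response dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]
PickTool = Callable[[Dict[str, Any]], Tuple[str, Dict[str, Any]]]


def run_workflow(call_tool: CallTool, use_case: str, pick_tool: PickTool) -> Dict[str, Any]:
    # Step 1: discover tools and open a session.
    search = call_tool("RUBE_SEARCH_TOOLS", {
        "queries": [{"use_case": use_case}],
        "session": {"generate_id": True},
    })
    session_id = search["session_id"]  # ASSUMED response field

    # Step 2: refuse to execute unless the connection is ACTIVE.
    connection = call_tool("RUBE_MANAGE_CONNECTIONS", {
        "toolkits": ["webscraping_ai"],
        "session_id": session_id,
    })
    if connection.get("status") != "ACTIVE":  # ASSUMED response field
        raise RuntimeError("webscraping_ai is not ACTIVE; follow the auth link first")

    # Step 3: execute a discovered tool, always passing memory and the session.
    tool_slug, arguments = pick_tool(search)
    return call_tool("RUBE_MULTI_EXECUTE_TOOL", {
        "tools": [{"tool_slug": tool_slug, "arguments": arguments}],
        "memory": {},
        "session_id": session_id,
    })


# Stand-in transport so the sketch runs without a live connection.
calls: List[str] = []


def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(name)
    if name == "RUBE_SEARCH_TOOLS":
        return {"session_id": "demo-session", "tools": [{"tool_slug": "DEMO_TOOL_SLUG"}]}
    if name == "RUBE_MANAGE_CONNECTIONS":
        return {"status": "ACTIVE"}
    return {"echo": args}


result = run_workflow(fake_call_tool, "Webscraping AI operations",
                      lambda search: (search["tools"][0]["tool_slug"], {}))
```

Injecting the transport keeps the ordering logic testable and avoids hardcoding any tool slug: the slug always comes from the search response.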
Known Pitfalls
- Always search first: Tool schemas change. Never hardcode tool slugs or arguments without calling RUBE_SEARCH_TOOLS
- Check connection: Verify RUBE_MANAGE_CONNECTIONS shows ACTIVE status before executing tools
- Schema compliance: Use exact field names and types from the search results
- Memory parameter: Always include memory in RUBE_MULTI_EXECUTE_TOOL calls, even if empty ({})
- Session reuse: Reuse session IDs within a workflow. Generate new ones for new workflows
- Pagination: Check responses for pagination tokens and continue fetching until complete
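The pagination pitfall can be handled with a loop like the one below. The key names `next_token` and `items` are placeholders, since this document does not say what the pagination fields are called; take the real names from the tool's schema or response. `fetch_page` is a hypothetical callable that performs one tool execution for a given token.

```python
from typing import Any, Callable, Dict, List, Optional


def fetch_all_pages(fetch_page: Callable[[Optional[str]], Dict[str, Any]],
                    token_key: str = "next_token",   # ASSUMED field name
                    items_key: str = "items",        # ASSUMED field name
                    max_pages: int = 100) -> List[Any]:
    """Call fetch_page repeatedly until the response carries no further token."""
    items: List[Any] = []
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        items.extend(page.get(items_key, []))
        token = page.get(token_key)
        if not token:
            return items
    # Guard against a token that never runs out.
    raise RuntimeError(f"stopped after {max_pages} pages; pagination did not finish")


# Two canned pages standing in for real tool responses.
PAGES = {
    None: {"items": [1, 2], "next_token": "p2"},
    "p2": {"items": [3], "next_token": None},
}
all_items = fetch_all_pages(lambda token: PAGES[token])
```

The `max_pages` cap is a design choice for safety: it turns a misbehaving token into a visible error rather than an endless loop.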
Quick Reference
| Operation | Approach |
|---|---|
| Find tools | RUBE_SEARCH_TOOLS with Webscraping AI-specific use case |
| Connect | RUBE_MANAGE_CONNECTIONS with toolkit webscraping_ai |
| Execute | RUBE_MULTI_EXECUTE_TOOL with discovered tool slugs |
| Bulk ops | RUBE_REMOTE_WORKBENCH with run_composio_tool() |
| Full schema | RUBE_GET_TOOL_SCHEMAS for tools with schemaRef |
Powered by Composio
Source
https://github.com/ComposioHQ/awesome-claude-skills/blob/master/composio-skills/webscraping-ai-automation/SKILL.md
Overview
Automate Webscraping AI operations using Composio’s toolkit through Rube MCP. This workflow relies on discovering current tool schemas, managing connections, and executing tools, with a strict rule to search tools first to stay synchronized with updates.
How This Skill Works
Connect to Rube MCP and confirm that RUBE_SEARCH_TOOLS responds. Check the webscraping_ai toolkit connection with RUBE_MANAGE_CONNECTIONS, then use RUBE_SEARCH_TOOLS to discover the available tools and recommended execution plans. Execute the chosen tools with RUBE_MULTI_EXECUTE_TOOL, always passing the memory field and a session ID, and reuse that session ID for the rest of the workflow.
When to Use It
- Starting a new webscraping automation project and needing up-to-date tool schemas from RUBE_SEARCH_TOOLS
- Integrating webscraping AI tasks into an existing workflow that requires an ACTIVE webscraping_ai connection
- Selecting and planning execution after discovering tools and recommended execution plans via RUBE_SEARCH_TOOLS
- Running a sequence of tool executions within a single session using RUBE_MULTI_EXECUTE_TOOL
- Handling changes in tool schemas or pagination by re-discovering tools and avoiding hard-coded slugs
Quick Start
- Step 1: Add the MCP endpoint https://rube.app/mcp as an MCP server in your client configuration
- Step 2: Run RUBE_SEARCH_TOOLS to confirm availability and get current tool schemas for webscraping_ai
- Step 3: Run RUBE_MANAGE_CONNECTIONS for toolkit webscraping_ai, then use RUBE_MULTI_EXECUTE_TOOL with a discovered tool slug and memory
Best Practices
- Always call RUBE_SEARCH_TOOLS before selecting or executing tools; schemas can change
- Verify RUBE_MANAGE_CONNECTIONS shows ACTIVE before any execution
- Use exact field names and types from discovery results; include memory in RUBE_MULTI_EXECUTE_TOOL calls
- Reuse session IDs within a workflow; generate new ones only for separate workflows
- If results are paginated, handle tokens and fetch all pages to complete the operation
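One way to act on the "exact field names and types" practice is to check arguments against the discovered input schema before executing. The sketch below assumes the schema is a JSON Schema style object with `properties` and `required`; it covers only top-level fields and simple types, and `DEMO_SCHEMA` is a made-up schema for illustration, not a real Webscraping AI tool schema.

```python
from typing import Any, Dict, List

# Mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}


def check_arguments(schema: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field: {name}")
    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"unknown field: {name}")
            continue
        type_name = properties[name].get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # no simple type declared; skip the type check
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {type_name}")
    return problems


# Hypothetical schema, for illustration only.
DEMO_SCHEMA = {
    "type": "object",
    "properties": {"url": {"type": "string"}, "timeout": {"type": "integer"}},
    "required": ["url"],
}
bad = check_arguments(DEMO_SCHEMA, {"URL": "https://example.com", "timeout": "30"})
good = check_arguments(DEMO_SCHEMA, {"url": "https://example.com", "timeout": 30})
```

Field names are compared case-sensitively on purpose: `URL` is reported as unknown even though `url` exists, which is the kind of mismatch the practice warns about.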
Example Use Cases
- Example 1: Start a project by running RUBE_SEARCH_TOOLS with use_case 'Webscraping AI operations' to fetch current tool slugs and schemas
- Example 2: Establish the webscraping_ai connection via RUBE_MANAGE_CONNECTIONS and ensure the connection status is ACTIVE
- Example 3: Discover a specific tool slug from search results and execute it with RUBE_MULTI_EXECUTE_TOOL using the provided input schema and memory
- Example 4: Perform bulk tool executions through RUBE_REMOTE_WORKBENCH and run_composio_tool() for multiple webscraping tasks
- Example 5: Retrieve full tool schemas with RUBE_GET_TOOL_SCHEMAS for tools whose search results include a schemaRef, before integration