
WebScraperToolkit

AI-first web scraping engine with stealth bypass, MCP server, and multimodal output (Markdown, JSON, PDF) for agents and automation.

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio imyourboyroy-webscrapertoolkit -- python -m web_scraper_toolkit.server

How to use

Web Scraper Toolkit exposes an MCP server for controlling web scraping and browser automation workflows from an agent. The server receives commands over MCP, starts scrapes, manages Playwright browser sessions, and handles URL ingestion, crawling, extraction, and post-processing. Typical use cases include running a targeted crawl, executing scripted page interactions, and producing structured output (Markdown, text, HTML, JSON, CSV, PDF, and more) for downstream processing. You can run the server locally and connect to it over MCP or through the provided CLI wrappers to issue commands, run diagnostics, and orchestrate flows.

Once the MCP server is running, you can drive it with the standard Web Scraper Toolkit commands it exposes over MCP: starting single-page or batch scrapes, controlling browser behavior (headless or headed, with stealth profiles), inspecting and adjusting host-profile-based routing, and requesting diagnostic analyses. The toolkit can also produce a compact interaction map for LLM-friendly element discovery and, optionally, an accessibility-tree output for autonomous navigation. For advanced setups, remote transport and stdio configurations let you integrate the server into larger automation pipelines or orchestration layers.
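The toolkit's own client interface is not documented here, but every MCP server on the stdio transport accepts the same JSON-RPC 2.0 handshake. The sketch below is a generic, standard-library illustration of that handshake, not the toolkit's client API: it launches python -m web_scraper_toolkit.server, sends initialize, the notifications/initialized notification, and tools/list, and returns the advertised tool names. The protocol version string and client name are assumptions; the sketch reads only the first page of tools/list results and ignores server-initiated requests.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List

# Assumption: a protocol revision the server accepts; servers may negotiate another.
PROTOCOL_VERSION = "2024-11-05"
SERVER_CMD = [sys.executable, "-m", "web_scraper_toolkit.server"]


def build_handshake(client_name: str = "wst-sketch", client_version: str = "0.1") -> List[Dict[str, Any]]:
    """Return the messages that open an MCP session and request the tool list."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": client_version},
            },
        },
        # A notification carries no "id", so the server does not reply to it.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def encode(message: Dict[str, Any]) -> bytes:
    """Frame one JSON-RPC message as a single newline-terminated UTF-8 line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def read_response(stdout, request_id: int) -> Dict[str, Any]:
    """Read lines until the response to request_id arrives, skipping notifications."""
    while True:
        line = stdout.readline()
        if not line:
            raise RuntimeError("server closed stdout before responding")
        reply = json.loads(line)
        # Responses have an "id" and no "method"; server requests have both.
        if reply.get("id") == request_id and "method" not in reply:
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]


def list_server_tools() -> List[str]:
    """Spawn the server over stdio, perform the handshake, and return tool names."""
    proc = subprocess.Popen(SERVER_CMD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        init, initialized, tools_request = build_handshake()
        proc.stdin.write(encode(init))
        proc.stdin.flush()
        read_response(proc.stdout, 1)  # result holds capabilities and serverInfo
        proc.stdin.write(encode(initialized) + encode(tools_request))
        proc.stdin.flush()
        result = read_response(proc.stdout, 2)
        return [tool["name"] for tool in result["tools"]]  # first page only
    finally:
        proc.terminate()
        proc.wait()
```

With the package installed, calling list_server_tools() shows which tools your installed version actually exposes before you wire it into an agent. For production clients, the official MCP Python SDK handles framing, pagination, and capability negotiation for you.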

How to install

Prerequisites:

  • Python 3.8+ (recommended)
  • pip (comes with Python)
  • (Optional) Playwright browser binaries (e.g., Chromium) and the system libraries they need
  1. Create and activate a Python virtual environment (recommended)
  • On macOS/Linux:
    • python3 -m venv .venv
    • source .venv/bin/activate
  • On Windows:
    • python -m venv .venv
    • .venv\Scripts\activate
  2. Install the Web Scraper Toolkit package from PyPI
  • pip install web-scraper-toolkit
  3. Install the Playwright extra and the browsers it automates
  • pip install "web-scraper-toolkit[playwright]"
  • python -m playwright install
  4. Run the MCP server (as described in the project's mcp_config section)
  • python -m web_scraper_toolkit.server
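A quick sanity check after these steps can save debugging later. The hypothetical check_environment() helper below uses only the standard library: it confirms the Python version, whether a virtual environment is active, and whether the web_scraper_toolkit and playwright packages are importable. It cannot tell whether Playwright's browsers have been downloaded; that is what python -m playwright install takes care of.

```python
import importlib.util
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the installation steps appear to have been completed."""
    return {
        # The listing recommends Python 3.8 or newer.
        "python_ok": sys.version_info >= (3, 8),
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix does not.
        "in_venv": sys.prefix != sys.base_prefix,
        # The package that provides the web_scraper_toolkit.server module.
        "toolkit_installed": importlib.util.find_spec("web_scraper_toolkit") is not None,
        # Playwright's Python package; browser binaries are installed separately.
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
    }


def format_report(checks: Dict[str, bool]) -> str:
    """Render one line per check, marking passes with [x] and failures with [ ]."""
    return "\n".join(f"[{'x' if ok else ' '}] {name}" for name, ok in checks.items())


print(format_report(check_environment()))
```

Run it with the virtual environment's interpreter; any line marked [ ] points at the step to revisit.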

Optional: for desktop solver support or other features, install the corresponding extra, e.g.:

  • pip install "web-scraper-toolkit[desktop]"
  • python -m playwright install

Note:

  • The exact server module path (web_scraper_toolkit.server) is based on the package layout in this project. If your setup uses a different entry point, adjust the -m argument accordingly.
  • For configuration through environment variables, see the config and environment variable sections of the project docs.

Additional notes

Environment variables to tailor behavior:

  • WST_OS_INPUT_WARNING_SECONDS: Override the OS input warning duration (default 3 seconds).
  • WST_HEADLESS: Force headless mode or switch to headed mode for debugging.
  • WST_HOST_PROFILES_ENABLED / WST_HOST_PROFILES_PATH: Enable and configure host-profile learning and routing behavior.
  • WST_VALIDATION_LEVEL: Adjust diagnostic or strictness levels for MCP flows.
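When the server is launched by a script or orchestrator rather than by Claude Code, these variables can be passed in the child process's environment. In this sketch, build_server_env() and launch_server() are hypothetical helpers, and the "true"/"false" encoding for boolean flags is an assumption; check the project docs for the values each variable accepts. WST_VALIDATION_LEVEL goes through the extra argument because its accepted values are not listed here.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def build_server_env(
    headless: Optional[bool] = None,
    warning_seconds: Optional[int] = None,
    host_profiles_path: Optional[str] = None,
    extra: Optional[Mapping[str, str]] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of base (default: os.environ) with WST_* overrides applied."""
    env = dict(os.environ if base is None else base)
    if headless is not None:
        # Assumption: boolean flags are read from "true"/"false" strings.
        env["WST_HEADLESS"] = "true" if headless else "false"
    if warning_seconds is not None:
        env["WST_OS_INPUT_WARNING_SECONDS"] = str(warning_seconds)
    if host_profiles_path is not None:
        env["WST_HOST_PROFILES_ENABLED"] = "true"
        env["WST_HOST_PROFILES_PATH"] = str(host_profiles_path)
    # Pass-through for variables such as WST_VALIDATION_LEVEL.
    env.update(extra or {})
    return env


def launch_server(env: Dict[str, str]) -> subprocess.Popen:
    """Start the stdio MCP server as a child process with the given environment."""
    return subprocess.Popen(
        [sys.executable, "-m", "web_scraper_toolkit.server"],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

For example, launch_server(build_server_env(headless=False, warning_seconds=5)) would start a headed session for debugging; the caller then speaks MCP over the process's stdin and stdout. Copying the base mapping keeps the parent process's own environment untouched.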

Common issues and tips:

  • Ensure browser binaries are installed via Playwright if using Playwright-backed automation.
  • If the MCP server cannot initialize host profiles, check file permissions and confirm that the host_profiles.json path exists or can be created automatically.
  • For safety, OS-level mouse interaction is disabled in headless mode unless explicitly enabled; account for this in your environment and tests.
  • Use the diagnostic commands provided by the toolkit to verify connectivity, routing, and extraction capabilities before deploying to production.
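The first two tips can be checked before the server starts. This standard-library sketch uses two hypothetical helpers: check_host_profiles_path() tests whether a host_profiles.json location is usable, and playwright_chromium_ok() tries to launch headless Chromium through Playwright's sync API. It assumes the toolkit creates the file itself but not any missing parent directories.

```python
import os
from pathlib import Path
from typing import Tuple


def check_host_profiles_path(path: str) -> Tuple[bool, str]:
    """Return (usable, reason) for a host_profiles.json location."""
    p = Path(path).expanduser()
    if p.exists():
        if p.is_dir():
            return False, f"{p} is a directory, not a JSON file"
        if not os.access(p, os.R_OK | os.W_OK):
            return False, f"{p} exists but is not readable and writable"
        return True, f"{p} exists and is writable"
    # Assumption: the toolkit creates the file but not missing parent directories.
    parent = p.parent
    if not parent.is_dir():
        return False, f"parent directory {parent} does not exist"
    if not os.access(parent, os.W_OK | os.X_OK):
        return False, f"no permission to create files in {parent}"
    return True, f"{p} is missing but can be created"


def playwright_chromium_ok() -> bool:
    """Try to launch headless Chromium; False if Playwright or its browsers are missing."""
    try:
        from playwright.sync_api import sync_playwright
    except ImportError:
        return False
    try:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            browser.close()
        return True
    except Exception:  # Playwright raises its own Error when binaries are absent
        return False
```

Pass check_host_profiles_path() the same path you set in WST_HOST_PROFILES_PATH. If playwright_chromium_ok() returns False while the playwright package is installed, rerun python -m playwright install.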
