WebScraperToolkit
AI-first web scraping engine with stealth bypass, MCP server, and multimodal output (Markdown, JSON, PDF) for agents and automation.
claude mcp add --transport stdio imyourboyroy-webscrapertoolkit -- python -m web_scraper_toolkit.server
How to use
Web Scraper Toolkit exposes an MCP server that lets agents control web scraping and browser automation workflows. The server receives commands, starts scrapes, manages Playwright browser sessions, and performs URL ingestion, crawling, extraction, and post-processing. Typical use cases include running a targeted crawl, executing scripted interactions, and obtaining structured output (Markdown, text, HTML, JSON, CSV, PDF, and so on) for downstream processing. You can run the MCP server locally and connect over the MCP protocol, or use the provided CLI wrappers to issue commands, run diagnostics, and orchestrate flows.
Once the MCP server is running, you can interact with it through the standard Web Scraper Toolkit commands exposed over MCP. These cover initiating single-page or batch scrapes, controlling browser behavior (headless or headed, with stealth profiles), inspecting and adjusting host-profile-based routing, and requesting diagnostic analyses. The toolkit also supports a compact interaction-map output for LLM-friendly element discovery and an optional accessibility-tree output for autonomous navigation. For advanced use, the remote and stdio transport configurations let you integrate the server into broader automation pipelines or orchestration layers.
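MCP clients talk to a stdio server by exchanging JSON-RPC 2.0 messages, one per line. The sketch below only builds the request lines for tool discovery (`tools/list`) and a tool invocation (`tools/call`); it does not start the server. The tool name `scrape_url` and its arguments are hypothetical placeholders, not confirmed names from this project, so run `tools/list` against the real server to see what it exposes. A real session also begins with the MCP `initialize` handshake, which most client libraries perform for you.

```python
import json
from itertools import count

# Monotonic request ids so responses can be matched to requests.
_ids = count(1)


def jsonrpc_request(method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the output stays on one line,
    # which the newline-delimited stdio transport requires.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke a tool. "scrape_url", "url", and "format" are HYPOTHETICAL names
# used for illustration only; substitute whatever tools/list reports.
call_tool = jsonrpc_request(
    "tools/call",
    {
        "name": "scrape_url",
        "arguments": {"url": "https://example.com", "format": "markdown"},
    },
)

if __name__ == "__main__":
    print(list_tools, end="")
    print(call_tool, end="")
```

These lines would be written to the stdin of the `python -m web_scraper_toolkit.server` process, with responses read from its stdout.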
How to install
Prerequisites:
- Python 3.8+ (recommended)
- pip (comes with Python)
- (Optional) Playwright browsers (e.g., Chromium) and their system dependencies, installed through Playwright's install command
Steps:
- Create and/or activate a Python virtual environment (recommended)
  - On macOS/Linux:
    - python3 -m venv .venv
    - source .venv/bin/activate
  - On Windows:
    - python -m venv .venv
    - .venv\Scripts\activate
- Install the Web Scraper Toolkit package from PyPI
  - pip install web-scraper-toolkit
- Install browser automation dependencies (Playwright) and required browsers
  - pip install "web-scraper-toolkit[playwright]"
  - python -m playwright install
- Run the MCP server (as described in the mcp_config section)
  - python -m web_scraper_toolkit.server
Optional: for desktop solver support or other additional features, install the corresponding extras, e.g.,
- pip install "web-scraper-toolkit[desktop]"
- python -m playwright install
Note:
- The exact server module path (web_scraper_toolkit.server) is based on the package layout in this project. If your setup uses a different entry point, adjust the -m argument accordingly.
- If you need to configure behavior through environment variables, refer to the configuration and environment variable sections of the project docs.
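Before starting the server, it can help to confirm that the installation steps took effect. The following is a minimal preflight sketch, not part of the toolkit: it checks the interpreter version against the stated Python 3.8+ prerequisite and whether the `web_scraper_toolkit` and `playwright` packages are importable in the active environment. The `preflight` function name is an assumption made for this example.

```python
import importlib.util
import sys


def preflight(min_version=(3, 8), modules=("web_scraper_toolkit", "playwright")):
    """Report whether the interpreter and required top-level packages are available."""
    report = {"python_ok": sys.version_info >= min_version}
    for name in modules:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Note that this only verifies the Python packages; the browser binaries still need `python -m playwright install`.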
Additional notes
Environment variables to tailor behavior:
- WST_OS_INPUT_WARNING_SECONDS: Override the OS input warning duration (default 3 seconds).
- WST_HEADLESS: Force headless mode or switch to headed mode for debugging.
- WST_HOST_PROFILES_ENABLED / WST_HOST_PROFILES_PATH: Enable and configure host-profile learning and routing behavior.
- WST_VALIDATION_LEVEL: Adjust diagnostic or strictness levels for MCP flows.
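As an illustration of how these variables might be read, here is a small sketch that parses them from the environment. It is not the toolkit's own loader. Only the 3-second default for `WST_OS_INPUT_WARNING_SECONDS` comes from the list above; the accepted boolean spellings (`1/true/yes/on`, `0/false/no/off`) and the use of `None` for unset values are assumptions made for this example.

```python
import os

# ASSUMPTION: the spellings accepted for boolean variables are illustrative only.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, env, default=None):
    """Parse a boolean variable; return `default` when it is unset."""
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} has an unrecognized boolean value: {raw!r}")


def load_settings(env=None):
    """Collect the WST_* variables into a dict; None means 'not set'."""
    env = os.environ if env is None else env
    return {
        # Default of 3 seconds is the one stated in the documentation.
        "os_input_warning_seconds": float(env.get("WST_OS_INPUT_WARNING_SECONDS", "3")),
        "headless": env_bool("WST_HEADLESS", env),
        "host_profiles_enabled": env_bool("WST_HOST_PROFILES_ENABLED", env),
        "host_profiles_path": env.get("WST_HOST_PROFILES_PATH"),
        "validation_level": env.get("WST_VALIDATION_LEVEL"),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict as `env` makes the parsing easy to exercise without touching the real process environment.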
Common issues and tips:
- Ensure browser binaries are installed via Playwright if using Playwright-backed automation.
- If the MCP server cannot initialize host profiles, check file permissions and that the host_profiles.json path exists or can be auto-created.
- For safety, OS-level mouse interaction is disabled in headless mode unless explicitly enabled; verify your environment and tests accordingly.
- Use the diagnostic commands provided by the toolkit to verify connectivity, routing, and extraction capabilities before deploying to production.
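The host-profile tip above (check permissions and whether the file exists or can be created) can be sketched as a standalone check. This is an illustrative helper, not a toolkit command; the function name `check_profiles_path` and its return strings are assumptions made for this example.

```python
import os
from pathlib import Path


def check_profiles_path(path):
    """Describe whether a host-profiles file is usable or could be auto-created."""
    target = Path(path)
    if target.exists():
        if not target.is_file():
            return "exists but is not a regular file"
        if os.access(target, os.R_OK | os.W_OK):
            return "ok"
        return "exists but is not readable and writable"
    # The file is missing: creation needs write and search permission on the parent.
    parent = target.parent
    if parent.is_dir() and os.access(parent, os.W_OK | os.X_OK):
        return "missing, but can be created"
    return "missing, and the parent directory is not writable"


if __name__ == "__main__":
    # Falls back to a local host_profiles.json when the variable is unset.
    configured = os.environ.get("WST_HOST_PROFILES_PATH", "host_profiles.json")
    print(f"{configured}: {check_profiles_path(configured)}")
```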
Related MCP Servers
flyto-core
The open-source execution engine for AI agents. 412 modules, MCP-native, triggers, queue, versioning, metering.
mcp
🤖 Taskade MCP · Official MCP server and OpenAPI to MCP codegen. Build AI agent tools from any OpenAPI API and connect to Claude, Cursor, and more.
janee
Secrets management for AI agents via MCP • @janeesecure
IoT-Edge
MCP server for Industrial IoT, SCADA and PLC systems. Unifies MQTT sensors, Modbus devices and industrial equipment into a single AI-orchestrable API. Features real-time monitoring, alarms, time-series storage and actuator control.
protocols-io
An MCP server that enables MCP clients like Claude Desktop to interact with data from protocols.io.
MCP-Manager-GUI
MCP Toggle is a simple GUI tool to help you manage MCP servers across clients seamlessly.