webustler
MCP server for web scraping that actually works. Extracts clean, LLM-ready markdown from any URL — even Cloudflare-protected sites.
claude mcp add --transport stdio drruin-webustler -- docker run -i --rm webustler
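Clients that read a JSON config file instead of a CLI command (for example Cursor, Windsurf, or Claude Desktop) launch the same docker run command. The sketch below writes one such file, assuming the common "mcpServers" layout those clients share. The file name mcp_config.json and the server key webustler are illustrative choices, and each client expects its config in its own location.

```shell
#!/bin/sh
# Sketch: write an MCP client config that launches webustler over stdio.
# Assumptions: the client uses the common "mcpServers" JSON layout; the file
# name and the "webustler" key are illustrative, not defined by the project.
set -eu

config_path="${1:-mcp_config.json}"

# The args mirror "docker run -i --rm webustler": -i keeps stdin open for stdio.
cat > "$config_path" <<'EOF'
{
  "mcpServers": {
    "webustler": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "webustler"]
    }
  }
}
EOF

echo "wrote $config_path"
```

The heredoc overwrites the target file, so if your client config already lists other servers, copy the "webustler" entry into it by hand instead.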
How to use
Webustler is a self-hosted MCP server that extracts clean, markdown-formatted content from any URL, including pages protected by anti-bot measures such as Cloudflare. It returns rich metadata, preserves tables, and filters out page noise, so the result is a clean, model-ready markdown payload. Through your MCP client you can issue the built-in scrape command to fetch article content, links, and metadata; retries and an anti-bot fallback kick in automatically when needed. The server runs in Docker from the webustler image. Once it is running, a request such as Scrape <URL> returns a polished markdown document whose YAML frontmatter records the source, page metadata, and link counts. This makes it well suited to pipelines that need reliable extraction from diverse web sources without API keys or per-site quotas.
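Because each result carries its metadata in YAML frontmatter, a pipeline can read individual fields without a full YAML parser. A minimal sketch, assuming the frontmatter starts on the first line with --- and that the field you want is a flat key: value line such as wordCount or sourceURL; nested blocks like openGraph are deliberately skipped. frontmatter_field is a hypothetical helper name, not part of Webustler.

```shell
#!/bin/sh
# Hypothetical helper: print a top-level field from a saved result's frontmatter.
# Usage: frontmatter_field <key> <file>
# Assumptions: frontmatter opens on line 1 with "---" and closes at the next
# "---"; only unindented "key: value" lines are matched (nested keys are ignored).
frontmatter_field() {
  awk -v key="$1" '
    NR == 1 { if ($0 != "---") exit; next }   # no frontmatter: print nothing
    $0 == "---" { exit }                       # end of frontmatter
    index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)      # drop the "key:" prefix
      sub(/^[ \t]+/, "", value)                # trim leading whitespace
      gsub(/^"|"$/, "", value)                 # strip surrounding double quotes
      print value
      exit
    }
  ' "$2"
}
```

For example, frontmatter_field wordCount page.md prints the word count of a saved result, and the same call with sourceURL recovers the page it came from.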
How to install
Prerequisites:
- Docker installed and running on your host
- Basic familiarity with MCP client usage (Claude, Cursor, Windsurf, etc.)
Install steps:
- Clone the repository and build the Docker image (if you have the source):
  git clone https://github.com/drruin/webustler.git
  cd webustler
  docker build -t webustler .
  If a prebuilt image is available, you can skip the build and pull it instead: docker pull webustler
- Run the Webustler container: docker run -i --rm webustler
  The -i flag keeps stdin open, which the stdio transport requires, so run it interactively rather than detached. In everyday use your MCP client starts this command itself.
- Configure your MCP client to launch the container, either with the claude mcp add command at the top of this page or with your client's equivalent configuration.
- Validate by sending a test request through your MCP client (e.g., Scrape https://example.com) and check that the markdown output comes back with YAML frontmatter and the expected fields.
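The install steps, plus a check one level below the MCP client, can be scripted as follows. This is a sketch that assumes git and Docker are available and that the server follows standard MCP stdio framing (one JSON-RPC message per line). Rather than guessing tool names, it performs the initialize handshake and asks for tools/list, so the reply shows which tools the image actually exposes. SMOKE_WAIT is a hypothetical variable controlling how long stdin stays open.

```shell
#!/bin/sh
# Sketch of the install steps plus a protocol-level smoke test.
# Assumptions: git and Docker are installed; the image is tagged "webustler";
# the server speaks MCP over stdio (newline-delimited JSON-RPC).

install_webustler() {
  git clone https://github.com/drruin/webustler.git &&
    (cd webustler && docker build -t webustler .)   # subshell keeps caller's cwd
}

# Send initialize -> notifications/initialized -> tools/list and print raw replies.
# The sleep keeps stdin open so the server can answer before it sees EOF.
smoke_test() {
  {
    printf '%s\n' \
      '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1.0"}}}' \
      '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
      '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
    sleep "${SMOKE_WAIT:-10}"
  } | docker run -i --rm webustler
}
```

After sourcing the script, run install_webustler && smoke_test. The reply with "id":2 should contain a "tools" array; if nothing comes back, try a larger SMOKE_WAIT before suspecting the image.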
Additional notes
Tips:
- The TIMEOUT environment variable controls request timeouts; pass it to the container in the docker run arguments (e.g., -e TIMEOUT=180).
- The server automatically handles Cloudflare and other anti-bot challenges, with automatic retries and an anti-bot fallback.
- The output metadata includes sourceURL, statusCode, title, description, author, language, wordCount, readingTime, publishedTime, openGraph, twitter, internalLinksCount, externalLinksCount, and imagesCount.
- For production use, consider mounting a persistent volume for logs and outputs and pinning a specific image tag.
- Make sure the MCP host can launch the container and that the container has outbound network access to the sites you scrape, then confirm that your MCP client receives output in the expected format.
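The TIMEOUT tip and the advice to pin an image tag can be combined when registering the server with Claude Code. This sketch assumes a locally created tag webustler:v1 (for example via docker tag webustler webustler:v1), which is hypothetical; it uses 180 only because the tip does, since the unit of TIMEOUT is not stated. WEBUSTLER_IMAGE, TIMEOUT_VALUE, and DRY_RUN are illustrative variable names, not project settings. By default the script only prints the docker command; set DRY_RUN=0 to register it.

```shell
#!/bin/sh
# Sketch: register webustler with a pinned tag and a TIMEOUT passed into the container.
# Assumptions: "webustler:v1" is a hypothetical tag you created yourself;
# WEBUSTLER_IMAGE, TIMEOUT_VALUE and DRY_RUN are illustrative variable names.
set -eu

IMAGE="${WEBUSTLER_IMAGE:-webustler:v1}"
TIMEOUT_VALUE="${TIMEOUT_VALUE:-180}"

# Build the docker argument list in the positional parameters (POSIX-friendly).
set -- run -i --rm -e "TIMEOUT=$TIMEOUT_VALUE" "$IMAGE"

if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "docker $*"        # inspect the command before registering it
else
  # "--" ends claude's own option parsing, so docker's -e reaches docker.
  claude mcp add --transport stdio drruin-webustler -- docker "$@"
fi
```

The variable goes through docker's -e rather than claude's own --env option because the latter sets it on the docker client process, and docker run does not forward host variables into the container unless told to.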