webustler

MCP server for web scraping that actually works. Extracts clean, LLM-ready markdown from any URL — even Cloudflare-protected sites.

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio drruin-webustler docker run -i --rm webustler

How to use

Webustler is a self-hosted MCP server that extracts clean, Markdown-formatted content from any URL, including pages protected by anti-bot measures such as Cloudflare. It outputs rich metadata, preserves tables, and filters out noise, so the payload is ready to hand to a model.

The server is intended to be run via Docker and relies on the webustler image. Once it is running, issue the built-in scrape commands through your MCP client, for example Scrape <URL>, to fetch article content, links, and metadata. Retries and the anti-bot fallback are applied automatically when needed. The result is a Markdown document with YAML frontmatter detailing the source, metadata, and link counts.

It is especially useful for building pipelines that need reliable extraction from diverse web sources without API keys or per-site quotas.

How to install

Prerequisites:

Docker installed and running on your host
Basic familiarity with MCP client usage (Claude, Cursor, Windsurf, etc.)

Install steps:

Clone the repository and build the Docker image (if you have the source):
    git clone https://github.com/drruin/webustler.git
    cd webustler
    docker build -t webustler .
If you are using a prebuilt image, skip the build step and pull the image instead:
    docker pull webustler
Run the Webustler container (detached or interactive as needed):
    docker run -i --rm webustler
Configure your MCP clients to target the Webustler MCP server using the provided mcp_config example.
Validate the setup by sending a test request via your MCP client (e.g., Scrape https://example.com) and verifying that the Markdown output is returned with YAML frontmatter and the expected fields.
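The client configuration referred to in the steps above is not reproduced on this page. As a rough sketch, assuming your client reads the common mcpServers JSON layout (Claude Desktop and Cursor do; check your own client's documentation), the entry could be generated as follows. The server key webustler and the helper build_mcp_config are illustrative names, not part of the project; the Docker arguments mirror the docker run command shown above, plus the optional TIMEOUT environment variable.

```python
import json


def build_mcp_config(server_name="webustler", image="webustler", timeout=None):
    """Return a config dict that launches the container over stdio.

    server_name is a hypothetical key chosen for this example.
    """
    # -i keeps stdin open so the client can talk to the server over stdio;
    # --rm removes the container once the session ends.
    args = ["run", "-i", "--rm"]
    if timeout is not None:
        # Forward the timeout into the container as an environment variable.
        args += ["-e", f"TIMEOUT={timeout}"]
    args.append(image)
    return {"mcpServers": {server_name: {"command": "docker", "args": args}}}


config = build_mcp_config(timeout=180)
print(json.dumps(config, indent=2))
```

Paste the printed JSON into the configuration file your MCP client uses, merging it with any existing mcpServers entries rather than overwriting them.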

Additional notes

Tips:

Pass the TIMEOUT environment variable to control request timeouts (e.g., -e TIMEOUT=180 in the docker run arguments).
The server automatically handles Cloudflare and anti-bot challenges, with a retry/fallback mechanism.
Output includes sourceURL, statusCode, title, description, author, language, wordCount, readingTime, publishedTime, openGraph, twitter, internalLinksCount, externalLinksCount, and imagesCount.
If you plan to run in production, consider mounting a persistent volume for logs and outputs and setting a stable image tag.
Ensure network access from the MCP host to the container, and verify that the webustler image exposes or produces the expected output format to your MCP client.
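For pipelines that consume the output, the frontmatter fields listed above usually need to be separated from the Markdown body. The following is a minimal sketch using only the standard library. It assumes the frontmatter is delimited by --- lines and reads only flat key: value pairs, skipping nested blocks such as openGraph; a real YAML parser would be the more robust choice. The function name split_frontmatter and the sample document are made up for illustration and are not actual Webustler output.

```python
def split_frontmatter(document):
    """Split a Markdown document into (metadata dict, body).

    Only flat `key: value` lines are read; nested blocks are skipped.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    try:
        # Index of the closing delimiter line.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, document
    meta = {}
    for line in lines[1:end]:
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # indented (nested) or malformed line
        key, _, value = line.partition(":")
        value = value.strip()
        if not value:
            continue  # key that opens a nested block
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop surrounding quotes
        meta[key.strip()] = int(value) if value.isdigit() else value
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


# Hypothetical sample; field names follow the list above, values are invented.
sample = """---
sourceURL: https://example.com
statusCode: 200
title: "Example Domain"
wordCount: 28
openGraph:
  title: OG Title
internalLinksCount: 0
externalLinksCount: 1
---

# Example Domain

This domain is for use in illustrative examples.
"""

meta, body = split_frontmatter(sample)
print(meta["title"], meta["statusCode"], meta["externalLinksCount"])
```

Checking statusCode and wordCount before passing the body downstream is a cheap way to drop failed or empty scrapes.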
