mcp-web-scrape

🚀 mcp-web-scrape: Clean, cache-aware web content fetcher for AI agents. Fetch any URL → extract readable content → return Markdown/JSON with citations. ⚡ Fast caching, 🤝 robots.txt compliant, 📝 Markdown-ready output, works with ChatGPT/Claude Desktop.

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio mukul975-mcp-web-scrape npx mcp-web-scrape@1.0.7 \
  --env MCP_WEB_SCRAPE_CACHE_DIR="Directory path for cached content (default or placeholder)" \
  --env MCP_WEB_SCRAPE_RATE_LIMIT="Requests per minute (default or placeholder)" \
  --env MCP_WEB_SCRAPE_USER_AGENT="User-Agent string used for requests (default or placeholder)"

How to use

The MCP Web Scrape server provides a suite of tools to extract, transform, and analyze content from web pages. It converts HTML into clean Markdown with citations, respects robots.txt, and uses ETag/304 conditional requests to serve unchanged pages from cache on repeated fetches. Core extraction tools pull page content, metadata, links, images, and structured data; advanced tools cover forms, tables, social profiles, sentiment, entities, and more. You can also generate reports and monitor changes or performance over time. You typically run the server via npx mcp-web-scrape@<version> and then call the desired tool (for example extract_content, extract_tables, or analyze_competitors) with a URL to obtain structured results suitable for agents and pipelines.

How to install

Prerequisites:

Node.js and npm installed on your machine
Access to npm (public registry)
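
A quick way to confirm both prerequisites before installing is to check that the commands are on your PATH. The sketch below assumes a POSIX shell; check_tool is a hypothetical helper name used for illustration, not part of mcp-web-scrape.

```shell
# check_tool is a hypothetical helper (not part of mcp-web-scrape):
# it reports whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Check the two prerequisites; "|| true" keeps the script going
# so both results are always printed.
check_tool node || true
check_tool npm || true
```

If either line reports "missing", install Node.js (which bundles npm) before continuing.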

Step-by-step:

Install the MCP server globally, pinning a version: npm install -g mcp-web-scrape@1.0.7
Verify the installation with a quick test run (this example uses the latest published version): npx mcp-web-scrape@latest
Alternatively, run a specific version directly without a global install: npx mcp-web-scrape@1.0.7
Optional: if you need HTTP/SSE support, run the built-in HTTP server for API access from a directory that contains the built dist/ folder: node dist/http.js --port 3000

Additional notes

Environment variables control caching, user-agent, and rate limiting. Typical variables to set:

MCP_WEB_SCRAPE_CACHE_DIR: path to cache on disk
MCP_WEB_SCRAPE_USER_AGENT: identify your bot to servers
MCP_WEB_SCRAPE_RATE_LIMIT: requests per minute, e.g., 1000, to throttle requests

Robots.txt compliance and caching help keep results consistent and responses fast. If you encounter issues, check that your cache directory exists and that the version in the command matches the installed package. The toolset covers a broad range of extraction and analysis capabilities, so consult the available commands (e.g., extract_content, extract_tables, analyze_competitors) to determine the best workflow for your use case.
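
As a sketch of the notes above, the three variables can be exported in a POSIX shell before launching the server. The cache path and User-Agent string here are illustrative assumptions, not defaults documented by the package; the rate-limit value is the example value given above.

```shell
# Illustrative values only: the path and User-Agent below are assumptions,
# not package defaults. Adjust them for your environment.
export MCP_WEB_SCRAPE_CACHE_DIR="${TMPDIR:-/tmp}/mcp-web-scrape-cache"
export MCP_WEB_SCRAPE_USER_AGENT="my-agent/1.0"
export MCP_WEB_SCRAPE_RATE_LIMIT="1000"

# Create the cache directory up front, since a missing directory is one
# of the issues the notes above suggest checking for.
mkdir -p "$MCP_WEB_SCRAPE_CACHE_DIR"

# Then start the server in the same shell, for example:
#   npx mcp-web-scrape@1.0.7
```

Exporting the variables in the same shell session that starts the server means the process inherits them without any extra flags.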