scraper

Context-optimized MCP server for web scraping. Reduces LLM token usage by 70-90% through server-side CSS filtering and HTML-to-markdown conversion.

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio cotdp-scraper-mcp docker run -d -p 8000:8000 --name scraper-mcp ghcr.io/cotdp/scraper-mcp:latest \
  --env HTTP_PROXY="" \
  --env HTTPS_PROXY="" \
  --env ENABLE_PROMPTS="true" \
  --env ENABLE_RESOURCES="true" \
  --env SCRAPEOPS_API_KEY="" \
  --env PERPLEXITY_API_KEY="your_key_here" \
  --env PLAYWRIGHT_TIMEOUT="30000" \
  --env SCRAPEOPS_RENDER_JS="true" \
  --env PLAYWRIGHT_DISABLE_GPU="true" \
  --env PLAYWRIGHT_MAX_CONTEXTS="5"

How to use

Scraper provides context-optimized web scraping tools that reduce token usage for language models through server-side HTML filtering, CSS selector targeting, and Markdown conversion. Its tools fetch and process web content as raw HTML, Markdown, plain text, or extracted links, with optional JavaScript rendering via Playwright for dynamic pages. The integrated Perplexity tools add AI-assisted web search and reasoning, and a built-in dashboard lets you monitor statistics and test the API.

To use it, run the server (preferably via Docker, as described under How to install), then connect your MCP client to the endpoint at /mcp and call tools such as scrape_url, scrape_url_html, scrape_url_text, scrape_extract_links, perplexity, and perplexity_reason. The tools accept a single URL or a batch of URLs, an optional CSS selector, and an optional render_js flag that renders dynamic content before extraction.
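As a rough illustration of what a tool invocation looks like on the wire, the sketch below builds the JSON-RPC 2.0 "tools/call" message that MCP clients send. The tool name scrape_url and the render_js option come from the description above; the argument names url and css_selector, and the helper build_tool_call, are assumptions made for illustration and may not match the server's actual schema. In practice an MCP client handles the session handshake and sends this message for you.

```python
import json

# Endpoint path as given in the text; host and port assume the default Docker setup.
MCP_URL = "http://localhost:8000/mcp"


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request body (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "url" and "css_selector" are assumed argument names, not confirmed by the source.
request = build_tool_call(
    "scrape_url",
    {"url": "https://example.com", "css_selector": "article", "render_js": False},
)
print(json.dumps(request, indent=2))
```

Listing the server's tools through your MCP client is the reliable way to find the real argument names before adapting this.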

How to install

Prerequisites:

Docker installed on your host (recommended for this MCP server).
Access to the internet to pull the Docker image from GHCR.

Installation steps:

Install Docker: follow the official installation guide for your OS (https://docs.docker.com/get-docker/).
Pull the latest image (optional if you deploy via docker-compose but useful for quick start): docker pull ghcr.io/cotdp/scraper-mcp:latest
Run the MCP server with Docker (example): docker run -d -p 8000:8000 --name scraper-mcp ghcr.io/cotdp/scraper-mcp:latest
Verify the MCP endpoint:
- MCP: http://localhost:8000/mcp
- Dashboard: http://localhost:8000/
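If you need Perplexity integration, set the API key on the host before starting the container (export PERPLEXITY_API_KEY=your_key_here) and pass it through by adding -e PERPLEXITY_API_KEY to the docker run command; a variable exported only on the host is not visible inside the container.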
Optional: use Docker Compose as described in the README to configure resources, cache, and environment variables more conveniently.
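The verification step can be scripted. The following is a minimal sketch, assuming the default host and port from the docker run example; the helper names endpoint_urls and is_listening are made up for this illustration. It only checks that something answers HTTP at each address and treats any HTTP response, including an error status, as a sign that the server is up, since a plain GET on an MCP endpoint may be rejected.

```python
import http.client
import urllib.error
import urllib.request


def endpoint_urls(host="localhost", port=8000):
    """Return the MCP and dashboard URLs for a given host and port."""
    base = f"http://{host}:{port}"
    return {"mcp": f"{base}/mcp", "dashboard": f"{base}/"}


def is_listening(url, timeout=3.0):
    """Return True if an HTTP server answers at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a 4xx/5xx status, so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a malformed reply.
        return False


if __name__ == "__main__":
    for name, url in endpoint_urls().items():
        state = "up" if is_listening(url) else "unreachable"
        print(f"{name}: {url} -> {state}")
```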

Note: For production deployments with JavaScript rendering, allocate sufficient memory to the container (1 GB or more is recommended for Playwright contexts).

Additional notes

Tips and common considerations:

Resources are disabled by default to reduce context usage; enable them by setting ENABLE_RESOURCES=true in the container environment (docker run or Docker Compose).
Perplexity integration requires PERPLEXITY_API_KEY; without it, perplexity tools will be unavailable.
JS rendering via Playwright consumes more memory; adjust PLAYWRIGHT_MAX_CONTEXTS and container memory accordingly.
The MCP dashboard offers an interactive playground and runtime configuration without restarts; use it to tune settings on the fly.
For caching and performance, consider enabling a persistent volume for cache when using Docker Compose.
If behind a proxy, set HTTP_PROXY and HTTPS_PROXY accordingly; some environments may require SCRAPEOPS settings to be configured for proxy workflows.
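One way to keep these settings consistent is to generate the docker run command from a dictionary. The sketch below is illustrative only: the variable names and image come from this page, while the helper docker_run_args, the choice to skip empty values, and the 1g memory limit are assumptions. It places the -e and --memory flags before the image name, because Docker treats everything after the image name as arguments to the container's command.

```python
import shlex

IMAGE = "ghcr.io/cotdp/scraper-mcp:latest"


def docker_run_args(env, memory=None, name="scraper-mcp", port=8000):
    """Assemble a docker run argument list (hypothetical helper)."""
    args = ["docker", "run", "-d", "-p", f"{port}:{port}", "--name", name]
    if memory:
        # Container memory limit, e.g. "1g" when JS rendering is enabled.
        args += ["--memory", memory]
    for key, value in env.items():
        if value == "":
            continue  # assumption: leave unset options out entirely
        args += ["-e", f"{key}={value}"]
    args.append(IMAGE)  # the image name goes last, after all docker flags
    return args


settings = {
    "ENABLE_RESOURCES": "true",
    "PERPLEXITY_API_KEY": "your_key_here",
    "PLAYWRIGHT_MAX_CONTEXTS": "5",
    "HTTP_PROXY": "",  # empty, so it is skipped
}
command = docker_run_args(settings, memory="1g")
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch free of side effects; the output can be reviewed and pasted into a terminal.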
