
scraper

Context-optimized MCP server for web scraping. Reduces LLM token usage by 70-90% through server-side CSS filtering and HTML-to-markdown conversion.
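The token savings come from discarding markup the model never needs. The sketch below is not the server's implementation; it is a minimal illustration of the idea using only the Python standard library. It assumes a plain tag name stands in for a full CSS selector, and the sample page and the names `SelectorToMarkdown` and `filter_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser


class SelectorToMarkdown(HTMLParser):
    """Keep only text inside <target_tag> elements and emit rough Markdown."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self, target_tag: str):
        super().__init__()
        self.target_tag = target_tag
        self.depth = 0            # greater than 0 while inside the target element
        self.prefix = ""          # Markdown prefix for the current heading, if any
        self.lines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.depth += 1
        elif self.depth and tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth:
            self.depth -= 1
        elif tag in self.HEADINGS:
            self.prefix = ""

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self.depth and text:
            self.lines.append(self.prefix + text)


def filter_to_markdown(html: str, target_tag: str = "article") -> str:
    """Return the text of the target elements as blank-line-separated Markdown."""
    parser = SelectorToMarkdown(target_tag)
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)


# Hypothetical page: navigation, scripts, and footer surround the real content.
PAGE = """
<html><head><style>body{margin:0}</style><script>var tracking = 1;</script></head>
<body>
<nav><a href="/">Home</a> <a href="/docs">Docs</a></nav>
<article><h1>Release notes</h1><p>Version 2 adds   batch scraping.</p></article>
<footer>Copyright notice and many links</footer>
</body></html>
"""

if __name__ == "__main__":
    markdown = filter_to_markdown(PAGE)
    print(markdown)
    print(f"{len(PAGE)} characters in, {len(markdown)} characters out")
```

Only the heading and paragraph inside the article element survive, which is why filtering before the content reaches the model shrinks the context. The real server supports proper CSS selectors and a fuller Markdown conversion than this sketch.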

Installation
Run these commands in your terminal to start the server and add it to Claude Code. Docker options such as --env must come before the image name; anything after the image name is passed to the container as arguments.

docker run -d -p 8000:8000 --name scraper-mcp \
  --env HTTP_PROXY="" \
  --env HTTPS_PROXY="" \
  --env ENABLE_PROMPTS="true" \
  --env ENABLE_RESOURCES="true" \
  --env SCRAPEOPS_API_KEY="" \
  --env PERPLEXITY_API_KEY="your_key_here" \
  --env PLAYWRIGHT_TIMEOUT="30000" \
  --env SCRAPEOPS_RENDER_JS="true" \
  --env PLAYWRIGHT_DISABLE_GPU="true" \
  --env PLAYWRIGHT_MAX_CONTEXTS="5" \
  ghcr.io/cotdp/scraper-mcp:latest

claude mcp add --transport http cotdp-scraper-mcp http://localhost:8000/mcp

How to use

Scraper is an MCP server for context-optimized web scraping. It reduces token usage for language models by filtering HTML on the server, targeting content with CSS selectors, and converting the result to Markdown.

The server exposes tools to fetch and process web content as raw HTML, Markdown, plain text, or extracted links, with optional JavaScript rendering via Playwright for dynamic pages. The integrated Perplexity tools add AI-assisted web search and reasoning, and a built-in dashboard lets you monitor statistics and test the APIs.

To use it, run the server (preferably via Docker, as described under "How to install"), then connect to the MCP endpoint at /mcp and call the provided tools: scrape_url, scrape_url_html, scrape_url_text, scrape_extract_links, perplexity, and perplexity_reason. The tools accept a single URL or a batch of URLs, an optional CSS selector, and an optional render_js flag that renders dynamic content before extraction.
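As a rough illustration of what a tool call looks like on the wire, the sketch below builds MCP `tools/call` requests in JSON-RPC 2.0 form. The tool name scrape_url and the render_js option come from the description above; the argument names `urls` and `css_selector` are assumptions made for illustration, so check the tool schema reported by the server for the real names. The helper `build_tool_call` is hypothetical.

```python
import json


def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Single URL, narrowed to the main content (argument names are assumed).
single = build_tool_call(
    "scrape_url",
    {"urls": ["https://example.com"], "css_selector": "article"},
)

# Batch of URLs with JavaScript rendering enabled before extraction.
batch = build_tool_call(
    "scrape_url",
    {
        "urls": ["https://example.com/a", "https://example.com/b"],
        "render_js": True,
    },
    request_id=2,
)

if __name__ == "__main__":
    print(json.dumps(single, indent=2))
    print(json.dumps(batch, indent=2))
```

In practice an MCP client such as Claude Code builds these messages for you and performs the initialize handshake first; the sketch only shows the shape of the request.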

How to install

Prerequisites:

  • Docker installed on your host (recommended for this MCP server).
  • Access to the internet to pull the Docker image from GHCR.

Installation steps:

  1. Install Docker: follow the official installation guide for your OS (https://docs.docker.com/get-docker/).
  2. Pull the latest image (optional if you deploy via docker-compose but useful for quick start): docker pull ghcr.io/cotdp/scraper-mcp:latest
  3. Run the MCP server with Docker (example): docker run -d -p 8000:8000 --name scraper-mcp ghcr.io/cotdp/scraper-mcp:latest
  4. Verify the MCP endpoint:
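  5. If you need Perplexity integration, pass the API key to the container; exporting it in your shell is not enough on its own. Run export PERPLEXITY_API_KEY=your_key_here, then add -e PERPLEXITY_API_KEY to the docker run command in step 3 so the container inherits the value.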
  6. Optional: use Docker Compose as described in the README to configure resources, cache, and environment variables more conveniently.

Note: For production deployments with JavaScript rendering, allocate sufficient memory to the container (at least 1 GB is recommended for Playwright contexts).
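One way to check that the container from step 3 is answering is a small probe like the one below. It assumes the server is reachable at http://localhost:8000/mcp, which follows from the port mapping and the /mcp path mentioned above but is not confirmed by the listing. The helper names are hypothetical, and any HTTP response, including an error status for a plain GET, is treated as a sign that the server is listening.

```python
import http.client
import urllib.error
import urllib.request


def mcp_endpoint(host: str = "localhost", port: int = 8000, path: str = "/mcp") -> str:
    """Build the assumed endpoint URL from the published port and path."""
    return f"http://{host}:{port}{path}"


def is_listening(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at the URL, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, even though it rejected a bare GET request.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a malformed reply.
        return False


if __name__ == "__main__":
    url = mcp_endpoint()
    state = "reachable" if is_listening(url) else "not reachable"
    print(f"{url} is {state}")
```

This only confirms that something is serving HTTP on the port. It does not perform an MCP handshake, so a successful probe is a first check rather than proof that the tools work.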

Additional notes

Tips and common considerations:

  • Resources are disabled by default to reduce context usage; enable with ENABLE_RESOURCES=true or via the environment in Docker/compose.
  • Perplexity integration requires PERPLEXITY_API_KEY; without it, perplexity tools will be unavailable.
  • JS rendering via Playwright consumes more memory; adjust PLAYWRIGHT_MAX_CONTEXTS and container memory accordingly.
  • The MCP dashboard offers an interactive playground and runtime configuration without restarts; use it to tune settings on the fly.
  • For caching and performance, consider enabling a persistent volume for cache when using Docker Compose.
  • If behind a proxy, set HTTP_PROXY and HTTPS_PROXY accordingly; some environments may require SCRAPEOPS settings to be configured for proxy workflows.
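The tips above mostly come down to which options are passed to docker run. The sketch below assembles such a command from a dictionary of settings, placing every option before the image name and skipping blank values so that empty proxy variables are not set in the container. The helper `docker_run_args` and the default memory limit of 1g are illustrative assumptions; the image name, port, and variable names are taken from this page.

```python
import shlex

IMAGE = "ghcr.io/cotdp/scraper-mcp:latest"


def docker_run_args(
    env: dict[str, str],
    memory: str = "1g",
    name: str = "scraper-mcp",
    port: int = 8000,
) -> list[str]:
    """Build a `docker run` argument list with all options before the image."""
    args = [
        "docker", "run", "-d",
        "-p", f"{port}:{port}",
        "--name", name,
        "--memory", memory,   # raise this when PLAYWRIGHT_MAX_CONTEXTS grows
    ]
    for key, value in sorted(env.items()):
        if value != "":       # skip blank settings such as an unused proxy
            args += ["-e", f"{key}={value}"]
    args.append(IMAGE)        # the image must come after every option
    return args


if __name__ == "__main__":
    settings = {
        "ENABLE_RESOURCES": "true",
        "PLAYWRIGHT_MAX_CONTEXTS": "5",
        "HTTP_PROXY": "",
        "HTTPS_PROXY": "",
    }
    print(shlex.join(docker_run_args(settings, memory="2g")))
```

Printing the command instead of running it makes it easy to review before use; pass the list to subprocess.run if you want to execute it directly.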
