scrap
An MCP (Model Context Protocol) server that can scrape web pages and extract content using CSS selectors. Built with deno-dom for fast HTML parsing.
claude mcp add --transport stdio sigmasd-scrap-mcp npx -y @sigma/scrap-mcp
How to use
This MCP server provides a focused web scraping capability. It fetches publicly accessible web pages and extracts content using CSS selectors, returning only the text content of matched elements. The main tool is scrape_page, which takes a URL and a CSS selector to locate elements. Use it when you want to pull specific pieces of information from a page (for example, headings, paragraphs, or links) without loading the entire page content into context. Typical selectors are simple ones such as h1, p, or a, or more specific ones such as .article-content p or nav a.
The server handles common errors (network issues, HTTP errors, parsing problems, and invalid selectors) and returns readable error messages through MCP, which makes it suitable for larger LLM workflows that need targeted data extraction.
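To make "text content of matched elements" concrete, here is a minimal Python sketch using only the standard library. It is not the server's implementation (the server uses deno-dom and supports full CSS selectors). This sketch handles only a bare tag-name selector such as p or h1, and the names TagTextCollector and extract_text are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text content of each outermost element with a given tag.

    Illustrative only: supports a bare tag name, not general CSS selectors.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0        # > 0 while inside a matching element
        self.buffer = []      # text pieces of the element being read
        self.results = []     # one string per matched element, in order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        # Text inside nested children (e.g. <b>) is included as well.
        if self.depth > 0:
            self.buffer.append(data)


def extract_text(html, tag):
    """Return the text of every element named `tag`, in document order."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.results


sample = "<h1>Title</h1><p>First <b>bold</b></p><p>Second</p>"
paragraphs = extract_text(sample, "p")
print(len(paragraphs), paragraphs)
```

Markup is dropped and only the text remains, which is why the tool's output stays small compared with the full page.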
To use the tool, call scrape_page with the URL and a CSS selector. The response lists how many elements matched and provides the text content for each element, in order. This enables simple pipelines: fetch the page, apply selectors to pull the exact data you want, and feed the resulting text into your LLM or downstream processor.
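As a rough illustration of what calling scrape_page looks like on the wire, the sketch below builds the JSON-RPC 2.0 tools/call request that MCP clients send to invoke a tool. The argument names "url" and "selector" are assumptions based on the description above, not confirmed from the server's schema, and build_scrape_request is a hypothetical helper. A real client also performs the MCP initialize handshake first, which is omitted here.

```python
import json


def build_scrape_request(url, selector, request_id=1):
    """Build an MCP `tools/call` request for the scrape_page tool.

    Assumption: the tool accepts arguments named "url" and "selector".
    Check the tool's published input schema before relying on this.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_page",
            "arguments": {"url": url, "selector": selector},
        },
    }


request = build_scrape_request("https://example.com", "h1")
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this message for you; the sketch only shows which two pieces of information the tool needs.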
How to install
Prerequisites:
- Node.js (recommended) with npm, which includes npx
- Internet access for fetching pages
Installation steps:
- Ensure Node.js and npm are installed. Verify:
  node -v
  npm -v
- Use npx to run the MCP server directly (no global install required):
  npx -y @sigma/scrap-mcp
- If you prefer a long-running setup, you can install the package globally (optional):
  npm install -g @sigma/scrap-mcp
  Then start with:
  npx @sigma/scrap-mcp
- In your MCP manager or orchestration, reference the mcp_config snippet to connect to this server under the name you chose (e.g., scrap).
Note: The server requires network access to fetch web pages (enabled by default when running with the appropriate permissions).
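For clients that are configured through a JSON file, a server entry might look like the output of the sketch below. The "mcpServers" layout is the convention used by several MCP clients, but it is an assumption that your client uses it, so check its documentation. The server name "scrap" and the helper mcp_server_config are illustrative; the command and arguments mirror the npx invocation above.

```python
import json


def mcp_server_config(name="scrap"):
    """Return a client config entry that launches the server via npx.

    Assumption: the client reads a top-level "mcpServers" mapping of
    server name -> {"command", "args"}.
    """
    return {
        "mcpServers": {
            name: {
                "command": "npx",
                "args": ["-y", "@sigma/scrap-mcp"],
            }
        }
    }


# Print the JSON so it can be pasted into the client's config file.
print(json.dumps(mcp_server_config(), indent=2))
```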
Additional notes
Tips and common issues:
- Permissions: Ensure outbound network access is allowed (for example, that no firewall rule blocks it). The server relies on network access to fetch pages.
- Dynamic content: Some pages render content via JavaScript. Selectors only match text that is present in the HTML returned by the initial response, not text loaded afterwards. If you get empty results, try alternative selectors or verify the page's source HTML.
- Selector accuracy: Complex selectors can fail if the page structure changes. Start with simple selectors (e.g., h1, p) and progressively refine.
- Robots.txt and courtesy: Respect robots.txt and rate-limit requests to avoid overloading target sites.
- Error handling: If you encounter network issues, HTTP errors, or parsing failures, those errors are surfaced as readable MCP messages; check the selector validity and page URL correctness.
- Security: The server executes only CSS selectors and does not run arbitrary code from the page; outbound requests are sandboxed and limited to HTTP/HTTPS.
- Versioning: The readme references dependencies like @modelcontextprotocol/sdk and deno-dom in the upstream project; ensure you’re using compatible versions in your environment if you integrate or extend the server.
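The robots.txt and rate-limiting tip can be applied on the caller's side before a URL is handed to scrape_page. The sketch below is one possible approach using Python's standard library. It assumes nothing about whether the server performs such checks itself, and the names allowed_by_robots, MinInterval, and SAMPLE_ROBOTS are hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if `url` may be fetched according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class MinInterval:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._last = None  # monotonic timestamp of the previous request

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.seconds - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Sample robots.txt content; a real caller would download it from the target site.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

limiter = MinInterval(0.05)
for page in ["https://example.com/index.html", "https://example.com/private/page"]:
    if allowed_by_robots(SAMPLE_ROBOTS, page):
        limiter.wait()  # pause here, then call scrape_page for this URL
        print("ok to scrape:", page)
    else:
        print("skipped by robots.txt:", page)
```

Keeping the check outside the server means it works the same way with any scraping tool, at the cost of one extra robots.txt fetch per site.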