mcp-fetch
MCP server that fetches content from the internet for a given URL
claude mcp add --transport stdio maartensmeets-mcp-server-fetch docker run --rm -i mcp-server-fetch
How to use
The mcp-server-fetch server gives LLMs a browser-backed way to fetch web content. It combines browser automation (undetected-chromedriver), OCR via pytesseract, and several extraction methods (HTML parsing with BeautifulSoup, document parsing for PDF/DOCX/PPTX files, and a markdown conversion path) to return structured content from web pages. A scoring system compares the results, favoring longer, well-structured content and penalizing noisy or error-filled output. The functionality is exposed through the fetch tool, which accepts a URL and returns the extracted content as markdown by default, or the raw HTML on request. This lets LLMs retrieve content from pages that render with JavaScript or employ anti-scraping techniques.
To use the tool, run the server and call the fetch tool with a URL. The server loads the page through browser automation, optionally runs OCR on images and other non-text elements, and returns the best-scoring result across the extraction methods. If you need the actual HTML, request raw output by setting the raw flag. Debug logging is available to inspect how the scoring and selection process chooses the best result.
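As a rough illustration of what "call the fetch tool with a URL" looks like on the wire, the sketch below builds an MCP tools/call request in Python. The JSON-RPC envelope and the tools/call method come from the MCP protocol; the argument names url and raw are assumptions based on the description above, not confirmed against the server's schema. A real client must also complete the MCP initialize handshake before sending this message.

```python
import json


def build_fetch_request(url: str, raw: bool = False, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the fetch tool as one JSON line."""
    # "url" and "raw" are assumed argument names, taken from the prose above.
    arguments = {"url": url}
    if raw:
        arguments["raw"] = True
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch", "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Default call: markdown output is expected according to the description.
    print(build_fetch_request("https://example.com"))
    # Raw call: asks for the unprocessed HTML instead.
    print(build_fetch_request("https://example.com", raw=True, request_id=2))
```

With the stdio transport, each printed line would be written to the stdin of the docker run process, and the response read back from its stdout.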
How to install
Prerequisites:
- Docker must be installed and running on your machine.
- Internet access, so that Docker can pull the mcp-server-fetch image on first run (or build the image locally if you prefer).
Install and run:
- Build the Docker image locally (optional if you pull from a registry):
docker build -t mcp-server-fetch .
- Run the Docker container:
docker run --rm -i mcp-server-fetch
- If you configure the server through an MCP client or orchestrator, make sure your mcp_config points to the docker run command above.
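A minimal sketch of such a configuration, generated from Python so it can be printed or written to a file. The mcpServers layout is the one used by several MCP clients and is an assumption here; the entry name "fetch" is arbitrary, and the exact file name and location depend on your client.

```python
import json


def build_mcp_config(image: str = "mcp-server-fetch") -> dict:
    """Build a client configuration that starts the server with docker run."""
    return {
        "mcpServers": {
            # "fetch" is a hypothetical entry name; pick any label you like.
            "fetch": {
                "command": "docker",
                # Same command as above: docker run --rm -i mcp-server-fetch
                "args": ["run", "--rm", "-i", image],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(), indent=2))
```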
Notes:
- The server supports cookie consent handling, full-page screenshots, and OCR-based extraction to improve content coverage on dynamic pages.
- You can customize the user-agent string by adding --user-agent=YourUserAgent to the args in your MCP configuration.
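The user-agent note can be sketched as a small helper that edits the args list of a configuration entry. It assumes the flag belongs after the image name, so that Docker passes it to the server instead of interpreting it itself; the helper name with_user_agent is hypothetical.

```python
def with_user_agent(args: list[str], user_agent: str) -> list[str]:
    """Return a copy of args with --user-agent=<value> appended.

    Any existing --user-agent=... entry is dropped first, so the flag
    appears only once. The input list is not modified.
    """
    kept = [a for a in args if not a.startswith("--user-agent=")]
    return kept + [f"--user-agent={user_agent}"]


if __name__ == "__main__":
    base_args = ["run", "--rm", "-i", "mcp-server-fetch"]
    print(with_user_agent(base_args, "YourUserAgent"))
```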
Additional notes
Tips and caveats:
- If a page requires interaction, the browser automation may simulate the interaction to reveal content before extraction.
- The fetch tool returns content in markdown by default; use the raw flag to obtain the unprocessed HTML when needed.
- The scoring system prioritizes content length, structure, and readability. Very short or error-laden content may be penalized.
- If you need to override the user-agent for compatibility or privacy, add the --user-agent flag to the docker run arguments in your configuration.
- Make sure the container has network access through the host so it can fetch remote pages, and be mindful of rate limits and site policies.
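To make the scoring idea concrete, here is a toy scorer in Python. It is not the server's actual implementation: the weights, the error markers, and the function names are all invented for illustration, and readability is not modelled. It only shows the general shape of rewarding length and structure while penalizing error-like text.

```python
import re

# Hypothetical phrases that suggest a blocked or failed fetch.
ERROR_MARKERS = ("access denied", "404 not found", "enable javascript", "captcha")


def score_content(text: str) -> float:
    """Score extracted text: longer and more structured is better."""
    if not text.strip():
        return 0.0
    # Length contributes up to 1.0 and saturates at 5000 characters.
    length_score = min(len(text), 5000) / 5000
    # Count markdown headings and list items as a crude structure signal.
    hits = re.findall(r"^(#{1,6} |[-*] |\d+\. )", text, flags=re.MULTILINE)
    structure_score = min(len(hits), 10) / 10
    # Each error marker found costs 0.25.
    lowered = text.lower()
    penalty = 0.25 * sum(marker in lowered for marker in ERROR_MARKERS)
    return max(0.0, length_score + structure_score - penalty)


def pick_best(candidates: dict[str, str]) -> str:
    """Return the name of the extraction method with the highest score."""
    return max(candidates, key=lambda name: score_content(candidates[name]))


if __name__ == "__main__":
    results = {
        "html": "Access denied. Please enable JavaScript.",
        "markdown": "# Title\n\n" + "- item with some real text\n" * 50,
    }
    print(pick_best(results))
```

Capping the length and structure terms keeps one very long but noisy result from dominating, which matches the stated goal of penalizing error-laden content.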