
crawl4ai

🕷️ A lightweight Model Context Protocol (MCP) server that exposes Crawl4AI web scraping and crawling capabilities as tools for AI agents. Similar to Firecrawl's API but self-hosted and free. Perfect for integrating web scraping into your AI workflows with OpenAI Agents SDK, Cursor, Claude Code, and other MCP-compatible tools.

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio sadiuysal-crawl4ai-mcp-server -- docker run -i --rm \
  --env OPENAI_API_KEY="your-openai-api-key" \
  uysalsadi/crawl4ai-mcp-server:latest

How to use

Crawl4AI MCP Server exposes four production-ready tools (scrape, crawl, crawl_site, and crawl_sitemap) through a standard MCP interface. You can run the server locally (via Docker or a manual Python setup) and connect AI agents or MCP clients to invoke these tools over stdio. The server is built on Crawl4AI and offers single-page markdown extraction, breadth-first crawling with depth control, adaptive stopping, and safety measures that block internal networks and private IPs. Use the tools to fetch content, extract markdown, follow links across pages, and persist results when needed. The examples in the repository show how to initialize the MCP session and call specific tools such as scrape and crawl.

To use the tools, connect to the MCP server through the standard stdio client interface or an MCP client library in the language of your choice. Once connected, initialize the session and invoke:

  1. scrape with a URL to fetch a page and transform it into markdown
  2. crawl with a seed URL to explore multiple pages, with optional max_depth, max_pages, and adaptive settings
  3. crawl_site to perform site-wide crawling with persistence
  4. crawl_sitemap to process a sitemap with persistence

You can override crawl and scrape behavior by supplying optional crawler, browser, script, and timeout configurations. Agent ecosystems (OpenAI Agents SDK, Cursor, Claude Code) can drive these calls by embedding the MCP client in workflows or agents.
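In practice you would use an MCP client library to manage the session, but as a minimal sketch of what travels over stdio, the JSON-RPC requests for initializing a session and calling scrape or crawl look roughly like this. The tool names and argument names (url, max_depth, max_pages) come from the description above; the protocolVersion value and exact argument schema are assumptions to verify against the server:

```python
import json

def rpc(method: str, params: dict, msg_id: int) -> str:
    """Frame a single JSON-RPC 2.0 request, as MCP sends over stdio."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}
    )

# 1) Initialize the MCP session (protocolVersion value is illustrative).
init = rpc("initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},
}, 1)

# 2) Call the scrape tool on a single URL.
scrape = rpc("tools/call", {
    "name": "scrape",
    "arguments": {"url": "https://example.com"},
}, 2)

# 3) Call crawl with optional depth and page limits.
crawl = rpc("tools/call", {
    "name": "crawl",
    "arguments": {"url": "https://example.com", "max_depth": 2, "max_pages": 10},
}, 3)

print(init)
print(scrape)
print(crawl)
```

A real client would write each request to the server's stdin (newline-delimited) and read responses from its stdout; the MCP client libraries handle that framing for you.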

How to install

Prerequisites:

  • Docker or Python 3.9+ with pip for manual installation
  • Access to the internet for pulling images or installing dependencies
  • Optional: OpenAI API key for agent integrations

Option 1 — Docker (Recommended):

  1. Install Docker on your machine.
  2. Run the MCP server using the pre-built image: docker run -i --rm -p 8000:8000 uysalsadi/crawl4ai-mcp-server:latest
  3. The server exposes the MCP interface over stdio (or a network port, depending on how the image is configured). For a quick check, use the test utilities inside the container or the client examples in the repository.

Option 2 — Manual installation (Python):

  1. Clone the repository:
     git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
     cd crawl4ai-mcp-server
  2. Set up a virtual environment and install dependencies:
     python -m venv .venv
     source .venv/bin/activate   # On Windows: .venv\Scripts\activate
     pip install -r requirements.txt
  3. Install Playwright browsers (for scraping/crawling capabilities):
     python -m playwright install chromium
  4. Run the MCP server (stdio or HTTP as configured by the repository):
     python -m crawler_agent.mcp_server   # example entry point; adapt as needed
  5. Test a basic call to ensure the server responds, using the provided smoke tests or a small MCP client script.

Environment tips:

  • If using OpenAI agents, set OPENAI_API_KEY in your environment or in your orchestrator.
  • For docker-based runs, you may need additional port mappings or volume mounts depending on how you persist outputs.
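As one concrete example of the tip above, the run command below passes the API key as an environment variable and mounts a host directory for persisted output. The container-side path /data/output is a hypothetical mount point; match it to whatever output_dir your tools are configured to write to:

```shell
# Hypothetical example: inject the API key and persist crawl output on the host.
docker run -i --rm \
  -e OPENAI_API_KEY="your-openai-api-key" \
  -v "$(pwd)/crawl-output:/data/output" \
  uysalsadi/crawl4ai-mcp-server:latest
```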

Additional notes

Tips and caveats:

  • The server includes safety filtering to block internal networks, localhost, and private IP ranges. When testing, ensure your seed URLs are accessible and permitted by your environment.
  • The four tools (scrape, crawl, crawl_site, crawl_sitemap) support optional overrides for crawler, browser, script, and timeout settings to tailor behavior for particular sites or extraction needs.
  • When using crawl or site-wide crawling with output_dir persistence, a manifest.json and content bundles are created. Ensure the output_dir path is writable when using those features.
  • If integrating with OpenAI Agents SDK, you can run the agent in a separate process and connect via MCP stdio, enabling automated tool calls from agent goals.
  • For production deployments, consider using the docker image with a proper orchestration (Kubernetes or Docker Compose) and secret management for API keys.
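When a crawl persists results to output_dir, the manifest.json can be used to locate the saved content afterwards. The sketch below assumes a simple manifest layout ({"pages": [{"url": ..., "file": ...}]}); that structure is hypothetical, so inspect a real manifest.json produced by crawl_site before relying on specific keys:

```python
import json
from pathlib import Path

def summarize_manifest(output_dir: str) -> list:
    """Return the URLs recorded in a crawl manifest.

    Assumes a hypothetical manifest layout {"pages": [{"url": ..., "file": ...}]};
    verify against a manifest.json actually written by crawl_site.
    """
    manifest = json.loads((Path(output_dir) / "manifest.json").read_text())
    return [page["url"] for page in manifest.get("pages", [])]
```

This kind of helper is useful for post-processing a crawl, e.g. feeding the persisted markdown files back into an agent pipeline.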
