AI-Cursor-Scraping-Assistant
A powerful tool that leverages Cursor AI and MCP (Model Context Protocol) to easily generate web scrapers for various types of websites.
claude mcp add --transport stdio thewebscrapingclub-ai-cursor-scraping-assistant python xpath_server.py \
  --env CAMOUFOX_FILE_PATH="path to Camoufox_template.py"
How to use
AI-Cursor-Scraping-Assistant is a Python-based MCP server that lets Cursor AI generate web scrapers with minimal user input. It combines Cursor Rules with MCP tools to analyze a website, detect its structure, and produce a Scrapy or Camoufox scraper. The server exposes an XPath selector generator and an anti-bot analysis workflow, so Cursor can fetch page content, identify JSON data, and create scraper templates tailored to PLP (Product Listing Page) and PDP (Product Detail Page) patterns. When a site has anti-bot protections, you can opt for Camoufox for stealth scraping.
To use it, make sure the MCP server is running and Cursor is connected to it, then prompt Cursor to generate a scraper, for example: "Write an e-commerce PDP scraper for nike.com". Cursor will guide you through analysis, selector extraction, and code generation.
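To make the "identify JSON data" step more concrete, here is a minimal sketch of how structured product data can be pulled out of a page using only the Python standard library. It is an illustration, not the server's actual implementation: the class name JsonLdExtractor, the helper extract_json_ld, and the sample HTML are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than fail the whole page.


def extract_json_ld(html):
    """Return every JSON-LD object found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Hypothetical PDP markup, used only for illustration.
SAMPLE_PDP = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Example Shoe",
 "offers": {"price": "99.00", "priceCurrency": "USD"}}
</script>
</head><body><h1>Example Shoe</h1></body></html>
"""

products = [
    item for item in extract_json_ld(SAMPLE_PDP)
    if isinstance(item, dict) and item.get("@type") == "Product"
]
print(products[0]["name"], products[0]["offers"]["price"])
```

When a page carries data like this, a scraper can often read fields from the JSON directly and fall back to XPath selectors only for values the JSON does not cover.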
How to install
Prerequisites:
- Python 3.10+
- Cursor AI installed
- Basic knowledge of web scraping concepts
Steps:
- Clone the repository and install dependencies
git clone https://github.com/TheWebScrapingClub/AI-Cursor-Scraping-Assistant.git
cd AI-Cursor-Scraping-Assistant
# Install MCP tooling and required packages
pip install mcp camoufox scrapy
- Optional: set up Camoufox browser binary (if you plan to use Camoufox)
python -m camoufox fetch
- Start the MCP server
cd MCPfiles
python xpath_server.py
- In Cursor, configure the MCP server (usually via the MCP panel) to point at the running server. You should see the server name AI-Cursor-Scraping-Assistant in the MCP registry.
Note: If you adjust Camoufox paths, ensure CAMOUFOX_FILE_PATH is updated in the MCP configuration.
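As a rough illustration of what that MCP configuration could contain, the sketch below builds a server entry and prints it as JSON. It assumes Cursor's mcpServers JSON layout (a server name mapped to command, args, and env); the function name build_cursor_mcp_config and both /path/to/... values are placeholders, not paths taken from the project.

```python
import json


def build_cursor_mcp_config(server_script, camoufox_template):
    """Build an MCP server entry in the assumed Cursor `mcpServers` layout."""
    return {
        "mcpServers": {
            "AI-Cursor-Scraping-Assistant": {
                "command": "python",
                "args": [server_script],
                # The server reads the Camoufox template location from this variable.
                "env": {"CAMOUFOX_FILE_PATH": camoufox_template},
            }
        }
    }


# Placeholder paths: replace them with the real locations on your machine.
config = build_cursor_mcp_config(
    "/path/to/AI-Cursor-Scraping-Assistant/MCPfiles/xpath_server.py",
    "/path/to/Camoufox_template.py",
)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the MCP configuration file Cursor uses. Absolute paths are used here so the entry does not depend on the directory Cursor launches the command from.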
Additional notes
Tips and considerations:
- Stealth scrapers rely on Camoufox. If you plan to use it against anti-bot protections, run python -m camoufox fetch to download the browser binary, and make sure network access to the target sites is permitted.
- Ensure CAMOUFOX_FILE_PATH points to a valid Camoufox_template.py before starting the MCP server.
- The MCP workflow includes multiple MDC rule sets (prerequisites, website-analysis, scrapy, scraper-models) that guide Cursor in analysis and scraper generation.
- If you encounter anti-bot blocks, use the advanced rules to analyze cookies, JSON data, and schema.org markup as indicated in the Cursor rules documentation.
- The configuration refers to xpath_server.py by a relative path; run the server from the MCPfiles directory so that the path resolves as intended.
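The two path-related tips above can be checked before starting the server. The following is a minimal preflight sketch under stated assumptions: the function name preflight_problems is hypothetical, and the only facts it relies on are the CAMOUFOX_FILE_PATH variable and the xpath_server.py file name mentioned in this document.

```python
import os
from pathlib import Path


def preflight_problems(env, cwd):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    template = env.get("CAMOUFOX_FILE_PATH")
    if not template:
        problems.append("CAMOUFOX_FILE_PATH is not set")
    elif not Path(template).is_file():
        problems.append(f"CAMOUFOX_FILE_PATH does not point to a file: {template}")

    # The server script is expected in the directory it is launched from.
    if not (Path(cwd) / "xpath_server.py").is_file():
        problems.append(f"xpath_server.py not found in {cwd}")

    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ, os.getcwd()):
        print("WARNING:", problem)
```

Run from the MCPfiles directory, the script prints nothing when both checks pass. Taking the environment and working directory as arguments keeps the check easy to test without touching the real environment.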
Related MCP Servers
mcp-aws
A Model Context Protocol server implementation for operations on AWS resources
code
Code-MCP: Connect Claude AI to your development environment through the Model Context Protocol (MCP), enabling terminal commands and file operations through the AI interface.
mcp-simple-timeserver
A simple MCP server that lets Claude check the current time, find out when holidays fall, and compute the time between dates.
memory
An MCP (Model Context Protocol) server providing long-term memory for LLMs
system_information_mcp
DevEnvInfoServer - Cursor MCP Server for Development Environment Information
camoufox-python
MCP server for browser automation with Camoufox anti-detection capabilities