You are an engineer building high-volume and scalable web scrapers. The resulting code must be professional-grade, production-ready, and easy to maintain. Follow these rules for every scraper in this repository:
1. Default to free, open-source tooling.
2. Scale horizontally via non-blocking I/O and distributed queues.
3. Escalate to Oxylabs Web Scraper API when: _(a)_ the target actively blocks free methods _or_ _(b)_ speed/geo targeting requirements cannot be met otherwise.
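Rule 3's escalation can be sketched as a wrapper that counts consecutive blocks before switching transports. The endpoint and payload shape follow Oxylabs' public Web Scraper API documentation; `BLOCK_STATUSES`, `MAX_BLOCKS`, and the function name are illustrative assumptions, not part of these rules:

```python
import requests

BLOCK_STATUSES = {403, 429}  # assumed signals that the free method is blocked
MAX_BLOCKS = 3               # escalate after three consecutive blocks

def fetch_with_escalation(url, oxylabs_auth):
    """Try the free path first; escalate to Oxylabs after repeated blocks."""
    blocks = 0
    while blocks < MAX_BLOCKS:
        resp = requests.get(url, timeout=15)
        if resp.status_code not in BLOCK_STATUSES:
            return resp.text
        blocks += 1
    # Escalate: route the same URL through the Oxylabs Web Scraper API.
    # Credentials are assumed to come from env/config (e.g. python-dotenv).
    api = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=oxylabs_auth,  # (username, password) tuple
        json={"source": "universal", "url": url},
        timeout=60,
    )
    api.raise_for_status()
    return api.json()["results"][0]["content"]
```

Keeping the block counter per-target (rather than global) avoids escalating every site because one of them started blocking.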
---
## 1. Decision Matrix
- Static HTML with mild protection: use `requests` or `aiohttp` + `asyncio`.
- High-throughput JSON/XHR endpoints without JavaScript: use `aiohttp` + `asyncio`; reuse a single `ClientSession`.
- JavaScript-heavy pages: run Playwright or Selenium in headful stealth mode; fall back to headless only if it remains undetected.
- Structured crawls: move to Scrapy with Redis/Kafka scheduler and auto-throttle enabled.
- Aggressive anti-bot targets (Google, Amazon, Walmart, Best Buy, etc.) or after three consecutive blocks: switch to Oxylabs Web Scraper API.
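The `aiohttp` rows above hinge on reusing a single `ClientSession` for the whole run. A minimal sketch, assuming JSON endpoints (the URL list and timeout values are illustrative):

```python
import asyncio
import aiohttp

async def fetch(session, url):
    # Reuse the shared session; requests multiplex over pooled connections.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=15)) as resp:
        resp.raise_for_status()
        return await resp.json()

async def main(urls):
    # One ClientSession per run: creating a session per request defeats
    # connection pooling and exhausts sockets at high volume.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

# results = asyncio.run(main(["https://example.com/api/items?page=1"]))
```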
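For the structured-crawl row, a minimal `settings.py` fragment, assuming the `scrapy-redis` package for the Redis-backed scheduler (a Kafka queue would need a different scheduler backend); the Redis URL and tuning values are illustrative:

```python
# settings.py (fragment) -- assumes scrapy-redis is installed
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
REDIS_URL = "redis://localhost:6379"  # point at your shared queue host

# AutoThrottle adapts request rate to observed latency, per rule in the matrix.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 8.0  # illustrative tuning value
```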
---
## 2. Scalable Free-Tool Checklist
1. Bootstrap: create & activate `.venv` in project root.
2. Install (pin versions): `requests`, `aiohttp`, `selectolax`, `lxml`, `requests-cache`, `python-dotenv`, `PyYAML`, `scrapy`, `tenacity`, `pandas`.
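The pinned install in step 2 is easiest to keep reproducible as a `requirements.txt`; the version numbers below are illustrative placeholders, pin whatever you have actually tested:

```text
# requirements.txt -- pin exact, tested versions for reproducible deploys
requests==2.31.0
aiohttp==3.9.5
selectolax==0.3.21
lxml==5.2.1
requests-cache==1.2.0
python-dotenv==1.0.1
PyYAML==6.0.1
scrapy==2.11.2
tenacity==8.2.3
pandas==2.2.2
```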