# Firecrawl Scrape Skill

Install the skill with:

```shell
npx machina-cli add skill parcadei/Continuous-Claude-v3/firecrawl-scrape --openclaw
```
## When to Use

- Scrape content from any URL
- Extract structured data from web pages
- Search the web and retrieve page content
## Instructions

```shell
uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py \
  --url "https://example.com" \
  --format "markdown"
```
## Parameters

- `--url`: URL to scrape
- `--format`: Output format: `markdown`, `html`, or `text` (default: `markdown`)
- `--search`: (alternative) Search query instead of a direct URL
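If you are driving the skill from a script, the parameters above can be assembled programmatically. A minimal sketch, assuming the flags documented in this skill (the helper name `build_scrape_command` is hypothetical):

```python
def build_scrape_command(url=None, search=None, fmt="markdown"):
    """Return the argv list for a firecrawl_scrape harness call.

    Exactly one of url or search must be given, mirroring the skill's
    --url / --search alternatives.
    """
    if (url is None) == (search is None):
        raise ValueError("provide exactly one of url or search")
    cmd = [
        "uv", "run", "python", "-m", "runtime.harness",
        "scripts/mcp/firecrawl_scrape.py",
        "--format", fmt,  # markdown is the documented default
    ]
    if url is not None:
        cmd += ["--url", url]
    else:
        cmd += ["--search", search]
    return cmd
```

This keeps the flag handling in one place if you call the skill from multiple pipeline steps.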
## Examples

```shell
# Scrape a page
uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py \
  --url "https://docs.python.org/3/library/asyncio.html"

# Search and scrape
uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py \
  --search "Python asyncio best practices 2024"
```
## MCP Server Required

Requires a `firecrawl` server entry in `mcp_config.json` with `FIRECRAWL_API_KEY` set.
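A plausible `mcp_config.json` entry is sketched below; the exact schema depends on your MCP runtime, and the `firecrawl-mcp` package name and placeholder key are assumptions to adapt to your setup:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-your-key-here"
      }
    }
  }
}
```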
## Source

The skill file lives at `.claude/skills/firecrawl-scrape/SKILL.md` in the repository:

```shell
git clone https://github.com/parcadei/Continuous-Claude-v3
```

## Overview
Firecrawl-scrape pulls content from any URL and extracts structured data from web pages using the Firecrawl MCP server. It can output results in markdown, HTML, or plain text, enabling easy inclusion in reports and notebooks. This skill is ideal for automated data gathering, research, and content extraction workflows.
## How This Skill Works

Run the harness with `uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py`, supplying `--url` and `--format` (`markdown` by default). The request is sent to a Firecrawl MCP server (configured in `mcp_config.json` with `FIRECRAWL_API_KEY`), which performs the scrape and returns content in the chosen format.
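When embedding the skill in a larger Python pipeline, the harness call above can be wrapped in a small function. A sketch under the assumption that the documented CLI is available; the `runner` parameter is injectable so the wrapper can be exercised without a live harness:

```python
import subprocess

def scrape(url, fmt="markdown", runner=subprocess.run):
    """Invoke the firecrawl_scrape harness and return its stdout.

    runner defaults to subprocess.run; any callable with the same
    keyword interface can be substituted for testing.
    """
    cmd = [
        "uv", "run", "python", "-m", "runtime.harness",
        "scripts/mcp/firecrawl_scrape.py",
        "--url", url, "--format", fmt,
    ]
    result = runner(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With `check=True`, a non-zero exit from the harness raises `subprocess.CalledProcessError`, so failures surface instead of producing empty output.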
## When to Use It
- You need to scrape content from any URL for quick research or data gathering.
- You must extract structured data (headings, metadata, tables) from web pages.
- You want to perform a web search and scrape results for comparison.
- You require output in markdown, HTML, or plain text to fit downstream tooling.
- You’re automating scraping in a pipeline that uses a configured MCP Firecrawl server.
## Quick Start

- Step 1: Ensure the MCP server is configured with `FIRECRAWL_API_KEY` in `mcp_config.json`.
- Step 2: Run a scrape with a URL or a search, e.g. `uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py --url 'https://example.com' --format markdown`.
- Step 3: Use the output (markdown/HTML/text) in your report, CMS, or downstream tooling.
## Best Practices

- Ensure the Firecrawl MCP server is configured with `FIRECRAWL_API_KEY` in `mcp_config.json` before running.
- Prefer direct URLs for deterministic scraping; use `--search` when the URL is unknown.
- Specify the desired output format with `--format` (`markdown` by default) to match your downstream needs.
- Validate the scraped content for completeness and handle pagination or multi-page paths as needed.
- Limit request volume, respect robots.txt rules and site terms of service, and throttle where necessary.
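The throttling practice above can be enforced with a small helper that guarantees a minimum delay between scrapes. A sketch; the two-second interval is an assumption to tune for the target site, and the clock/sleep parameters are injectable for testing:

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, or None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```

Call `throttle.wait()` immediately before each harness invocation inside a scraping loop.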
## Example Use Cases

- `uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py --url 'https://docs.python.org/3/library/asyncio.html'`
- `uv run python -m runtime.harness scripts/mcp/firecrawl_scrape.py --search 'Python asyncio best practices 2024'`
- Scrape a news article URL and export it as markdown for a summary.
- Gather data from several product pages automatically and compile it into a report.
- Publish scraped content to a CMS by exporting HTML output.
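The multi-page report use case above can be sketched as a small compiler that stitches scraped markdown snippets into one document. The `fetch` callable is a stand-in for a real harness call (e.g. a wrapper around the skill's CLI):

```python
def compile_report(pages, fetch):
    """Build a markdown report from several scraped pages.

    pages: iterable of (title, url) pairs.
    fetch: callable mapping a URL to its scraped markdown content.
    """
    sections = []
    for title, url in pages:
        body = fetch(url)
        sections.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(sections)
```

Each section records its source URL, which keeps the compiled report auditable.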