Firecrawler
@capt-marbles

npx machina-cli add skill @capt-marbles/firecrawler --openclaw

Firecrawl Web Skill
Scrape, search, and crawl the web using Firecrawl.
Setup
- Get your API key from firecrawl.dev/app/api-keys
- Set the environment variable:
export FIRECRAWL_API_KEY=fc-your-key-here
- Install the SDK:
pip3 install firecrawl
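Before running any command, it can help to confirm the key is actually visible to Python. A minimal sketch (the `fc-` prefix check simply mirrors the key format shown above; `check_setup` is a hypothetical helper, not part of fc.py):

```python
import os

def check_setup() -> str:
    """Return the configured API key, or raise a helpful error."""
    key = os.environ.get("FIRECRAWL_API_KEY", "")
    if not key.startswith("fc-"):
        raise RuntimeError(
            "FIRECRAWL_API_KEY is missing or malformed; "
            "export it as shown above (keys start with 'fc-')."
        )
    return key
```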
Usage
All commands use the bundled fc.py script in this skill's directory.
Get Page as Markdown
Fetch any URL and convert to clean markdown. Handles JavaScript-rendered content.
python3 fc.py markdown "https://example.com"
python3 fc.py markdown "https://example.com" --main-only # skip nav/footer
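Under the hood, fc.py presumably builds a scrape request against Firecrawl's API. A hedged sketch of the request body the markdown command might assemble (the field names follow Firecrawl's public scrape API; the mapping of --main-only to onlyMainContent is an assumption about this script):

```python
def build_scrape_payload(url: str, main_only: bool = False) -> dict:
    # Assumed shape of a Firecrawl scrape request body: ask for
    # markdown output, optionally stripping nav/footer boilerplate.
    payload = {"url": url, "formats": ["markdown"]}
    if main_only:
        payload["onlyMainContent"] = True
    return payload
```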
Take Screenshot
Capture a full-page screenshot of any URL.
python3 fc.py screenshot "https://example.com" -o screenshot.png
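The API may return the captured image as base64-encoded data, which the script then has to write to the path given by -o. A sketch of that last step (assuming base64 output; `save_screenshot` is a hypothetical helper):

```python
import base64

def save_screenshot(b64_data: str, path: str) -> int:
    """Decode a base64-encoded screenshot and write it to disk.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)
```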
Extract Structured Data
Pull specific fields from a page using a JSON schema.
Schema example (schema.json):
{
"type": "object",
"properties": {
"title": { "type": "string" },
"price": { "type": "number" },
"features": { "type": "array", "items": { "type": "string" } }
}
}
python3 fc.py extract "https://example.com/product" --schema schema.json
python3 fc.py extract "https://example.com/product" --schema schema.json --prompt "Extract the main product details"
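Combining the schema file and optional prompt into a request might look like the sketch below (the "urls"/"schema"/"prompt" field names are assumptions modeled on Firecrawl's extract API; `build_extract_payload` is a hypothetical helper):

```python
import json
from typing import Optional

def build_extract_payload(url: str, schema_path: str,
                          prompt: Optional[str] = None) -> dict:
    # Load the JSON schema from disk and attach it to the request;
    # the optional natural-language prompt steers the extraction.
    with open(schema_path) as f:
        schema = json.load(f)
    payload = {"urls": [url], "schema": schema}
    if prompt:
        payload["prompt"] = prompt
    return payload
```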
Web Search
Search the web and get content from results (may require paid tier).
python3 fc.py search "Python 3.13 new features" --limit 5
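A search request likely reduces to a query plus a result cap; since each scraped result can cost a credit, validating the limit up front is cheap insurance. A sketch (field names assumed):

```python
def build_search_payload(query: str, limit: int = 5) -> dict:
    # Assumed search request body; limit caps how many results are
    # fetched (and therefore how many credits the call can consume).
    if limit < 1:
        raise ValueError("limit must be >= 1")
    return {"query": query, "limit": limit}
```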
Crawl Documentation
Crawl an entire documentation site. Great for learning new frameworks.
python3 fc.py crawl "https://docs.example.com" --limit 30
python3 fc.py crawl "https://docs.example.com" --limit 50 --output ./docs
Note: Each page costs 1 credit. Set reasonable limits.
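Because each page costs a credit, it is worth clamping the crawl limit against a budget before submitting the job. A minimal sketch (the "url"/"limit" fields are assumptions; `max_budget` is a hypothetical safeguard, not an fc.py flag):

```python
def build_crawl_payload(url: str, limit: int, max_budget: int = 50) -> dict:
    # Clamp the requested page limit to a credit budget so a typo
    # like --limit 3000 cannot drain the account in one crawl.
    return {"url": url, "limit": min(limit, max_budget)}
```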
Map Site URLs
Discover all URLs on a website before deciding what to scrape.
python3 fc.py map "https://example.com" --limit 100
python3 fc.py map "https://example.com" --search "api"
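The --search and --limit options amount to filtering and capping the discovered URL list. A client-side sketch of that behavior (a hypothetical mirror of what the map command does, not its actual code):

```python
from typing import List, Optional

def filter_mapped_urls(urls: List[str], search: Optional[str] = None,
                       limit: int = 100) -> List[str]:
    # Keep URLs containing the search term (case-insensitive),
    # then cap the result count, as --search/--limit suggest.
    if search:
        urls = [u for u in urls if search.lower() in u.lower()]
    return urls[:limit]
```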
Example Prompts
- "Scrape https://blog.example.com/post and summarize it"
- "Take a screenshot of stripe.com"
- "Extract the name, price, and features from this product page"
- "Crawl the Astro docs so you can help me build a site"
- "Map all the URLs on docs.stripe.com"
Pricing
Free tier includes 500 credits. 1 credit = 1 page/screenshot/search query.
Overview
Firecrawl is an API-driven web scraping and crawling tool that fetches pages as Markdown, captures full-page screenshots, and extracts structured data. It can also search the web and crawl documentation sites, making it ideal for gathering up-to-date content and building data pipelines. Through the bundled fc.py CLI, you can automate content collection, screenshots, and schema-based data extraction.
How This Skill Works
The skill uses the bundled fc.py script to call Firecrawl API endpoints for markdown, screenshot, extract, search, crawl, and map actions. You must set the FIRECRAWL_API_KEY environment variable and install the firecrawl SDK (pip3 install firecrawl). Commands like python3 fc.py markdown <URL> fetch Markdown content (handling JavaScript-rendered pages), while others perform screenshots or data extraction per a provided JSON schema.
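The subcommand-to-endpoint routing described above can be pictured as a small dispatch table. A sketch only; the endpoint paths are assumptions based on Firecrawl's public API, not verified contents of fc.py:

```python
# Hypothetical mapping of fc.py subcommands to Firecrawl endpoints.
ENDPOINTS = {
    "markdown": "/v1/scrape",
    "screenshot": "/v1/scrape",
    "extract": "/v1/extract",
    "search": "/v1/search",
    "crawl": "/v1/crawl",
    "map": "/v1/map",
}

def endpoint_for(command: str) -> str:
    """Resolve a subcommand to its API path, rejecting unknown names."""
    try:
        return ENDPOINTS[command]
    except KeyError:
        raise ValueError(f"unknown command: {command}")
```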
When to Use It
- Need to scrape a URL to retrieve current content or metadata
- Want a full-page screenshot for design reviews or documentation
- Need to extract specific fields (title, price, features) using a JSON schema
- Want to search the web and pull results for references or competitive intel
- Need to crawl documentation sites to learn a framework or library
Quick Start
- Step 1: export FIRECRAWL_API_KEY=fc-your-key-here
- Step 2: pip3 install firecrawl
- Step 3: Run a sample, e.g., python3 fc.py markdown "https://example.com"
Best Practices
- Use the markdown option for JS-rendered pages to ensure complete content
- Define a strict JSON schema (schema.json) for extract to avoid ambiguity
- Set and respect credits/limits; note that each page costs 1 credit
- Use --main-only to skip navigation and footer for cleaner data
- Keep your API key secure; rotate it if you suspect compromise
Example Use Cases
- Fetch a product page and extract title, price, and features using a schema
- Capture a full-page screenshot of a homepage for a design review
- Crawl the documentation of a framework to map topics and key sections
- Map all URLs on a docs site to plan subsequent extractions
- Convert a blog post to Markdown for archival and indexing