web-fetch
npx machina-cli add skill aiskillstore/marketplace/web-fetch --openclaw

Web Content Fetching
Fetch web content using curl | html2markdown with CSS selectors for clean, complete markdown output.
Quick Usage (Known Sites)
Use site-specific selectors for best results:
# Anthropic docs
curl -s "<url>" | html2markdown --include-selector "#content-container"
# MDN Web Docs
curl -s "<url>" | html2markdown --include-selector "article"
# GitHub docs
curl -s "<url>" | html2markdown --include-selector "article" --exclude-selector "nav,.sidebar"
# Generic article pages
curl -s "<url>" | html2markdown --include-selector "article,main,[role=main]" --exclude-selector "nav,header,footer"
Site Patterns
| Site | Include Selector | Exclude Selector |
|---|---|---|
| platform.claude.com | #content-container | - |
| docs.anthropic.com | #content-container | - |
| developer.mozilla.org | article | - |
| github.com (docs) | article | nav,.sidebar |
| Generic | article,main | nav,header,footer,script,style |
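The table above can be wrapped in a small helper that maps a URL's host to the recommended flags. This is a sketch, not part of the skill: selectors_for is a made-up name, and it only prints the flags so the caller can splice them into a curl | html2markdown pipeline (e.g. via eval).

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the Site Patterns flags for a URL's host.
selectors_for() {
  # Strip the scheme, then everything after the first slash, leaving the host.
  host=$(printf '%s\n' "$1" | sed -E 's#^https?://##; s#/.*##')
  case "$host" in
    platform.claude.com|docs.anthropic.com)
      echo '--include-selector "#content-container"' ;;
    developer.mozilla.org)
      echo '--include-selector "article"' ;;
    github.com)
      echo '--include-selector "article" --exclude-selector "nav,.sidebar"' ;;
    *)
      echo '--include-selector "article,main" --exclude-selector "nav,header,footer,script,style"' ;;
  esac
}

selectors_for "https://developer.mozilla.org/en-US/docs/Web/API/fetch"
# prints: --include-selector "article"
```

One possible use: `eval "curl -s \"$url\" | html2markdown $(selectors_for "$url")"` — eval re-parses the printed quotes so multi-part selectors survive word splitting.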
Universal Fallback (Unknown Sites)
For sites without known patterns, use the Bun script which auto-detects content:
bun ~/.claude/skills/web-fetch/fetch.ts "<url>"
Setup (one-time)
cd ~/.claude/skills/web-fetch && bun install
Finding the Right Selector
When a site isn't in the patterns list:
# Check what content containers exist
curl -s "<url>" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10
# Test a selector
curl -s "<url>" | html2markdown --include-selector "<selector>" | head -30
# Check line count
curl -s "<url>" | html2markdown --include-selector "<selector>" | wc -l
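A self-contained dry run of the first probe above, with echo standing in for curl so it can be tried without a network (the HTML snippet is made up for illustration):

```shell
# Probe a stand-in HTML snippet for content containers, exactly as the
# grep above does against a live page.
html='<html><nav>menu</nav><main id="page-content"><article class="post">Body</article></main></html>'
printf '%s\n' "$html" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10
# prints:
# <main id="page-content">
# <article class="post">
```

Both the main element and the article show up, so either would be a candidate include selector to test next.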
Options Reference
--include-selector "CSS" # Only include matching elements
--exclude-selector "CSS" # Remove matching elements
--domain "https://..." # Convert relative links to absolute
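To see what --domain accomplishes, here is a conceptual illustration: sed stands in for html2markdown's internal link rewriting, and example.com and the path are made up.

```shell
# Illustration only: rewrite a root-relative markdown link into an
# absolute one, the way --domain "https://example.com" would.
printf '%s\n' '[guide](/docs/guide)' \
  | sed -E 's#\((/[^)]*)\)#(https://example.com\1)#'
# prints: [guide](https://example.com/docs/guide)
```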
Comparison
| Method | Output (Anthropic docs page) | Code Blocks | Complexity |
|---|---|---|---|
| Full page | 602 lines | Yes | Noisy |
| --include-selector "#content-container" | 385 lines | Yes | Clean |
| Bun script (universal) | 383 lines | Yes | Clean |
Troubleshooting
Wrong content selected: The site may have multiple articles. Inspect the HTML:
curl -s "<url>" | grep -o '<article[^>]*>'
Empty output: The selector doesn't match. Try broader selectors like main or body.
Missing code blocks: Check if the site uses non-standard code formatting.
Client-rendered content: If HTML only has "Loading..." placeholders, the content is JS-rendered. Neither curl nor the Bun script can extract it; use browser-based tools.
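A rough way to spot the client-rendered case before debugging selectors, sketched with an inline snippet in place of curl output (the 200-character threshold is an arbitrary guess, not a rule from this skill):

```shell
# Heuristic sketch: strip <script> bodies and all tags, then check how
# much visible text remains. A near-empty result suggests the page is
# built by JavaScript rather than server-rendered.
html='<html><body><div id="root">Loading...</div><script src="app.js"></script></body></html>'
visible=$(printf '%s' "$html" | sed -E 's#<script[^>]*>[^<]*</script>##g; s#<[^>]*>##g')
if [ "${#visible}" -lt 200 ]; then
  echo "likely JS-rendered: only ${#visible} chars of visible text"
fi
```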
Source
https://github.com/aiskillstore/marketplace/blob/main/skills/0xbigboss/web-fetch/SKILL.md

Overview
Web-fetch retrieves web content via curl and converts HTML to clean markdown using html2markdown. It uses include and exclude selectors to extract the main content, making documentation, articles, and reference pages easy to repurpose.
How This Skill Works
The tool fetches a URL, pipes the HTML through html2markdown, and uses CSS selectors to include only the desired elements while excluding navigation and sidebars. For unknown sites, a Bun script can auto-detect the main content; a one-time setup step installs its dependencies.
When to Use It
- You need clean markdown from documentation pages (e.g., Anthropic docs) using a precise content container.
- You want MDN Web Docs, GitHub docs, or other reference pages converted to markdown with minimal noise.
- You are processing generic article pages using article or main selectors to capture the core content.
- You work with sites that have known patterns and prefer recommended include/exclude selectors for reliable extraction.
- You need a universal fallback that automatically detects content on unknown sites via the Bun script.
Quick Start
- Step 1: Pick a URL to fetch.
- Step 2: Run curl -s '<url>' | html2markdown --include-selector '<selector>' [--exclude-selector '<selector>']
- Step 3: Review output and refine the selector as needed.
Best Practices
- Identify a precise include selector that targets the main content (e.g., article, main, or #content-container).
- Use exclude-selector to remove navigation, sidebars, headers, and footers from the output.
- Test selectors locally with curl and html2markdown before saving or publishing.
- Use the --domain option to convert relative links to absolute URLs for portable content.
- Be mindful of JS-rendered content; curl/html2markdown works best with server-rendered HTML.
Example Use Cases
- Anthropic docs: curl -s '<url>' | html2markdown --include-selector '#content-container'
- MDN Web Docs: curl -s '<url>' | html2markdown --include-selector 'article'
- GitHub docs: curl -s '<url>' | html2markdown --include-selector 'article' --exclude-selector 'nav,.sidebar'
- Generic articles: curl -s '<url>' | html2markdown --include-selector 'article,main,[role=main]' --exclude-selector 'nav,header,footer'
- Unknown sites: bun ~/.claude/skills/web-fetch/fetch.ts '<url>'