web-fetch
npx machina-cli add skill aiskillstore/marketplace/web-fetch --openclaw

Web Content Fetching
Fetch web content using curl | html2markdown with CSS selectors for clean, complete markdown output.
Quick Usage (Known Sites)
Use site-specific selectors for best results:
# Anthropic docs
curl -s "<url>" | html2markdown --include-selector "#content-container"
# MDN Web Docs
curl -s "<url>" | html2markdown --include-selector "article"
# GitHub docs
curl -s "<url>" | html2markdown --include-selector "article" --exclude-selector "nav,.sidebar"
# Generic article pages
curl -s "<url>" | html2markdown --include-selector "article,main,[role=main]" --exclude-selector "nav,header,footer"
Site Patterns
| Site | Include Selector | Exclude Selector |
|---|---|---|
| platform.claude.com | #content-container | - |
| docs.anthropic.com | #content-container | - |
| developer.mozilla.org | article | - |
| github.com (docs) | article | nav,.sidebar |
| Generic | article,main | nav,header,footer,script,style |
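The table above can be wrapped in a small helper that maps a URL's host to the recommended flags. This is a sketch, not part of the skill: selectors_for is a made-up name, and it only prints the flags so the caller can splice them into a curl | html2markdown pipeline (e.g. via eval).

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the Site Patterns flags for a URL's host.
selectors_for() {
  # Strip the scheme, then everything after the first slash, leaving the host.
  host=$(printf '%s\n' "$1" | sed -E 's#^https?://##; s#/.*##')
  case "$host" in
    platform.claude.com|docs.anthropic.com)
      echo '--include-selector "#content-container"' ;;
    developer.mozilla.org)
      echo '--include-selector "article"' ;;
    github.com)
      echo '--include-selector "article" --exclude-selector "nav,.sidebar"' ;;
    *)
      echo '--include-selector "article,main" --exclude-selector "nav,header,footer,script,style"' ;;
  esac
}

selectors_for "https://developer.mozilla.org/en-US/docs/Web/API/fetch"
# prints: --include-selector "article"
```

One possible use: `eval "curl -s \"$url\" | html2markdown $(selectors_for "$url")"` — eval re-parses the printed quotes so multi-part selectors survive word splitting.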
Universal Fallback (Unknown Sites)
For sites without known patterns, use the Bun script which auto-detects content:
bun ~/.claude/skills/web-fetch/fetch.ts "<url>"
Setup (one-time)
cd ~/.claude/skills/web-fetch && bun install
Finding the Right Selector
When a site isn't in the patterns list:
# Check what content containers exist
curl -s "<url>" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10
# Test a selector
curl -s "<url>" | html2markdown --include-selector "<selector>" | head -30
# Check line count
curl -s "<url>" | html2markdown --include-selector "<selector>" | wc -l
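A self-contained dry run of the first probe above, with echo standing in for curl so it can be tried without a network (the HTML snippet is made up for illustration):

```shell
# Probe a stand-in HTML snippet for content containers, exactly as the
# grep above does against a live page.
html='<html><nav>menu</nav><main id="page-content"><article class="post">Body</article></main></html>'
printf '%s\n' "$html" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10
# prints:
# <main id="page-content">
# <article class="post">
```

Both the main element and the article show up, so either would be a candidate include selector to test next.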
Options Reference
--include-selector "CSS" # Only include matching elements
--exclude-selector "CSS" # Remove matching elements
--domain "https://..." # Convert relative links to absolute
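To see what --domain accomplishes, here is a conceptual illustration: sed stands in for html2markdown's internal link rewriting, and example.com and the path are made up.

```shell
# Illustration only: rewrite a root-relative markdown link into an
# absolute one, the way --domain "https://example.com" would.
printf '%s\n' '[guide](/docs/guide)' \
  | sed -E 's#\((/[^)]*)\)#(https://example.com\1)#'
# prints: [guide](https://example.com/docs/guide)
```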
Comparison
| Method | Output (Anthropic docs page) | Code Blocks | Complexity |
|---|---|---|---|
| Full page | 602 lines | Yes | Noisy |
| --include-selector "#content-container" | 385 lines | Yes | Clean |
| Bun script (universal) | 383 lines | Yes | Clean |
Troubleshooting
Wrong content selected: The site may have multiple articles. Inspect the HTML:
curl -s "<url>" | grep -o '<article[^>]*>'
Empty output: The selector doesn't match. Try broader selectors like main or body.
Missing code blocks: Check if the site uses non-standard code formatting.
Client-rendered content: If HTML only has "Loading..." placeholders, the content is JS-rendered. Neither curl nor the Bun script can extract it; use browser-based tools.
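A rough way to spot the client-rendered case before debugging selectors, sketched with an inline snippet in place of curl output (the 200-character threshold is an arbitrary guess, not a rule from this skill):

```shell
# Heuristic sketch: strip <script> bodies and all tags, then check how
# much visible text remains. A near-empty result suggests the page is
# built by JavaScript rather than server-rendered.
html='<html><body><div id="root">Loading...</div><script src="app.js"></script></body></html>'
visible=$(printf '%s' "$html" | sed -E 's#<script[^>]*>[^<]*</script>##g; s#<[^>]*>##g')
if [ "${#visible}" -lt 200 ]; then
  echo "likely JS-rendered: only ${#visible} chars of visible text"
fi
```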
Source
https://github.com/aiskillstore/marketplace/blob/main/skills/0xbigboss/web-fetch/SKILL.md

Overview
Web-fetch retrieves web content via curl and converts HTML to clean markdown using html2markdown. It uses include and exclude selectors to extract the main content, making documentation, articles, and reference pages easy to repurpose.
How This Skill Works
The tool fetches a URL, pipes the HTML through html2markdown, and uses CSS selectors to include only the desired elements while excluding navigation and sidebars. For unknown sites, a Bun script can auto-detect the main content; a one-time setup step installs its dependencies.
When to Use It
- You need clean markdown from documentation pages (e.g., Anthropic docs) using a precise content container.
- You want MDN Web Docs, GitHub docs, or other reference pages converted to markdown with minimal noise.
- You are processing generic article pages using article or main selectors to capture the core content.
- You work with sites that have known patterns and prefer recommended include/exclude selectors for reliable extraction.
- You need a universal fallback that automatically detects content on unknown sites via the Bun script.
Quick Start
- Step 1: Pick a URL to fetch.
- Step 2: Run curl -s '<url>' | html2markdown --include-selector '<selector>' [--exclude-selector '<selector>']
- Step 3: Review output and refine the selector as needed.
Best Practices
- Identify a precise include selector that targets the main content (e.g., article, main, or #content-container).
- Use exclude-selector to remove navigation, sidebars, headers, and footers from the output.
- Test selectors locally with curl and html2markdown before saving or publishing.
- Use the --domain option to convert relative links to absolute URLs for portable content.
- Be mindful of JS-rendered content; curl/html2markdown works best with server-rendered HTML.
Example Use Cases
- Anthropic docs: curl -s '<url>' | html2markdown --include-selector '#content-container'
- MDN Web Docs: curl -s '<url>' | html2markdown --include-selector 'article'
- GitHub docs: curl -s '<url>' | html2markdown --include-selector 'article' --exclude-selector 'nav,.sidebar'
- Generic articles: curl -s '<url>' | html2markdown --include-selector 'article,main,[role=main]' --exclude-selector 'nav,header,footer'
- Unknown sites: bun ~/.claude/skills/web-fetch/fetch.ts '<url>'