web-fetch

npx machina-cli add skill aiskillstore/marketplace/web-fetch --openclaw

Web Content Fetching

Fetch web content using curl | html2markdown with CSS selectors for clean, complete markdown output.

Quick Usage (Known Sites)

Use site-specific selectors for best results:

# Anthropic docs
curl -s "<url>" | html2markdown --include-selector "#content-container"

# MDN Web Docs
curl -s "<url>" | html2markdown --include-selector "article"

# GitHub docs
curl -s "<url>" | html2markdown --include-selector "article" --exclude-selector "nav,.sidebar"

# Generic article pages
curl -s "<url>" | html2markdown --include-selector "article,main,[role=main]" --exclude-selector "nav,header,footer"

Site Patterns

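| Site | Include selector | Exclude selector |
|------|------------------|------------------|
| platform.claude.com | #content-container | (none) |
| docs.anthropic.com | #content-container | (none) |
| developer.mozilla.org | article | (none) |
| github.com (docs) | article | nav,.sidebar |
| Generic | article,main | nav,header,footer,script,style |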
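The table can be encoded as a small set of POSIX sh functions so the selectors are picked from the URL's host automatically. This is a sketch, not part of the skill: `selectors_for_url` and `web_fetch` are hypothetical names, and mapping "github.com (docs)" to the host `docs.github.com` is an assumption.

```shell
# Hypothetical helper: print the include selector on line 1 and the
# exclude selector on line 2 (empty when the table lists none).
selectors_for_url() {
  host=${1#*://}     # drop the scheme, e.g. "https://"
  host=${host%%/*}   # drop the path
  host=${host%%:*}   # drop a port, if any
  case $host in
    platform.claude.com|docs.anthropic.com)
      printf '%s\n%s\n' '#content-container' '' ;;
    developer.mozilla.org)
      printf '%s\n%s\n' 'article' '' ;;
    docs.github.com)   # assumed host for "github.com (docs)"
      printf '%s\n%s\n' 'article' 'nav,.sidebar' ;;
    *)                 # generic pattern from the table
      printf '%s\n%s\n' 'article,main' 'nav,header,footer,script,style' ;;
  esac
}

# Hypothetical wrapper: fetch a URL and convert it with the chosen selectors,
# passing --exclude-selector only when the table gives one.
web_fetch() {
  url=$1
  include=$(selectors_for_url "$url" | sed -n 1p)
  exclude=$(selectors_for_url "$url" | sed -n 2p)
  if [ -n "$exclude" ]; then
    curl -s "$url" | html2markdown --include-selector "$include" --exclude-selector "$exclude"
  else
    curl -s "$url" | html2markdown --include-selector "$include"
  fi
}
```

Usage: web_fetch "https://developer.mozilla.org/en-US/docs/Web/CSS". Unknown hosts fall through to the generic pattern, so the Bun script remains the better choice when that pattern misses.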

Universal Fallback (Unknown Sites)

For sites without a known pattern, use the Bun script, which auto-detects the content:

bun ~/.claude/skills/web-fetch/fetch.ts "<url>"

Setup (one-time)

cd ~/.claude/skills/web-fetch && bun install

Finding the Right Selector

When a site isn't in the patterns list:

# Check what content containers exist
curl -s "<url>" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10

# Test a selector
curl -s "<url>" | html2markdown --include-selector "<selector>" | head -30

# Check line count
curl -s "<url>" | html2markdown --include-selector "<selector>" | wc -l
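The container check can be wrapped so it prints ready-to-use selectors instead of raw tags. A sketch assuming `grep -E` and `sed -E` (available in GNU and BSD tools); `list_containers` is a hypothetical name, and it only sees tags that start and end on the same line.

```shell
# Hypothetical helper: read HTML on stdin and print candidate selectors
# (article, main, or #id for ids containing "content"), first occurrence only.
list_containers() {
  grep -o -E '<(article|main)( [^>]*)?>|id="[^"]*content[^"]*"' |
    sed -E \
      -e 's/^<(article|main).*/\1/' \
      -e 's/^id="([^"]*)"$/#\1/' |
    awk '!seen[$0]++'   # drop duplicates, keep original order
}
```

Usage: curl -s "<url>" | list_containers, then try each printed selector with the html2markdown test and line-count commands.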

Options Reference

--include-selector "CSS"  # Only include matching elements
--exclude-selector "CSS"  # Remove matching elements
--domain "https://..."    # Convert relative links to absolute
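--domain takes the site's origin (scheme and host). One way to derive it from the page URL, assuming an absolute http(s) URL; url_origin is a hypothetical helper name.

```shell
# Hypothetical helper: reduce an absolute URL to scheme://host[:port]
# for use with html2markdown --domain.
url_origin() {
  scheme=${1%%://*}     # e.g. "https"
  rest=${1#*://}        # host plus path, query, fragment
  host=${rest%%/*}      # cut the path
  host=${host%%[?#]*}   # cut a query or fragment that follows the host directly
  printf '%s://%s\n' "$scheme" "$host"
}
```

Usage: url="https://developer.mozilla.org/en-US/docs/Web/CSS"; curl -s "$url" | html2markdown --include-selector "article" --domain "$(url_origin "$url")"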

Comparison

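| Method | Lines (Anthropic docs page) | Code blocks kept | Output quality |
|--------|-----------------------------|------------------|----------------|
| Full page | 602 | Yes | Noisy |
| --include-selector "#content-container" | 385 | Yes | Clean |
| Bun script (universal) | 383 | Yes | Clean |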
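Counts like these can be gathered with the wc -l check from "Finding the Right Selector". A sketch that runs that comparison for several selectors on one fetched page; compare_selectors is a hypothetical name, and the numbers will vary with the page and the html2markdown version.

```shell
# Hypothetical helper: read HTML on stdin, then print the markdown line count
# for the full page and for each selector given as an argument.
compare_selectors() {
  html=$(cat)
  # tr strips the padding some wc implementations add before the count
  printf 'full page\t%s\n' "$(printf '%s\n' "$html" | html2markdown | wc -l | tr -d ' ')"
  for sel in "$@"; do
    n=$(printf '%s\n' "$html" | html2markdown --include-selector "$sel" | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$sel" "$n"
  done
}
```

Usage: curl -s "<url>" | compare_selectors "#content-container" article main. Fetching once and reusing the HTML keeps every count based on the same copy of the page.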

Troubleshooting

Wrong content selected: The site may have multiple articles. Inspect the HTML:

curl -s "<url>" | grep -o '<article[^>]*>'

Empty output: The selector doesn't match. Try broader selectors like main or body.
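To automate the "try broader selectors" advice, here is a sketch that tries selectors in order on a page read from stdin and keeps the first one that produces non-blank output. convert_with_fallback is a hypothetical name; the selector order is left to the caller.

```shell
# Hypothetical helper: try each selector on HTML from stdin and print the
# first non-blank conversion; exit 1 if every selector comes back empty.
convert_with_fallback() {
  html=$(cat)
  for sel in "$@"; do
    out=$(printf '%s\n' "$html" | html2markdown --include-selector "$sel")
    # Treat whitespace-only output as "no match"
    if [ -n "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  return 1
}
```

Usage: curl -s "<url>" | convert_with_fallback "#content-container" main body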

Missing code blocks: Check if the site uses non-standard code formatting.

Client-rendered content: If HTML only has "Loading..." placeholders, the content is JS-rendered. Neither curl nor the Bun script can extract it; use browser-based tools.
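A rough way to spot this case before switching to a browser: strip tags and "Loading..." placeholders from the raw HTML and see how much text remains. This is a heuristic sketch; looks_client_rendered and the 200-character threshold are assumptions, and text inside inline script or style blocks still counts, so script-heavy pages can slip past it.

```shell
# Hypothetical heuristic: succeed (exit 0) when HTML on stdin has almost no
# visible text once tags and "Loading..." placeholders are removed.
# Tags split across several lines are not stripped by this per-line sed.
looks_client_rendered() {
  min_chars=${1:-200}   # assumed threshold; tune per site
  text=$(sed -e 's/<[^>]*>//g' -e 's/Loading\.\.\.//g' | tr -d '[:space:]')
  [ "${#text}" -lt "$min_chars" ]
}
```

Usage: curl -s "<url>" | looks_client_rendered && echo "likely JS-rendered; use a browser-based tool"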

Source

https://github.com/aiskillstore/marketplace/blob/main/skills/0xbigboss/web-fetch/SKILL.md

Overview

Web-fetch retrieves web content via curl and converts HTML to clean markdown using html2markdown. It uses include and exclude selectors to extract the main content, making documentation, articles, and reference pages easy to repurpose.

How This Skill Works

The tool fetches a URL, pipes the HTML through html2markdown, and uses CSS selectors to include only the desired elements while excluding navigation and sidebars. For unknown sites, a Bun script auto-detects the content; it needs a one-time bun install in the skill directory before first use.

When to Use It

  • You need clean markdown from documentation pages (e.g., Anthropic docs) using a precise content container.
  • You want MDN Web Docs, GitHub docs, or other reference pages converted to markdown with minimal noise.
  • You are processing generic article pages using article or main selectors to capture the core content.
  • You work with sites that have known patterns and prefer recommended include/exclude selectors for reliable extraction.
  • You need a universal fallback that automatically detects content on unknown sites via the Bun script.

Quick Start

  1. Pick a URL to fetch.
  2. Run curl -s '<url>' | html2markdown --include-selector '<selector>' [--exclude-selector '<selector>']
  3. Review the output and refine the selector as needed.

Best Practices

  • Identify a precise include selector that targets the main content (e.g., article, main, or #content-container).
  • Use exclude-selector to remove navigation, sidebars, headers, and footers from the output.
  • Test selectors locally with curl and html2markdown before saving or publishing.
  • Use the --domain option to convert relative links to absolute URLs so the output stays portable.
  • Be mindful of JS-rendered content: curl and html2markdown only see server-rendered HTML, so client-rendered pages need a browser-based tool.

Example Use Cases

  • Anthropic docs: curl -s '<url>' | html2markdown --include-selector '#content-container'
  • MDN Web Docs: curl -s '<url>' | html2markdown --include-selector 'article'
  • GitHub docs: curl -s '<url>' | html2markdown --include-selector 'article' --exclude-selector 'nav,.sidebar'
  • Generic articles: curl -s '<url>' | html2markdown --include-selector 'article,main,[role=main]' --exclude-selector 'nav,header,footer'
  • Unknown sites: bun ~/.claude/skills/web-fetch/fetch.ts '<url>'
