fetching-blocked-urls

npx machina-cli add skill oaustegard/claude-skills/fetching-blocked-urls --openclaw

Fetching Blocked URLs

Retrieve readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown.

Activation Triggers

Invoke this skill immediately when web_fetch returns:

  • 403 Forbidden or access denied
  • Paywall or login wall indicators
  • Empty, garbled, or truncated content
  • JavaScript-heavy SPA failures
  • Timeout errors

Core Command

curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"

The service returns markdown with page title, body text, and preserved links.
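
The command above can be wrapped in two small helpers so the target URL is passed as an argument instead of edited into the string. This is a minimal sketch: jina_url and jina_fetch are hypothetical names, not part of the skill, and the target URL is appended verbatim exactly as in the Core Command.

```shell
# jina_url TARGET: print the reader URL for TARGET (hypothetical helper).
# The target is appended as-is, mirroring the Core Command above.
jina_url() {
  printf 'https://r.jina.ai/%s\n' "$1"
}

# jina_fetch TARGET: fetch TARGET through the reader and print the
# returned markdown on stdout (hypothetical helper, same curl flags).
jina_fetch() {
  curl -s --max-time 30 "$(jina_url "$1")"
}
```

Usage: jina_fetch "https://example.com/article" prints the markdown for that page, subject to the same intermittent failures described under Retry Pattern.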

Retry Pattern

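Jina's backend fails intermittently on roughly 10% of requests. If failures are independent, three attempts all fail only about 0.1% of the time (0.1 × 0.1 × 0.1), so a three-attempt retry loop reaches 99%+ success: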

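for attempt in 1 2 3; do
  # Capture the response; an empty body (e.g. a timeout) counts as a failure.
  result=$(curl -s --max-time 30 "https://r.jina.ai/TARGET_URL" 2>&1)
  if [ -n "$result" ] && ! echo "$result" | grep -q "upstream connect error"; then
    echo "$result"; break
  fi
  # Pause before the next attempt; report failure after the last one.
  if [ "$attempt" -lt 3 ]; then sleep 1; else echo "All 3 attempts failed" >&2; fi
done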
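
For repeated use, the retry pattern can be factored into a function that takes the attempt count and the fetch command as arguments. This is a sketch under stated assumptions, not the skill's own code: retry_fetch and the RETRY_DELAY variable are hypothetical names, an empty response is treated as a failure, and the function's variables are global because plain POSIX sh has no local scope.

```shell
# retry_fetch MAX_ATTEMPTS CMD [ARGS...]   (hypothetical helper)
# Runs CMD until its output is non-empty and does not contain the
# "upstream connect error" marker, or until MAX_ATTEMPTS is reached.
# Prints the first good response and returns 0; returns 1 otherwise.
retry_fetch() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    # Ignore the command's exit status; the response text decides success.
    result=$("$@" 2>&1) || true
    if [ -n "$result" ] && ! printf '%s\n' "$result" | grep -q "upstream connect error"; then
      printf '%s\n' "$result"
      return 0
    fi
    # Wait between attempts (RETRY_DELAY is an assumed knob, default 1s).
    if [ "$attempt" -lt "$max" ]; then
      sleep "${RETRY_DELAY:-1}"
    fi
    attempt=$((attempt + 1))
  done
  echo "retry_fetch: all $max attempts failed" >&2
  return 1
}
```

Usage: retry_fetch 3 curl -s --max-time 30 "https://r.jina.ai/TARGET_URL". Passing the command as arguments keeps the retry logic separate from the fetch itself, so it can be exercised with a stub command without touching the network.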

Workflow Integration

  1. Primary: Use web_fetch (native tool)
  2. Fallback: This skill with retry when web_fetch fails
  3. Escalate: Request user assistance only after retry exhaustion

Attempt this fallback before asking users to copy-paste content manually.

Output Format

Jina returns structured markdown:

  • Title: page title
  • URL Source: original URL
  • Markdown Content: extracted body text, links preserved
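
The three fields can be pulled apart with awk. This is a minimal sketch that assumes the response uses the plain "Title:", "URL Source:" and "Markdown Content:" line prefixes listed above; jina_field and jina_body are hypothetical helper names.

```shell
# jina_field NAME: read reader output on stdin and print the value of the
# first line that starts with "NAME: " (e.g. "Title" or "URL Source").
jina_field() {
  awk -v key="$1: " 'index($0, key) == 1 { print substr($0, length(key) + 1); exit }'
}

# jina_body: read reader output on stdin and print every line that
# follows the "Markdown Content:" line.
jina_body() {
  awk 'found { print } /^Markdown Content:/ { found = 1 }'
}
```

Usage: save a response to a file, then run jina_field "Title" < response.md or jina_body < response.md. An empty Title or URL Source is a cheap signal that the fetch did not return a usable page.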

Limitations

  • Long pages may truncate
  • Sites blocking all scrapers remain inaccessible
  • Login-required content limited to public portions
  • Real-time dynamic content may not render

Domain Access

r.jina.ai is whitelisted in the Claude container network configuration.

Source

SKILL.md on GitHub: https://github.com/oaustegard/claude-skills/blob/main/fetching-blocked-urls/SKILL.md

Overview

Fetching Blocked URLs retrieves readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown that includes the page title, body text, and preserved links. The skill is designed as a fallback, with a retry loop to improve reliability.

How This Skill Works

When web_fetch fails or content is blocked, the skill uses curl to request the Jina AI reader service at r.jina.ai/TARGET_URL, which renders pages and returns Markdown. It includes an automatic retry pattern to recover from intermittent upstream errors. The output includes Title, URL Source, and Markdown Content.

Quick Start

  1. Detect a blocked or failed URL (403, paywall, empty content, timeout, or JavaScript rendering issue).
  2. Fetch via Jina: curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"
  3. If the response contains "upstream connect error", run the retry loop, then review the Markdown output.

Best Practices

  • Use as the primary fallback after web_fetch fails
  • Implement the provided 3-attempt retry loop to reach 99%+ success
  • Validate the returned content by checking Title, URL Source, and preserved links
  • Expect long pages to be truncated
  • Ensure your environment can reach r.jina.ai (see Domain Access)

Example Use Cases

  • Access a paywalled article by fetching via Jina to extract title and body
  • Read JavaScript-heavy GitHub Pages sites that a plain curl request cannot render
  • Recover from timeout by retrying and obtaining a stable Markdown output
  • Extract content from a single-page application with dynamic loading
  • Get public portions of a login-protected page via the fallback mechanism
