When should I use this skill?

Use this skill as a fallback, together with the provided retry logic, when web_fetch fails because of a 403 response, a paywall, empty content, JavaScript rendering issues, or a timeout.

What are the known limitations?

Long pages may truncate; sites blocking all scrapers remain inaccessible; login-required content is limited to public portions; real-time dynamic content may not render.

fetching-blocked-urls

npx machina-cli add skill oaustegard/claude-skills/fetching-blocked-urls --openclaw

Fetching Blocked URLs

Retrieve readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown.

Activation Triggers

Invoke this skill immediately when web_fetch returns:

403 Forbidden or access denied
Paywall or login wall indicators
Empty, garbled, or truncated content
JavaScript-heavy SPA failures
Timeout errors

Core Command

curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"

The service returns markdown with page title, body text, and preserved links.
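
The core command can be wrapped so the prefixing step is explicit and the target URL is always quoted. This is a minimal sketch assuming a POSIX shell with curl available; the function names jina_url and jina_fetch are illustrative and not part of the skill.

```shell
# Hypothetical helpers; the names are illustrative, not part of the skill.

# Build the reader URL by prefixing the target with the Jina endpoint.
jina_url() {
  printf 'https://r.jina.ai/%s\n' "$1"
}

# Fetch the target through Jina with the same flags as the core command.
jina_fetch() {
  curl -s --max-time 30 "$(jina_url "$1")"
}
```

Call it as jina_fetch "https://example.com/article?id=1". Quoting the argument matters because characters such as ? and & in a URL are otherwise interpreted by the shell.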

Retry Pattern

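Jina's backend fails intermittently on roughly 10% of requests. Retrying up to three times brings the success rate above 99%: if failures are independent, all three attempts fail only about 0.1% of the time (0.1 × 0.1 × 0.1 = 0.001).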

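for attempt in 1 2 3; do
  # A curl failure (timeout, DNS) counts as a failed attempt, as does the known error string
  if result=$(curl -s --max-time 30 "https://r.jina.ai/TARGET_URL" 2>&1) &&
     ! printf '%s\n' "$result" | grep -q "upstream connect error"; then
    printf '%s\n' "$result"
    break
  fi
  # Pause before the next attempt, but not after the last one
  if [ "$attempt" -lt 3 ]; then sleep 1; fi
done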
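
For repeated use, the loop can be turned into a function that reports failure through its exit status. This is a sketch under stated assumptions: the name fetch_with_retry and its optional attempt-count and delay arguments are hypothetical, and the only error string it checks for is the one named above.

```shell
# Hypothetical wrapper around the retry loop; fetch_with_retry and its
# arguments are illustrative names, not part of the skill.
# Usage: fetch_with_retry TARGET_URL [MAX_ATTEMPTS] [DELAY_SECONDS]
fetch_with_retry() {
  url=$1
  max_attempts=${2:-3}
  delay=${3:-1}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Succeed only if curl exits cleanly and the body lacks the known error string.
    if result=$(curl -s --max-time 30 "https://r.jina.ai/$url" 2>&1) &&
       ! printf '%s\n' "$result" | grep -q "upstream connect error"; then
      printf '%s\n' "$result"
      return 0
    fi
    # Wait before the next attempt, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "fetch failed after $max_attempts attempts: $url" >&2
  return 1
}
```

A caller can then branch on the result, for example fetch_with_retry "https://example.com/page" > page.md || echo "ask the user for the content". The non-zero exit status marks the point at which escalation to the user is appropriate.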

Workflow Integration

Primary: Use web_fetch (native tool)
Fallback: This skill with retry when web_fetch fails
Escalate: Request user assistance only after retry exhaustion

Attempt this fallback before asking users to copy-paste content manually.

Output Format

Jina returns structured markdown:

Title: page title
URL Source: original URL
Markdown Content: extracted body text, links preserved
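
The three fields can be pulled apart with standard text tools. The sketch below assumes each label starts its own line, as in the listing above, and that the body follows the Markdown Content label; the helper names jina_field and jina_body are hypothetical.

```shell
# Hypothetical helpers for splitting a Jina response (read from stdin).
# They assume each field label starts its own line.

# Print the value of a single-line field such as "Title" or "URL Source".
jina_field() {
  sed -n "s/^$1: //p" | head -n 1
}

# Print everything after the "Markdown Content:" label, including any
# text that shares a line with the label.
jina_body() {
  awk 'found { print; next }
       /^Markdown Content:/ {
         found = 1
         sub(/^Markdown Content:[ ]*/, "")
         if (length($0)) print
       }'
}
```

For a response saved to page.md, jina_field "Title" < page.md prints the page title and jina_body < page.md prints only the extracted markdown.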

Limitations

Long pages may truncate
Sites blocking all scrapers remain inaccessible
Login-required content limited to public portions
Real-time dynamic content may not render

Domain Access

r.jina.ai is whitelisted in the Claude container network configuration.

Source

https://github.com/oaustegard/claude-skills/blob/main/fetching-blocked-urls/SKILL.md

Overview

Fetching Blocked URLs retrieves readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown that includes the page title, body text, and preserved links. It is designed as a fallback with automatic retry to improve reliability.

How This Skill Works

When web_fetch fails or content is blocked, the skill uses curl to request the Jina AI reader service at r.jina.ai/TARGET_URL, which renders pages and returns Markdown. It includes an automatic retry pattern to recover from intermittent upstream errors. The output includes Title, URL Source, and Markdown Content.

When to Use It

403 Forbidden or access denied
Paywall or login wall indicators
Empty, garbled, or truncated content
JavaScript-heavy SPA failures
Timeout errors

Quick Start

Step 1: Detect a blocked or failed URL (403, paywall, empty content, timeout, or JS render issue).
Step 2: Fetch via Jina: curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"
Step 3: If the request fails, run the retry loop (up to three attempts), then review the Markdown output.

Best Practices

Use as the primary fallback after web_fetch fails
Implement the provided 3-attempt retry loop to reach 99%+ success
Validate the returned content by checking Title, URL Source, and preserved links
Be aware of long pages that may truncate
Ensure your environment can access r.jina.ai (Domain Access)
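
The validation step can be sketched as a small check on the response. The function name jina_output_ok and the exact criteria are assumptions: it rejects the transient error string from the retry section and requires non-empty Title and URL Source fields plus at least one line of body text.

```shell
# Hypothetical validator; the name and the specific checks are assumptions.
# Reads a Jina response on stdin and succeeds only if it looks complete.
jina_output_ok() {
  response=$(cat)
  # Reject the known transient error from the retry section.
  if printf '%s\n' "$response" | grep -q "upstream connect error"; then
    return 1
  fi
  # Require non-empty Title and URL Source fields.
  printf '%s\n' "$response" | grep -q '^Title: .' || return 1
  printf '%s\n' "$response" | grep -q '^URL Source: .' || return 1
  # Require at least one non-blank line after the Markdown Content label.
  printf '%s\n' "$response" |
    awk 'found && NF { ok = 1 } /^Markdown Content:/ { found = 1 } END { exit !ok }'
}
```

A check such as jina_output_ok < page.md || echo "response looks incomplete" catches empty or error responses; it cannot detect a long page that was truncated partway through.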

Example Use Cases

Access a paywalled article by fetching via Jina to extract title and body
Render JavaScript-heavy GitHub Pages sites whose content a plain curl request cannot render
Recover from timeout by retrying and obtaining a stable Markdown output
Extract content from a single-page application with dynamic loading
Get public portions of a login-protected page via the fallback mechanism
