url-content-loading
npx machina-cli add skill narumiruna/kabigon/url-content-loading --openclaw
How to load URL content
# Explicit loader selection
uvx kabigon --loader playwright https://example.com
uvx kabigon --loader httpx https://example.com
uvx kabigon --loader firecrawl https://example.com
uvx kabigon --loader youtube https://www.youtube.com/watch?v=dQw4w9WgXcQ
uvx kabigon --loader youtube-ytdlp https://www.youtube.com/watch?v=dQw4w9WgXcQ
uvx kabigon --loader ytdlp https://www.youtube.com/watch?v=dQw4w9WgXcQ
uvx kabigon --loader twitter https://x.com/howie_serious/status/1917768568135115147
uvx kabigon --loader truthsocial https://truthsocial.com/@realDonaldTrump/posts/115830428767897167
uvx kabigon --loader reddit https://reddit.com/r/confession/comments/1q1mzej/im_a_developer_for_a_major_food_delivery_app_the/
uvx kabigon --loader ptt https://www.ptt.cc/bbs/Gossiping/M.1746078381.A.FFC.html
uvx kabigon --loader reel https://www.instagram.com/reel/CuA0XYZ1234/
uvx kabigon --loader github https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md
uvx kabigon --loader pdf https://example.com/document.pdf
# Compose loaders in order
# Example: try YouTube first, then fall back to `youtube-ytdlp` if captions are missing.
# `youtube-ytdlp` can download audio and transcribe it via Whisper.
uvx kabigon --loader youtube,youtube-ytdlp https://www.youtube.com/watch?v=dQw4w9WgXcQ
# If you are not sure which loader to use, rely on the default pipeline.
uvx kabigon https://www.youtube.com/watch?v=dQw4w9WgXcQ
# List supported loaders
uvx kabigon --list
Troubleshooting
- Install `uv` if `uvx` is not found: https://docs.astral.sh/uv/getting-started/installation/
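A small pre-flight check can surface the missing-`uvx` case before any loader runs. This is only a sketch: the helper names `have_cmd` and `check_uvx` are hypothetical and not part of kabigon or uv.

```shell
# Hypothetical helpers (not part of kabigon or uv).

# Return 0 if the named command is on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print an install hint up front instead of failing later with "command not found".
check_uvx() {
    if have_cmd uvx; then
        echo "uvx found"
    else
        echo "uvx not found; install uv: https://docs.astral.sh/uv/getting-started/installation/"
        return 1
    fi
}
```

Calling `check_uvx` at the top of a script makes the dependency explicit before the first `uvx kabigon` call.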
Source
https://github.com/narumiruna/kabigon/blob/main/skills/url-content-loading/SKILL.md
Overview
url-content-loading loads content from URLs with the kabigon CLI, extracting text or metadata from platforms such as YouTube, Twitter/X, Reddit, GitHub, PTT, and Truth Social. It covers social posts, videos, documents, and code repositories, for use in scraping, data extraction, or content analysis.
How This Skill Works
The tool uses a modular loader system: you pass --loader with a loader name (e.g., playwright, httpx, firecrawl, youtube, youtube-ytdlp, ytdlp, twitter, truthsocial, reddit, ptt, reel, github, pdf) to fetch and parse the target URL. Listing several loaders separated by commas tries them in order, so a later loader acts as a fallback when an earlier one cannot extract content. If you are unsure which loader to use, omit --loader and rely on the default pipeline.
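The composition rule can be sketched as a small command builder. The functions `join_loaders` and `kabigon_command` are hypothetical helpers for illustration. They only print the command line (a dry run) and use nothing beyond the `--loader` option and comma-separated syntax shown in the examples above.

```shell
# Hypothetical helpers: build a kabigon command line without running it.

# Join loader names with commas: "youtube youtube-ytdlp" -> "youtube,youtube-ytdlp".
join_loaders() {
    result=""
    for name in "$@"; do
        if [ -z "$result" ]; then
            result="$name"
        else
            result="$result,$name"
        fi
    done
    printf '%s\n' "$result"
}

# Print the command for a URL followed by an ordered list of loaders.
# With no loaders, --loader is omitted so the default pipeline is used.
kabigon_command() {
    url="$1"
    shift
    if [ "$#" -eq 0 ]; then
        printf 'uvx kabigon %s\n' "$url"
    else
        printf 'uvx kabigon --loader %s %s\n' "$(join_loaders "$@")" "$url"
    fi
}
```

For example, `kabigon_command "https://www.youtube.com/watch?v=dQw4w9WgXcQ" youtube youtube-ytdlp` prints the composed command used earlier on this page, with the loaders in fallback order.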
When to Use It
- To load YouTube video content, including captions or metadata, for analysis.
- To fetch content from Twitter/X posts or threads for sentiment or data extraction.
- To retrieve content from Reddit threads or comments for analysis.
- To extract repository content from GitHub, such as READMEs or docs.
- To load PDFs or other documents for full-text extraction.
Quick Start
- Step 1: uvx kabigon --loader youtube https://www.youtube.com/watch?v=dQw4w9WgXcQ
- Step 2: uvx kabigon --loader youtube,youtube-ytdlp https://www.youtube.com/watch?v=dQw4w9WgXcQ
- Step 3: Review the extracted text or metadata produced by the tool
Best Practices
- Choose the loader that best matches the target platform (e.g., youtube for videos, pdf for documents).
- Use loader composition (e.g., youtube,youtube-ytdlp) to fall back when captions or text are missing.
- Test with representative URLs to confirm the loader returns what you need (full text versus metadata only).
- Respect rate limits and platform terms; cache results when possible to avoid repeated loads.
- Validate and sanitize extracted data before downstream processing.
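The caching advice above might look like the following sketch. It assumes kabigon writes the extracted content to standard output and exits non-zero on failure, which this page does not confirm. The names `KABIGON_CMD`, `CACHE_DIR`, `cache_path`, and `load_cached` are hypothetical. The cache key is a CRC from `cksum`, which is compact but not collision-proof.

```shell
# Hypothetical cache wrapper; assumes the loader prints content to stdout.

# Command used to load a URL; override it to test without network access.
KABIGON_CMD="${KABIGON_CMD:-uvx kabigon}"
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/kabigon-cache}"

# Map a URL to a cache file path using a CRC checksum of the URL.
cache_path() {
    key=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    printf '%s/%s.txt\n' "$CACHE_DIR" "$key"
}

# Print cached output for a URL if present; otherwise run the loader and cache it.
load_cached() {
    url="$1"
    file=$(cache_path "$url")
    if [ ! -s "$file" ]; then
        mkdir -p "$CACHE_DIR"
        # Write to a temp file first so a failed load never leaves a partial entry.
        if $KABIGON_CMD "$url" >"$file.tmp"; then
            mv "$file.tmp" "$file"
        else
            rm -f "$file.tmp"
            return 1
        fi
    fi
    cat "$file"
}
```

Repeated calls to `load_cached` with the same URL read from disk instead of hitting the platform again, which also helps with rate limits.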
Example Use Cases
- Extract YouTube video title and transcript using youtube or youtube-ytdlp loaders.
- Pull the content of a Reddit thread and top comments via the reddit loader.
- Retrieve a GitHub repo README.md and relevant docs for analysis.
- Load a Twitter/X post’s text and metadata with the twitter loader.
- Extract text from a PDF document using the pdf loader.
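Matching a loader to a platform, as in the use cases above, can be sketched as a URL pattern lookup. The function `pick_loader` is hypothetical, and its host-to-loader mapping is inferred only from the example URLs on this page. It is not kabigon's own selection logic, which the default pipeline already handles.

```shell
# Hypothetical mapping inferred from this page's example URLs; not kabigon's own logic.

# Print a loader name for a URL, or an empty line to signal "use the default pipeline".
pick_loader() {
    case "$1" in
        *://youtube.com/*|*://www.youtube.com/*)          echo youtube ;;
        *://x.com/*|*://www.x.com/*)                      echo twitter ;;
        *://truthsocial.com/*)                            echo truthsocial ;;
        *://reddit.com/*|*://www.reddit.com/*)            echo reddit ;;
        *://ptt.cc/*|*://www.ptt.cc/*)                    echo ptt ;;
        *://instagram.com/reel/*|*://www.instagram.com/reel/*) echo reel ;;
        *://github.com/*)                                 echo github ;;
        *.pdf)                                            echo pdf ;;
        *)                                                echo "" ;;
    esac
}
```

A caller can pass the result to `--loader` when it is non-empty and omit the option otherwise.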