summarize
npx machina-cli add skill buildoak/fieldwork-skills/summarize --openclaw
Summarize
Extract clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without hand-coding brittle scraper logic.
Use this skill when you need deterministic extraction for YouTube, podcast feeds, PDFs, scanned images, or local media files.
Terminology used in this file:
- DOM: Document Object Model, the page element structure used by browser-based extractors.
- OCR: Optical character recognition (extracting text from images/scans).
- ANSI codes: Terminal color/control sequences; --plain removes them for machine parsing.
Setup
brew tap steipete/tap
brew install summarize
- Claude Code: copy this skill folder into .claude/skills/summarize/
- Codex CLI: append this SKILL.md content to your project's root AGENTS.md
For the full installation walkthrough (prerequisites, optional dependencies, verification, troubleshooting), see references/installation-guide.md.
Staying Updated
This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.
After installing, tell your agent: "Check UPDATES.md in the summarize skill for any new features or changes."
When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."
Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.
Quick Start
Run one extraction flow end-to-end:
summarize --version
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
summarize --extract "/path/to/document.pdf" --plain
Use --extract --plain as the default pattern for deterministic, non-ANSI output.
Decision Tree: summarize vs Other Tools
Need content from the web?
|
+-- Static web page (article, docs, blog)?
| --> WebFetch (built-in, zero deps, faster)
| --> Jina r.jina.ai (zero install alternative)
| --> summarize ONLY if above tools fail or return garbage
|
+-- JS-heavy SPA / dynamic content?
| --> Crawl4AI crwl (full browser rendering)
| --> summarize will NOT help here (no JS rendering)
|
+-- Anti-bot / paywalled / Cloudflare-protected?
| --> summarize --firecrawl always (requires FIRECRAWL_API_KEY)
| --> browser-based workflow as fallback
|
+-- YouTube video?
| --> summarize --extract (ONLY option for transcript)
| --> Add --youtube web for captions-only (faster)
| --> Add --slides for visual slide extraction
|
+-- Podcast / RSS feed?
| --> summarize --extract (ONLY option)
| --> Supports Apple Podcasts, Spotify, RSS feeds, Podbean, etc.
|
+-- PDF (URL or local file)?
| --> summarize --extract (ONLY CLI option)
| --> Requires: uvx/markitdown (brew install uv)
|
+-- Image (OCR)?
| --> summarize --extract (ONLY CLI option)
| --> Requires: tesseract
|
+-- Audio / video file?
--> summarize --extract (ONLY CLI option)
--> Requires: whisper-cli (local) or OPENAI_API_KEY (cloud)
Rule of thumb: summarize is the default for media extraction (YouTube, podcasts, audio, video, images). For web pages, prefer WebFetch/Jina/Crawl4AI depending on DOM complexity (how hard the page structure is to parse). Use summarize for web only when other tools fail.
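The tree above can be approximated with a small routing function. This is only a sketch: `suggest_tool` is a hypothetical name, and it looks at the source string alone, so it cannot tell a static page from a JS-heavy or anti-bot one. Those cases still need the manual judgment described above.

```shell
# Hypothetical helper: suggest a tool from the shape of the source string.
# Assumption: file extensions and hostnames are a good enough signal.
suggest_tool() {
  case "$1" in
    *youtube.com/watch*|*youtu.be/*)
      echo "summarize --extract" ;;   # YouTube transcript
    *podcasts.apple.com/*|*.xml)
      echo "summarize --extract" ;;   # podcast page or RSS feed
    *.pdf|*.png|*.jpg|*.jpeg|*.mp3|*.mp4)
      echo "summarize --extract" ;;   # PDF, image OCR, audio, video
    http://*|https://*)
      echo "WebFetch" ;;              # plain web page: try the lighter tool first
    *)
      echo "unknown" ;;               # not recognizable from the string alone
  esac
}

# Example:
#   suggest_tool "https://example.com/report.pdf"
```

The media patterns come before the generic `http(s)` pattern so that a PDF or audio URL is not mistaken for an ordinary web page.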
Extraction Mode (Primary)
--extract prints raw extracted content and exits. No LLM involved.
Use this first. You can handle any downstream synthesis in your own workflow.
# Web page extraction (default format; md for URLs in extract mode)
summarize --extract "https://example.com" --plain
# Web page extraction (markdown format, set explicitly)
summarize --extract "https://example.com" --format md --plain
# YouTube transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
# YouTube transcript with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain
# YouTube transcript formatted as markdown (requires LLM -- uses API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain
# YouTube slides + transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --slides --plain
# Podcast (RSS feed)
summarize --extract "https://feeds.example.com/podcast.xml" --plain
# Apple Podcasts episode
summarize --extract "https://podcasts.apple.com/us/podcast/EPISODE_ID" --plain
# PDF from URL
summarize --extract "https://example.com/document.pdf" --plain
# PDF from local file
summarize --extract "/path/to/document.pdf" --plain
# Image OCR
summarize --extract "/path/to/image.png" --plain
# Audio transcription
summarize --extract "/path/to/audio.mp3" --plain
# Video transcription
summarize --extract "/path/to/video.mp4" --plain
# Stdin (pipe content)
pbpaste | summarize --extract - --plain
cat document.pdf | summarize --extract - --plain
Always use --plain when extracting for agent consumption. It suppresses ANSI/OSC rendering.
Extraction defaults:
- URLs default to --format md in extract mode
- Files default to --format text
- PDF requires uvx/markitdown (--preprocess auto, which is default)
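Because the default format differs between URLs and files, a pipeline that mixes both may want to pin the format on every call. A minimal sketch, assuming a wrapper named `extract_as` (hypothetical) and only the `--format md|text` values listed in this document:

```shell
# Hypothetical wrapper: always pass --format so output does not depend on
# whether the source is a URL or a local file.
# Usage: extract_as md|text SOURCE
extract_as() {
  case "$1" in
    md|text) ;;
    *) echo "usage: extract_as md|text SOURCE" >&2; return 2 ;;
  esac
  summarize --extract "$2" --format "$1" --plain
}

# Example:
#   extract_as text "https://example.com" > page.txt
```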
LLM Summarization Mode (Secondary)
Use this mode only when you explicitly want summarize to perform synthesis itself.
# Summarize a URL (requires API key for the chosen model)
summarize "https://example.com" --model anthropic/claude-sonnet-4-5 --length long
# Summarize with a custom prompt
summarize "https://example.com" --prompt "Extract key technical decisions and their rationale"
# Summarize YouTube video
summarize "https://www.youtube.com/watch?v=VIDEO_ID" --length xl
# JSON output with metrics
summarize "https://example.com" --json --model openai/gpt-5-mini
API keys for LLM mode (set in ~/.summarize/config.json or env vars):
- ANTHROPIC_API_KEY -- for anthropic/ models
- OPENAI_API_KEY -- for openai/ models
- GEMINI_API_KEY -- for google/ models
- XAI_API_KEY -- for xai/ models
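The prefix-to-key mapping above can be checked before a run. The sketch below uses two hypothetical helpers, `key_for_model` and `model_key_is_set`. It inspects environment variables only, so a key stored solely in ~/.summarize/config.json will be reported as missing.

```shell
# Hypothetical helper: name the env var a model id needs, by provider prefix.
key_for_model() {
  case "$1" in
    anthropic/*) echo "ANTHROPIC_API_KEY" ;;
    openai/*)    echo "OPENAI_API_KEY" ;;
    google/*)    echo "GEMINI_API_KEY" ;;
    xai/*)       echo "XAI_API_KEY" ;;
    *)           return 1 ;;   # prefix not listed in this document
  esac
}

# Return 0 only if that variable is set and non-empty in the environment.
model_key_is_set() {
  key_name=$(key_for_model "$1") || return 1
  # key_name is one of the four fixed strings above, so eval is safe here.
  eval "key_value=\${$key_name:-}"
  [ -n "$key_value" ]
}

# Example:
#   model_key_is_set "anthropic/claude-sonnet-4-5" || echo "export ANTHROPIC_API_KEY first" >&2
```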
Dependency Matrix
| Feature | Required Deps |
|---|---|
| Web page extraction | None |
| YouTube transcript (captions) | None (web mode) |
| YouTube transcript (no captions) | yt-dlp + whisper or API key |
| YouTube slides | yt-dlp + ffmpeg |
| Podcast transcription | yt-dlp + whisper or API key |
| PDF extraction | uvx/markitdown |
| Image OCR | tesseract |
| Audio/video transcription | whisper-cli (local) or OPENAI_API_KEY |
| Anti-bot sites (Firecrawl) | FIRECRAWL_API_KEY |
| Slide OCR | tesseract |
What is not installed (by design):
- whisper-cli / whisper.cpp -- heavy binary, install when audio transcription is needed
- Firecrawl API key -- paid service, configure when anti-bot extraction is needed
- LLM API keys in summarize config -- only add if you use LLM Summarization Mode
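A preflight check can turn the matrix into something a script can query. This is a sketch under stated assumptions: the feature names (`pdf`, `ocr`, and so on) are made up for illustration, the binary names are the ones this document mentions, and only local binaries are checked. A cloud fallback such as OPENAI_API_KEY is not considered.

```shell
# Hypothetical helper: list the local binaries a feature needs, per the matrix.
deps_for() {
  case "$1" in
    web|youtube-captions) echo "" ;;
    youtube-slides)       echo "yt-dlp ffmpeg" ;;
    podcast)              echo "yt-dlp whisper-cli" ;;  # an API key can replace whisper-cli
    pdf)                  echo "uvx" ;;
    ocr)                  echo "tesseract" ;;
    audio|video)          echo "whisper-cli" ;;         # or OPENAI_API_KEY (not checked)
    *) echo "unknown feature: $1" >&2; return 2 ;;
  esac
}

# Print each required binary that is not on PATH; prints nothing if all exist.
missing_deps() {
  needed=$(deps_for "$1") || return 2
  for bin in $needed; do
    command -v "$bin" >/dev/null 2>&1 || echo "$bin"
  done
}

# Example:
#   missing_deps pdf
```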
Key Flags Quick Reference
| Flag | Purpose | Example |
|---|---|---|
| --extract | Raw content extraction, no LLM | summarize --extract URL |
| --plain | No ANSI rendering (agent-safe output) | Always use for agents |
| --format md\|text | Output format (md default for URLs in extract) | --format md |
| --youtube auto\|web\|yt-dlp | YouTube transcript source | --youtube web (captions only) |
| --slides | Extract video slides with ffmpeg | --slides --slides-ocr |
| --timestamps | Include timestamps in transcripts | --timestamps |
| --firecrawl off\|auto\|always | Firecrawl for anti-bot sites | --firecrawl always |
| --preprocess off\|auto\|always | Preprocessing (markitdown for PDFs) | Default auto |
| --markdown-mode | HTML-to-MD conversion mode | --markdown-mode readability |
| --timeout | Fetch/LLM timeout | --timeout 2m |
| --verbose | Debug output to stderr | Troubleshooting |
| --json | Structured JSON output with metrics | --json |
| --length | Summary length (LLM mode only) | --length xl |
| --model | LLM model (LLM mode only) | --model anthropic/claude-sonnet-4-5 |
| --max-extract-characters | Limit extract output length | --max-extract-characters 50000 |
| --language\|--lang | Output language | --lang en |
| --video-mode | Video handling mode | --video-mode transcript |
| --transcriber | Audio backend | --transcriber whisper |
Verified Services (YouTube/Podcasts)
YouTube: All public videos with captions. Falls back to yt-dlp audio download + transcription for videos without captions.
Podcasts (verified):
- Apple Podcasts
- Spotify (best-effort; may fail for exclusives)
- Amazon Music / Audible podcast pages
- Podbean
- Podchaser
- RSS feeds (Podcasting 2.0 transcripts when available)
- Embedded YouTube podcast pages
Common Patterns
1. YouTube Transcript for Analysis
# Quick: captions only (fastest, no deps beyond summarize)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --youtube web --plain
# Full: with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain
# Formatted as clean markdown (requires LLM API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain
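The quick and full variants can be chained so the cheap captions-only path is tried first. A minimal sketch, assuming a hypothetical `youtube_transcript` function and assuming that a failed or empty `--youtube web` run means no captions were found (the Error Handling table describes an empty transcript for that case):

```shell
# Hypothetical helper: captions first, then let summarize pick the source.
youtube_transcript() {
  # Treat a non-zero exit the same as empty output.
  transcript=$(summarize --extract "$1" --youtube web --plain 2>/dev/null) || transcript=""
  if [ -z "$transcript" ]; then
    # --youtube auto may fall back to yt-dlp plus transcription if installed.
    transcript=$(summarize --extract "$1" --youtube auto --plain) || return 1
  fi
  printf '%s\n' "$transcript"
}

# Example:
#   youtube_transcript "https://www.youtube.com/watch?v=VIDEO_ID" > transcript.txt
```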
2. Podcast Episode Transcript
# From RSS feed (transcribes latest episode)
summarize --extract "https://feeds.example.com/podcast.xml" --plain
# From Apple Podcasts link
summarize --extract "https://podcasts.apple.com/us/podcast/SHOW/EPISODE" --plain
3. PDF Content Extraction
# From URL
summarize --extract "https://example.com/report.pdf" --plain
# From local file
summarize --extract "/path/to/file.pdf" --plain
# Limit output length
summarize --extract "/path/to/huge.pdf" --max-extract-characters 50000 --plain
4. Image OCR
summarize --extract "/path/to/screenshot.png" --plain
summarize --extract "/path/to/scanned-doc.jpg" --plain
5. Anti-Bot Website (Firecrawl Fallback)
# Requires FIRECRAWL_API_KEY in env or config
summarize --extract "https://paywalled-site.com/article" --firecrawl always --plain
6. Batch Extraction (Shell Loop)
# Extract multiple YouTube videos
for url in "URL1" "URL2" "URL3"; do
echo "=== $url ==="
summarize --extract "$url" --plain
done
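For longer lists, the loop above can read sources from a file and write one output file per source, so a single failure does not hide the rest. This is a sketch only: `batch_extract` and the numbered file naming are assumptions, not part of the skill.

```shell
# Hypothetical helper: one source per line in LIST_FILE, one .txt per source in OUT_DIR.
# Usage: batch_extract LIST_FILE OUT_DIR
batch_extract() {
  mkdir -p "$2" || return 1
  n=0
  failed=0
  while IFS= read -r src || [ -n "$src" ]; do
    [ -z "$src" ] && continue                 # skip blank lines
    n=$((n + 1))
    out="$2/$(printf '%03d' "$n").txt"
    # </dev/null keeps summarize from consuming the rest of the list on stdin.
    if summarize --extract "$src" --plain > "$out" </dev/null; then
      echo "ok   $src -> $out" >&2
    else
      echo "FAIL $src" >&2
      rm -f "$out"                            # do not keep partial output
      failed=$((failed + 1))
    fi
  done < "$1"
  [ "$failed" -eq 0 ]                         # non-zero exit if anything failed
}

# Example:
#   batch_extract urls.txt transcripts/
```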
Error Handling
| Symptom | Cause | Fix |
|---|---|---|
| Missing uvx/markitdown | PDF preprocessing not available | brew install uv |
| does not support extracting binary files | Preprocessing disabled for PDF | Use --preprocess auto (default) with uvx installed |
| YouTube returns empty transcript | No captions available, no yt-dlp/whisper | Install yt-dlp; for whisper fallback, install whisper-cli or set OPENAI_API_KEY |
| FIRECRAWL_API_KEY not set | Anti-bot mode requires Firecrawl | Set key in env or ~/.summarize/config.json |
| Timeout on large content | Default 2m timeout too short | Use --timeout 5m |
| Audio transcription fails | No whisper backend available | Install whisper-cli locally or set OPENAI_API_KEY/FAL_KEY |
| Podcast extraction fails | Audio download failed | Check yt-dlp is installed and updated: brew upgrade yt-dlp |
| Garbled web extraction | JS-rendered content | summarize has no JS engine; use Crawl4AI instead |
Configuration
Config file: ~/.summarize/config.json
{
"model": "auto",
"env": {
"FIRECRAWL_API_KEY": "fc-..."
},
"ui": {
"theme": "mono"
}
}
Configure only what your workflow needs. If you use LLM Summarization Mode, add the required API keys.
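If the config is created from a script, it may be worth refusing to overwrite an existing file and keeping the file private, since it holds an API key. A minimal sketch: `write_summarize_config` is a hypothetical helper, the fields mirror the example above, and the key is assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: write a minimal config modeled on the example above.
# Usage: write_summarize_config PATH FIRECRAWL_KEY
write_summarize_config() {
  if [ -e "$1" ]; then
    echo "refusing to overwrite existing file: $1" >&2
    return 1
  fi
  mkdir -p "$(dirname "$1")" || return 1
  (
    umask 077   # the file holds a secret: owner-only permissions
    cat > "$1" <<EOF
{
  "model": "auto",
  "env": {
    "FIRECRAWL_API_KEY": "$2"
  }
}
EOF
  )
}

# Example (try a scratch path before touching the real config):
#   write_summarize_config /tmp/summarize-config.json "fc-..."
```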
Anti-Patterns
| Do NOT | Do Instead |
|---|---|
| Use summarize for static web pages | WebFetch or Jina (faster, zero deps) |
| Use summarize for JS-heavy SPAs | Crawl4AI crwl (has browser rendering) |
| Use summarize's LLM mode as default | Use --extract and run synthesis in your own workflow unless explicitly required |
| Skip --plain for any non-interactive run | Always use --plain to avoid ANSI escape codes |
| Install whisper.cpp preemptively | Install only when audio transcription use case arises |
| Forget --timeout for large media | Podcasts/videos can take minutes; set --timeout 5m |
| Use summarize when WebFetch works | summarize is heavier; reserve for media and fallback |
| Use summarize for local repo/codebase search | Use your local knowledge search tools |
Bundled Resources Index
| Path | What | When to Load |
|---|---|---|
| ./UPDATES.md | Structured changelog for AI agents | When checking for new features or updates |
| ./UPDATE-GUIDE.md | Instructions for AI agents performing updates | When updating this skill |
| ./references/installation-guide.md | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
| ./references/commands.md | Full CLI flag reference with all options | When you need exact flag syntax or env var names |
Source
https://github.com/buildoak/fieldwork-skills/blob/main/skills/summarize/SKILL.md
Overview
Summarize extracts clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without brittle scraper logic. It handles YouTube transcripts, podcasts, PDFs, scanned images via OCR, and local media via the summarize CLI.
How This Skill Works
You invoke the CLI with an extraction target. Extraction mode prints the raw extracted content without involving an LLM. It uses built-in web, media, and OCR extractors to deliver deterministic text from YouTube, PDFs, images, audio, and video for downstream processing.
When to Use It
- Need a YouTube or podcast transcript to index or summarize content.
- Require deterministic text from PDFs or scanned documents without manual scraping.
- Need OCR-converted text from images or scans for data extraction.
- Work with RSS feeds or audio/video files and want raw transcripts or text.
- Prefer a predictable extraction path before downstream AI reasoning.
Quick Start
- Step 1: Install the CLI with brew tap steipete/tap and brew install summarize.
- Step 2: Extract a YouTube transcript with summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain.
- Step 3: Extract a PDF with summarize --extract "/path/to/document.pdf" --plain.
Best Practices
- Start with --extract to get raw content before any synthesis.
- Use --plain to strip ANSI codes for machine parsing.
- If Markdown structure helps downstream tooling, add --format md; use --format text for plain text.
- Install the dependencies each feature needs (e.g., tesseract for OCR, uvx/markitdown for PDFs, whisper-cli for transcription).
- Validate content length and quality before feeding into models or pipelines.
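The last practice can be made concrete with a small gate between extraction and any model call. This is a sketch: `check_extract` and the 200-byte default threshold are arbitrary assumptions, so tune them to your content.

```shell
# Hypothetical check: reject extracts that are missing, too short,
# or still contain ANSI escape sequences.
# Usage: check_extract FILE [MIN_BYTES]
check_extract() {
  [ -s "$1" ] || { echo "empty or missing: $1" >&2; return 1; }
  bytes=$(wc -c < "$1" | tr -d ' ')
  [ "$bytes" -ge "${2:-200}" ] || { echo "too short ($bytes bytes): $1" >&2; return 1; }
  esc=$(printf '\033')
  if grep -q "$esc" "$1"; then
    echo "contains ANSI escapes (was --plain used?): $1" >&2
    return 1
  fi
}

# Example:
#   summarize --extract "/path/to/document.pdf" --plain > doc.txt && check_extract doc.txt 1000
```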
Example Use Cases
- Extract a YouTube transcript to build course notes or a study guide.
- Pull text from a PDF manual to populate a product knowledge base.
- OCR scanned invoices and convert them to searchable text for accounting.
- Fetch transcripts from podcast RSS feeds for search indexing.
- Integrate summarize into a CI workflow to prefetch and cache web or media content.
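For the prefetch-and-cache case, a simple on-disk cache keyed by the source string is often enough. A minimal sketch, assuming a hypothetical `cached_extract` function. It uses the POSIX `cksum` CRC as the key, which is short but not collision-proof, so a stronger hash may be preferable for large caches.

```shell
# Hypothetical helper: cache extracts on disk, keyed by a checksum of the source.
# Usage: cached_extract SOURCE CACHE_DIR
cached_extract() {
  mkdir -p "$2" || return 1
  key=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  cache_file="$2/$key.txt"
  if [ ! -s "$cache_file" ]; then
    # Write to a temp file first so a failed run never leaves a partial entry.
    if summarize --extract "$1" --plain > "$cache_file.tmp"; then
      mv "$cache_file.tmp" "$cache_file"
    else
      rm -f "$cache_file.tmp"
      return 1
    fi
  fi
  cat "$cache_file"
}

# Example:
#   cached_extract "https://example.com/report.pdf" .cache/summarize
```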