
summarize

npx machina-cli add skill buildoak/fieldwork-skills/summarize --openclaw

Summarize

Extract clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without hand-coding brittle scraper logic.

Use this skill when you need deterministic extraction for YouTube, podcast feeds, PDFs, scanned images, or local media files.

Terminology used in this file:

  • DOM: Document Object Model, the page element structure used by browser-based extractors.
  • OCR: Optical character recognition (extracting text from images/scans).
  • ANSI codes: Terminal color/control sequences; --plain removes them for machine parsing.
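
As a concrete illustration of what --plain spares you from, here is a minimal, hypothetical sketch of stripping ANSI color codes by hand with sed. With summarize itself, just pass --plain; the `strip_ansi` helper below is not part of the CLI.

```shell
# Illustrative only: removing ANSI SGR color sequences manually with sed.
# With summarize, prefer --plain, which suppresses them at the source.
strip_ansi() {
  esc=$(printf '\033')           # literal ESC byte, portably
  sed "s/${esc}\[[0-9;]*m//g"    # strip color sequences like ESC[31m
}

printf '\033[31mred text\033[0m\n' | strip_ansi   # prints "red text"
```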

Setup

brew tap steipete/tap
brew install summarize
  • Claude Code: copy this skill folder into .claude/skills/summarize/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, optional dependencies, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the summarize skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.


Quick Start

Run one extraction flow end-to-end:

summarize --version
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
summarize --extract "/path/to/document.pdf" --plain

Use --extract --plain as the default pattern for deterministic, non-ANSI output.

Decision Tree: summarize vs Other Tools

Need content from the web?
  |
  +-- Static web page (article, docs, blog)?
  |     --> WebFetch (built-in, zero deps, faster)
  |     --> Jina r.jina.ai (zero install alternative)
  |     --> summarize ONLY if above tools fail or return garbage
  |
  +-- JS-heavy SPA / dynamic content?
  |     --> Crawl4AI crwl (full browser rendering)
  |     --> summarize will NOT help here (no JS rendering)
  |
  +-- Anti-bot / paywalled / Cloudflare-protected?
  |     --> summarize --firecrawl always (requires FIRECRAWL_API_KEY)
  |     --> browser-based workflow as fallback
  |
  +-- YouTube video?
  |     --> summarize --extract (ONLY option for transcript)
  |     --> Add --youtube web for captions-only (faster)
  |     --> Add --slides for visual slide extraction
  |
  +-- Podcast / RSS feed?
  |     --> summarize --extract (ONLY option)
  |     --> Supports Apple Podcasts, Spotify, RSS feeds, Podbean, etc.
  |
  +-- PDF (URL or local file)?
  |     --> summarize --extract (ONLY CLI option)
  |     --> Requires: uvx/markitdown (brew install uv)
  |
  +-- Image (OCR)?
  |     --> summarize --extract (ONLY CLI option)
  |     --> Requires: tesseract
  |
  +-- Audio / video file?
        --> summarize --extract (ONLY CLI option)
        --> Requires: whisper-cli (local) or OPENAI_API_KEY (cloud)

Rule of thumb: summarize is the default for media extraction (YouTube, podcasts, audio, video, images). For web pages, prefer WebFetch/Jina/Crawl4AI depending on DOM complexity (how hard the page structure is to parse). Use summarize for web only when other tools fail.
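
The rule of thumb can be sketched as a small fallback wrapper. This is a hypothetical illustration: `extract_page`, the use of curl as the cheap first attempt, and the 500-character threshold are all assumptions, not features of any tool above.

```shell
# Hypothetical fallback sketch: try a cheap fetch first, and call
# `summarize --extract` only when the fetch yields too little text.
extract_page() {
  url="$1"
  body=$(curl -fsSL --max-time 30 "$url" 2>/dev/null || true)
  # Heuristic threshold: under ~500 chars usually means a block page
  # or an empty JS shell rather than real content.
  if [ "$(printf '%s' "$body" | wc -c)" -lt 500 ]; then
    summarize --extract "$url" --plain
  else
    printf '%s\n' "$body"
  fi
}
```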

Extraction Mode (Primary)

--extract prints raw extracted content and exits. No LLM involved. Use this first. You can handle any downstream synthesis in your own workflow.

# Web page extraction (plain text, default)
summarize --extract "https://example.com" --plain

# Web page extraction (markdown format)
summarize --extract "https://example.com" --format md --plain

# YouTube transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain

# YouTube transcript with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# YouTube transcript formatted as markdown (requires LLM -- uses API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain

# YouTube slides + transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --slides --plain

# Podcast (RSS feed)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# Apple Podcasts episode
summarize --extract "https://podcasts.apple.com/us/podcast/EPISODE_ID" --plain

# PDF from URL
summarize --extract "https://example.com/document.pdf" --plain

# PDF from local file
summarize --extract "/path/to/document.pdf" --plain

# Image OCR
summarize --extract "/path/to/image.png" --plain

# Audio transcription
summarize --extract "/path/to/audio.mp3" --plain

# Video transcription
summarize --extract "/path/to/video.mp4" --plain

# Stdin (pipe content)
pbpaste | summarize --extract - --plain
cat document.pdf | summarize --extract - --plain

Always use --plain when extracting for agent consumption. It suppresses ANSI/OSC rendering.

Extraction defaults:

  • URLs default to --format md in extract mode
  • Files default to --format text
  • PDF requires uvx/markitdown (--preprocess auto, which is default)

LLM Summarization Mode (Secondary)

Use this mode only when you explicitly want summarize to perform synthesis itself.

# Summarize a URL (requires API key for the chosen model)
summarize "https://example.com" --model anthropic/claude-sonnet-4-5 --length long

# Summarize with a custom prompt
summarize "https://example.com" --prompt "Extract key technical decisions and their rationale"

# Summarize YouTube video
summarize "https://www.youtube.com/watch?v=VIDEO_ID" --length xl

# JSON output with metrics
summarize "https://example.com" --json --model openai/gpt-5-mini

API keys for LLM mode (set in ~/.summarize/config.json or env vars):

  • ANTHROPIC_API_KEY -- for anthropic/ models
  • OPENAI_API_KEY -- for openai/ models
  • GEMINI_API_KEY -- for google/ models
  • XAI_API_KEY -- for xai/ models

Dependency Matrix

| Feature | Required Deps |
|---|---|
| Web page extraction | None |
| YouTube transcript (captions) | None (web mode) |
| YouTube transcript (no captions) | yt-dlp + whisper or API key |
| YouTube slides | yt-dlp + ffmpeg |
| Podcast transcription | yt-dlp + whisper or API key |
| PDF extraction | uvx/markitdown |
| Image OCR | tesseract |
| Audio/video transcription | whisper-cli (local) or OPENAI_API_KEY |
| Anti-bot sites (Firecrawl) | FIRECRAWL_API_KEY |
| Slide OCR | tesseract |

What is not installed (by design):

  • whisper-cli / whisper.cpp -- heavy binary, install when audio transcription is needed
  • Firecrawl API key -- paid service, configure when anti-bot extraction is needed
  • LLM API keys in summarize config -- only add if you use LLM Summarization Mode

Key Flags Quick Reference

| Flag | Purpose | Example |
|---|---|---|
| `--extract` | Raw content extraction, no LLM | `summarize --extract URL` |
| `--plain` | No ANSI rendering (agent-safe output) | Always use for agents |
| `--format md\|text` | Output format (md default for URLs in extract) | `--format md` |
| `--youtube auto\|web\|yt-dlp` | YouTube transcript source | `--youtube web` (captions only) |
| `--slides` | Extract video slides with ffmpeg | `--slides --slides-ocr` |
| `--timestamps` | Include timestamps in transcripts | `--timestamps` |
| `--firecrawl off\|auto\|always` | Firecrawl for anti-bot sites | `--firecrawl always` |
| `--preprocess off\|auto\|always` | Preprocessing (markitdown for PDFs) | Default `auto` |
| `--markdown-mode` | HTML-to-MD conversion mode | `--markdown-mode readability` |
| `--timeout` | Fetch/LLM timeout | `--timeout 2m` |
| `--verbose` | Debug output to stderr | Troubleshooting |
| `--json` | Structured JSON output with metrics | `--json` |
| `--length` | Summary length (LLM mode only) | `--length xl` |
| `--model` | LLM model (LLM mode only) | `--model anthropic/claude-sonnet-4-5` |
| `--max-extract-characters` | Limit extract output length | `--max-extract-characters 50000` |
| `--language` / `--lang` | Output language | `--lang en` |
| `--video-mode` | Video handling mode | `--video-mode transcript` |
| `--transcriber` | Audio backend | `--transcriber whisper` |

Verified Services (YouTube/Podcasts)

YouTube: All public videos with captions. Falls back to yt-dlp audio download + transcription for videos without captions.

Podcasts (verified):

  • Apple Podcasts
  • Spotify (best-effort; may fail for exclusives)
  • Amazon Music / Audible podcast pages
  • Podbean
  • Podchaser
  • RSS feeds (Podcasting 2.0 transcripts when available)
  • Embedded YouTube podcast pages

Common Patterns

1. YouTube Transcript for Analysis

# Quick: captions only (fastest, no deps beyond summarize)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --youtube web --plain

# Full: with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# Formatted as clean markdown (requires LLM API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain

2. Podcast Episode Transcript

# From RSS feed (transcribes latest episode)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# From Apple Podcasts link
summarize --extract "https://podcasts.apple.com/us/podcast/SHOW/EPISODE" --plain

3. PDF Content Extraction

# From URL
summarize --extract "https://example.com/report.pdf" --plain

# From local file
summarize --extract "/path/to/file.pdf" --plain

# Limit output length
summarize --extract "/path/to/huge.pdf" --max-extract-characters 50000 --plain

4. Image OCR

summarize --extract "/path/to/screenshot.png" --plain
summarize --extract "/path/to/scanned-doc.jpg" --plain

5. Anti-Bot Website (Firecrawl Fallback)

# Requires FIRECRAWL_API_KEY in env or config
summarize --extract "https://paywalled-site.com/article" --firecrawl always --plain

6. Batch Extraction (Shell Loop)

# Extract multiple YouTube videos
for url in "URL1" "URL2" "URL3"; do
  echo "=== $url ==="
  summarize --extract "$url" --plain
done
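
A variant of the loop above that saves each extraction to its own file. The `sanitize` helper and `batch_extract` wrapper are hypothetical illustrations, not summarize features.

```shell
# Hypothetical batch sketch: one output file per URL, with a filename
# derived from the URL. sanitize() is an illustrative helper.
sanitize() {
  # Keep alphanumerics, dots, and dashes; squeeze runs of anything else to "_".
  printf '%s' "$1" | tr -cs 'A-Za-z0-9.-' '_'
}

batch_extract() {
  for url in "$@"; do
    summarize --extract "$url" --plain > "$(sanitize "$url").txt"
  done
}

# usage: batch_extract "URL1" "URL2" "URL3"
```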

Error Handling

| Symptom | Cause | Fix |
|---|---|---|
| "Missing uvx/markitdown" | PDF preprocessing not available | `brew install uv` |
| "does not support extracting binary files" | Preprocessing disabled for PDF | Use `--preprocess auto` (the default) with uvx installed |
| YouTube returns empty transcript | No captions available and no yt-dlp/whisper | Install yt-dlp; for the whisper fallback, install whisper-cli or set OPENAI_API_KEY |
| FIRECRAWL_API_KEY not set | Anti-bot mode requires Firecrawl | Set the key in env or ~/.summarize/config.json |
| Timeout on large content | Default 2m timeout too short | Use `--timeout 5m` |
| Audio transcription fails | No whisper backend available | Install whisper-cli locally or set OPENAI_API_KEY/FAL_KEY |
| Podcast extraction fails | Audio download failed | Check yt-dlp is installed and updated: `brew upgrade yt-dlp` |
| Garbled web extraction | JS-rendered content | summarize has no JS engine; use Crawl4AI instead |
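
Before a large job, it can help to check which optional dependencies are actually installed so failures surface up front. This preflight sketch is an illustrative assumption, not part of the summarize CLI.

```shell
# Hypothetical preflight sketch: report presence of the optional
# dependencies used by the various extraction paths.
check_deps() {
  for dep in yt-dlp ffmpeg tesseract uvx whisper-cli; do
    if command -v "$dep" >/dev/null 2>&1; then
      echo "ok       $dep"
    else
      echo "missing  $dep"
    fi
  done
}

check_deps
```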

Configuration

Config file: ~/.summarize/config.json

{
  "model": "auto",
  "env": {
    "FIRECRAWL_API_KEY": "fc-..."
  },
  "ui": {
    "theme": "mono"
  }
}

Configure only what your workflow needs. If you use LLM Summarization Mode, add the required API keys.

Anti-Patterns

| Do NOT | Do instead |
|---|---|
| Use summarize for static web pages | WebFetch or Jina (faster, zero deps) |
| Use summarize for JS-heavy SPAs | Crawl4AI `crwl` (has browser rendering) |
| Use summarize's LLM mode as default | Use `--extract` and run synthesis in your own workflow unless explicitly required |
| Skip `--plain` for any non-interactive run | Always use `--plain` to avoid ANSI escape codes |
| Install whisper.cpp preemptively | Install only when an audio transcription use case arises |
| Forget `--timeout` for large media | Podcasts/videos can take minutes; set `--timeout 5m` |
| Use summarize when WebFetch works | summarize is heavier; reserve it for media and fallback |
| Use summarize for local repo/codebase search | Use your local knowledge search tools |

Bundled Resources Index

| Path | What | When to load |
|---|---|---|
| ./UPDATES.md | Structured changelog for AI agents | When checking for new features or updates |
| ./UPDATE-GUIDE.md | Instructions for AI agents performing updates | When updating this skill |
| ./references/installation-guide.md | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
| ./references/commands.md | Full CLI flag reference with all options | When you need exact flag syntax or env var names |

Source

View on GitHub: https://github.com/buildoak/fieldwork-skills/blob/main/skills/summarize/SKILL.md

Overview

Summarize extracts clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without brittle scraper logic. It handles YouTube transcripts, podcasts, PDFs, scanned images via OCR, and local media via the summarize CLI.

How This Skill Works

You invoke the CLI with an extraction target. Extraction Mode prints raw extracted content without using an LLM, using built-in web, media, and OCR extractors to deliver deterministic text from YouTube, PDFs, images, audio, and video for downstream processing.

When to Use It

  • Need a YouTube or podcast transcript to index or summarize content.
  • Require deterministic text from PDFs or scanned documents without manual scraping.
  • Need OCR-converted text from images or scans for data extraction.
  • Work with RSS feeds or audio/video files and want raw transcripts or text.
  • Prefer a predictable extraction path before downstream AI reasoning.

Quick Start

  1. Install the CLI with brew tap steipete/tap and brew install summarize.
  2. Extract a YouTube transcript with summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain.
  3. Extract a PDF with summarize --extract "/path/to/document.pdf" --plain.

Best Practices

  • Start with --extract to get raw content before any synthesis.
  • Use --plain to strip ANSI codes for machine parsing.
  • If formatting helps downstream tooling, add --format md or other formats.
  • Install required dependencies for OCR and PDF parsing (e.g. tesseract, uvx/markitdown, whisper-cli).
  • Validate content length and quality before feeding into models or pipelines.
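
The last point above can be sketched as a small guard before feeding text to a model. `validate_extract` and the 200-character floor are hypothetical choices, not part of the summarize CLI.

```shell
# Hypothetical validation sketch: refuse to pass suspiciously short
# extractions downstream. The 200-character floor is an arbitrary example.
validate_extract() {
  text="$1"
  chars=$(printf '%s' "$text" | wc -c)
  if [ "$chars" -lt 200 ]; then
    echo "extraction too short ($chars chars); check deps or try --firecrawl" >&2
    return 1
  fi
  printf '%s\n' "$text"
}

# usage: validate_extract "$(summarize --extract "$URL" --plain)" > clean.txt
```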

Example Use Cases

  • Extract a YouTube transcript to build course notes or a study guide.
  • Pull text from a PDF manual to populate a product knowledge base.
  • OCR scanned invoices and convert them to searchable text for accounting.
  • Fetch transcripts from podcast RSS feeds for search indexing.
  • Integrate summarize into a CI workflow to prefetch and cache web or media content.
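
For the CI idea above, caching extractions by URL avoids refetching on every pipeline run. This is a hypothetical sketch: summarize itself documents no caching flag here, and cksum stands in for a stronger content hash.

```shell
# Hypothetical CI caching sketch: reuse a previous extraction when the
# same URL is requested again. cksum is a stand-in for a real hash.
cached_extract() {
  url="$1"
  cache_dir="${CACHE_DIR:-.summarize-cache}"
  mkdir -p "$cache_dir"
  key=$(printf '%s' "$url" | cksum | cut -d' ' -f1)
  file="$cache_dir/$key.txt"
  if [ ! -s "$file" ]; then
    summarize --extract "$url" --plain > "$file"
  fi
  cat "$file"
}
```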
