Do I need to manually curate transcripts before clipping?

Yes. Auto-generated captions lack punctuation and complete sentences. You must ensure the first and last sentences are complete, format paragraphs, and add punctuation before finalizing clips.

What dependencies are required to run this skill?

You need yt-dlp to fetch videos and ffmpeg to trim and assemble clips. The workflow also uses Python scripts like parse_vtt.py, extract_transcript.py, and download_clip.py.

How are clips selected and how customizable is the output?

Clips are chosen from high-value segments based on controversy, insights, or user-specified topics. You can set the number of clips (e.g., 3–5) and min/max durations to tailor the output to your needs.

get-y2b-clips

npx machina-cli add skill DidierRLopes/get-y2b-clips/get-y2b-clips --openclaw

Files (1)

SKILL.md

21.2 KB

YouTube Nuggets Extractor

Extract the most valuable clips ("nuggets") from YouTube videos automatically. Analyzes transcripts to find high-value segments based on controversy, insightful analysis, or user-specified topics.

When to Use This Skill

Activate when the user:

Wants to extract "best clips", "highlights", or "nuggets" from a YouTube video
Asks to find "interesting moments" or "valuable segments"
Wants controversial takes, insights, or specific topics from a video
Provides a YouTube URL and mentions clips, segments, or highlights

Dependencies Check

ALWAYS check dependencies first:

# Check for yt-dlp
command -v yt-dlp || echo "MISSING: yt-dlp"

# Check for ffmpeg
command -v ffmpeg || echo "MISSING: ffmpeg"

Install Missing Dependencies

yt-dlp:

# macOS
brew install yt-dlp

# Linux
sudo apt update && sudo apt install -y yt-dlp

# pip (universal)
pip3 install yt-dlp

ffmpeg:

# macOS
brew install ffmpeg

# Linux
sudo apt update && sudo apt install -y ffmpeg

Input Requirements

Required: YouTube URL
Optional (ask user if not specified for long videos >30 min):
- Number of clips (default: 3-5 based on video length)
- Topic focus keywords
- Min/max clip duration (default: 30s-180s)

Available Python Scripts

The skill includes helper scripts in the .claude/skills/get-y2b-clips/ directory:

Script	Purpose
`parse_vtt.py`	Parse VTT subtitles into segments.json (cleans HTML entities)
`extract_transcript.py`	Extract transcript with auto sentence boundary detection
`download_clip.py`	Download video clip with retry logic and progress reporting
`burn_subtitles.py`	Generate subtitled video with hardcoded captions
`utils.py`	Shared utilities for timestamp parsing

Transcript Curation (CRITICAL)

Auto-generated YouTube captions lack punctuation. The transcript must be manually curated to ensure:

Complete starting sentence: Must begin with a coherent thought, not mid-sentence
Complete ending sentence: Must end with a complete thought, not cut off
Proper formatting: Sentences on separate lines with blank lines between
Punctuation added: Add periods, commas, question marks as needed

Workflow: Transcript → Video (Not the reverse!)

1. Identify target timestamps (where the insight is)
2. Run extract_transcript.py to get raw extraction + suggested video timestamps
3. MANUALLY CURATE the transcript:
   - Ensure first sentence is complete (may need to trim start)
   - Ensure last sentence is complete (may need to extend/trim end)
   - Add punctuation and formatting
   - Split into readable paragraphs
4. Use the VIDEO_START and VIDEO_END from script output
   - Video should start ~2s BEFORE first word of transcript
   - Video should end ~2s AFTER last word of transcript
5. Download video using those curated timestamps

Example Curation:

Raw extraction (bad):

Successful why do you think we're learned and it turns out that many or most of the people in The Venture business...

Curated (good):

It turns out that many or most of the people in the venture business historically would answer that question by telling you they finance the best and brightest, the greatest managers.

We do not.

We have always focused on the market - the size of the market, the dynamics of the market, the nature of the competition.

Because our objective always was to build big companies.

If you don't attack a big market, it's highly unlikely you're ever going to build a big company.

Using extract_transcript.py

# Run the script to get raw extraction and video timestamps
python3 extract_transcript.py \
    --start 00:04:00 \        # Target start (where insight begins)
    --end 00:05:08 \          # Target end (where insight ends)
    --title "Clip Title" \
    --source "Video Title" \
    --output "clip_folder/Transcript.txt" \
    --json                     # Also outputs JSON with timestamps

# Output will show:
#   VIDEO_START=00:03:58      <- Use this for video download
#   VIDEO_END=00:05:10        <- Use this for video download

Then manually edit the Transcript.txt file to curate the text before downloading the video.

Console Progress Reporting

IMPORTANT: Provide clear progress updates to the user at each stage:

[SETUP] Fetching video info...
  ✓ Video: "Title Here" (32 min)
  ✓ Output: ./clips/2024-01-01_12-00-00_video-slug/

[TRANSCRIPT] Downloading subtitles...
  ✓ Auto-generated English subtitles found
  ✓ Parsed 912 segments

[ANALYSIS] Identifying best clips...
  ✓ Found 5 high-value segments
  ✓ Selected top 2 clips

[CLIP 1/2] "Factory is the Weapon" (08:56 - 10:31)
  ✓ metadata.json created
  ✓ Transcript.txt extracted (321 words)
  ✓ Video.mp4 downloaded (14.1 MB)
  ✓ Subtitled.mp4 created (13.8 MB)

[CLIP 2/2] "Peter Thiel Always Right" (29:59 - 31:17)
  ✓ metadata.json created
  ✓ Transcript.txt extracted (307 words)
  ⚠ Download failed (403), retrying...
  ✓ Video.mp4 downloaded (10.4 MB)
  ✓ Subtitled.mp4 created (10.1 MB)

[DONE] Extracted 2 clips (2m53s total)

Retry Logic for Downloads

YouTube occasionally returns 403 errors. Always implement retry logic:

# Use the download_clip.py script with built-in retries
python3 .claude/skills/get-y2b-clips/download_clip.py \
    --url "$VIDEO_URL" \
    --start "00:08:56" \
    --end "00:10:31" \
    --output "$CLIP_DIR/Video.mp4" \
    --retries 3

Or implement inline:

import time
import subprocess

def download_with_retry(cmd, max_retries=3, delay=2):
    for attempt in range(max_retries):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"  ⚠ Retry {attempt + 1}/{max_retries}...")
        time.sleep(delay)
    return False

Complete Workflow

CRITICAL: The order of operations is WHY → TRANSCRIPT → VIDEO

The "Why" justifies the selection, the transcript defines the EXACT timestamps, and the video is downloaded to match those exact timestamps.

Phase 1: Setup

# Get video info
VIDEO_URL="USER_PROVIDED_URL"
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/:?*"<>|\\' '-')
VIDEO_DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
VIDEO_ID=$(yt-dlp --print "%(id)s" "$VIDEO_URL")

echo "Video: $VIDEO_TITLE"
echo "Duration: $((VIDEO_DURATION / 60)) minutes"

# Create output folder
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
SLUG=$(echo "$VIDEO_TITLE" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | cut -c1-50)
OUTPUT_DIR="./clips/${TIMESTAMP}_${SLUG}"
mkdir -p "$OUTPUT_DIR"

echo "Output folder: $OUTPUT_DIR"

Phase 2: Get Transcript with Exact Timestamps

Priority order: Manual subtitles → Auto-generated → Whisper

cd "$OUTPUT_DIR"

# Check available subtitles
yt-dlp --list-subs "$VIDEO_URL"

# Try manual subtitles first
if yt-dlp --write-sub --sub-langs "en" --skip-download -o "transcript" "$VIDEO_URL" 2>/dev/null; then
    echo "Manual subtitles downloaded"
elif yt-dlp --write-auto-sub --sub-langs "en" --skip-download -o "transcript" "$VIDEO_URL" 2>/dev/null; then
    echo "Auto-generated subtitles downloaded"
else
    echo "No subtitles available - Whisper transcription required"
    # Ask user before proceeding with Whisper (downloads audio)
fi

Parse VTT using the skill's Python script:

# Parse VTT file into segments.json and full_transcript.txt
python3 .claude/skills/get-y2b-clips/parse_vtt.py transcript.en.vtt

# This creates:
#   - segments.json (for precise timestamp lookup)
#   - full_transcript.txt (for reading/analysis)

Alternative inline Python (if script not available):

import re
import json

def parse_vtt(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        content = f.read()

    lines = content.split('\n')
    segments = []
    current_start = None
    current_end = None
    seen_text = set()

    for line in lines:
        line = line.strip()

        # Check if this is a timestamp line
        time_match = re.match(r'^(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})', line)
        if time_match:
            current_start = f"{time_match.group(1)}.{time_match.group(2)}"
            current_end = f"{time_match.group(3)}.{time_match.group(4)}"
            continue

        # Skip metadata and empty lines
        if not line or line.startswith('WEBVTT') or line.startswith('Kind:') or line.startswith('Language:'):
            continue

        # Skip lines with tags (word-by-word breakdowns)
        if '<' in line:
            continue

        # This is a clean text line
        text = line.strip()
        if text and text not in seen_text and current_start:
            seen_text.add(text)
            segments.append({
                'start': current_start,
                'end': current_end,
                'text': text
            })

    return segments

segments = parse_vtt("transcript.en.vtt")

# Write full transcript with timestamps
with open('full_transcript.txt', 'w') as f:
    for seg in segments:
        f.write(f"[{seg['start']}] {seg['text']}\n")

# Write segments JSON for precise timestamp lookup
with open('segments.json', 'w') as f:
    json.dump(segments, f, indent=2)

print(f"Parsed {len(segments)} segments with exact timestamps")

Phase 3: Analyze Content & Generate "Why" FIRST

This is the critical phase - identify segments and justify selection BEFORE extracting.

Read full_transcript.txt and analyze using these scoring criteria:

Controversy Signals (weight: 0.30)
- "I disagree", "controversial", "unpopular opinion"
- Strong language: "absolutely", "never", "always"
- Debate markers: "push back", "challenge that"
Insight Signals (weight: 0.35)
- Statistics, data points, percentages
- Predictions: "will happen", "in X years"
- Frameworks: "the way I see it", "my model"
- Expert knowledge, technical depth
Engagement Signals (weight: 0.20)
- Rhetorical questions
- Stories: "let me tell you", "for example"
- Emotional peaks, emphasis
- Direct address: "think about it"
Topic Match (weight: 0.15, or 0.40 if user specified topics)
- Keyword presence
- Semantic relevance

Analysis output - use EXACT timestamps from transcript:

For each identified clip, record:

Title: Short descriptive name
Start timestamp: EXACT timestamp from first line of segment (from segments.json)
End timestamp: EXACT timestamp from last line of segment (from segments.json)
Why: Full justification with scores

IMPORTANT: The start and end times MUST come from the transcript timestamps. Do not approximate or round. The video will be cut to match these exact times.

Phase 4: For Each Clip - Create Files in Order

Order: metadata.json → Transcript.txt → Video.mp4 → Subtitled.mp4

Step 1: Create metadata.json (combines "why" + transcript info)

import json

clip_metadata = {
    "title": "Clip Title",
    "source_video": "Video Title",
    "video_start": "00:12:06.000",
    "video_end": "00:14:05.000",
    "duration_seconds": 119,
    "word_count": 321,
    "transcript": "Full transcript text with proper formatting...",
    "selection_rationale": {
        "controversy": {
            "score": 8,
            "reason": "Explanation of controversy signals found"
        },
        "insight": {
            "score": 9,
            "reason": "Key insights delivered"
        },
        "engagement": {
            "score": 7,
            "reason": "Engagement signals found"
        },
        "relevance": {
            "score": 8,
            "reason": "How it relates to the main topic"
        }
    },
    "actionable_takeaway": "What viewers/investors should do with this information"
}

with open('Clip Title metadata.json', 'w') as f:
    json.dump(clip_metadata, f, indent=2)

Step 2: Create Transcript.txt (human-readable version)

Extract transcript text with a 5-second buffer before and after the video timestamps. This ensures all spoken words in the video clip are captured in the transcript (accounting for keyframe cuts).

Key rules:

Clean text only - no timestamps in the output
5-second buffer - transcript covers slightly more than the video
Proper formatting - capitalize first letter of sentences, new line for each sentence
Readable flow - sentences separated by blank lines for easy reading

Formatting the transcript:

Join all segment text together
Split on sentence boundaries (. ! ?)
Capitalize first letter of each sentence
Write each sentence on its own line with blank line between

import json
import re

# Load segments
with open('segments.json', 'r') as f:
    segments = json.load(f)

# Define VIDEO boundaries (what will be downloaded)
video_start = "00:12:06.000"
video_end = "00:14:05.000"

# Define TRANSCRIPT boundaries (5-second buffer)
transcript_start = "00:12:01.000"  # 5s before video
transcript_end = "00:14:10.000"    # 5s after video

# Extract matching segments
clip_text = []
for seg in segments:
    if seg['start'] >= transcript_start and seg['start'] <= transcript_end:
        clip_text.append(seg['text'])

# Join and format nicely
raw_text = ' '.join(clip_text)

# Split into sentences and format
sentences = re.split(r'(?<=[.!?])\s+', raw_text)
formatted_sentences = []
for s in sentences:
    s = s.strip()
    if s:
        # Capitalize first letter
        s = s[0].upper() + s[1:] if len(s) > 1 else s.upper()
        formatted_sentences.append(s)

# Write clean, formatted transcript
with open('Clip_Title Transcript.txt', 'w') as f:
    f.write(f"Clip Title - Transcript\n")
    f.write(f"Source: Video Title\n")
    f.write(f"Video: {video_start[:8]} - {video_end[:8]}\n\n---\n\n")
    f.write('\n\n'.join(formatted_sentences))  # Each sentence on new line

Step 3: Download Video using EXACT timestamps

CRITICAL: Use the same timestamps from the transcript extraction.

START_TIME="00:12:06"  # Must match transcript start
END_TIME="00:14:05"    # Must match transcript end
CLIP_TITLE="Clip Title"
SAFE_TITLE=$(echo "$CLIP_TITLE" | tr '/:?*"<>|\\' '-')

# Create clip folder
CLIP_DIR="$OUTPUT_DIR/01_$(echo "$SAFE_TITLE" | tr '[:upper:]' '[:lower:]' | tr ' ' '_' | cut -c1-40)"
mkdir -p "$CLIP_DIR"

# Download video clip with EXACT timestamps
yt-dlp -f 'bestvideo[height<=1080]+bestaudio/best[height<=1080]' \
  --download-sections "*${START_TIME}-${END_TIME}" \
  --force-keyframes-at-cuts \
  --merge-output-format mp4 \
  -o "$CLIP_DIR/${SAFE_TITLE} Video.%(ext)s" \
  "$VIDEO_URL"

Step 4: Create Subtitled Video

ALWAYS create a subtitled version for social media sharing. Uses ffmpeg to burn subtitles directly into the video with styled captions.

# Use the burn_subtitles.py script
python3 .claude/skills/get-y2b-clips/burn_subtitles.py \
    --video "$CLIP_DIR/${SAFE_TITLE} Video.mp4" \
    --segments segments.json \
    --start "$START_TIME" \
    --end "$END_TIME" \
    --output "$CLIP_DIR/${SAFE_TITLE} Subtitled.mp4"

What it does:

Reads segments.json for accurate word-level timing
Generates SRT subtitles for the clip's time range
Burns subtitles into video using ffmpeg with styling:
- Large readable font (24pt)
- Semi-transparent black background
- White text with black outline
- Bottom-center positioning

Subtitle options:

# Custom font size (default: 24)
--font-size 28

# Keep the SRT file (for external use)
--keep-srt

# When video has buffer before speech starts (sync fix)
--transcript-start "00:31:38.350"

# When video has buffer after speech ends
--transcript-end "00:32:50.000"

Important: If the video starts a few seconds before the actual speech, use --transcript-start to specify when the transcript content begins. This prevents showing subtitles for content before the clip's main topic.

Phase 5: Generate Metadata

{
  "source": {
    "url": "VIDEO_URL",
    "title": "VIDEO_TITLE",
    "video_id": "VIDEO_ID",
    "duration_seconds": DURATION
  },
  "extraction": {
    "date": "ISO_TIMESTAMP",
    "output_dir": "OUTPUT_DIR",
    "clip_count": N,
    "topic_filter": null
  },
  "clips": [
    {
      "index": 1,
      "folder": "01_clip_slug",
      "title": "Clip Title",
      "start_time": "00:12:06.000",
      "end_time": "00:14:05.000",
      "duration_seconds": 119,
      "scores": {
        "controversy": 0.9,
        "insight": 0.9,
        "engagement": 0.8,
        "overall": 0.87
      },
      "summary": "Brief description"
    }
  ]
}

Phase 6: Summary

Display to user:

Number of clips extracted
Total clip duration
List of clips with titles and EXACT timestamps
Output folder location

Timestamp Alignment Rules

These rules ensure video matches transcript exactly:

Source of truth: The VTT transcript timestamps are the source of truth
No rounding: Use timestamps exactly as they appear in segments.json
Verify alignment: The first and last words in Transcript.txt should match the first and last words spoken in Video.mp4
Buffer if needed: If a sentence is cut mid-word, extend to the next segment boundary

Clip Duration Guidelines

Video Length	Recommended Clips	Default Clip Length
< 15 min	1-2	30-60 seconds
15-30 min	2-3	45-90 seconds
30-60 min	3-4	60-120 seconds
1-2 hours	4-5	90-180 seconds
> 2 hours	5-7	90-180 seconds

Interactive Flow (Long Videos)

For videos > 30 minutes without user guidance, ask:

I found a [X] minute video. How would you like to proceed?

A) Extract top 5 clips automatically (recommended)
B) Focus on specific topics - please specify keywords
C) Extract more clips - specify how many
D) Let me scan the transcript first and suggest topics

Error Handling

Issue	Solution
No subtitles	Offer Whisper with audio size warning
ffmpeg missing	Provide install command
Clip download fails	Retry with simpler format `-f best`
Private video	Inform user, cannot proceed
Very short video	Suggest 1 clip or full download
Timestamp mismatch	Re-verify against segments.json

Output File Naming

Folder: YYYY-MM-DD_HH-MM-SS_<video-slug>/
Clips: NN_<clip-slug>/
Files per clip:
- <Title> metadata.json - Complete clip info (transcript, timestamps, why selected)
- <Title> Transcript.txt - Human-readable transcript with formatting
- <Title> Video.mp4 - Raw video clip
- <Title> Subtitled.mp4 - Video with burned-in captions (for social media)

Per-Clip metadata.json Structure

Each clip folder contains a metadata.json with all information combined:

{
  "title": "This Is NOT a Bubble",
  "source_video": "Bitcoin Fear Hits All-Time High",
  "video_url": "https://www.youtube.com/watch?v=VIDEO_ID&t=1896s",
  "video_start": "00:31:36.000",
  "video_end": "00:32:54.000",
  "transcript_start": "00:31:38.350",
  "duration_seconds": 78,
  "word_count": 218,
  "selection_rationale": {
    "controversy": {
      "score": 10,
      "reason": "Directly calls out Michael Burr and Jeff Gundlach as wrong about AI bubble"
    },
    "insight": {
      "score": 9,
      "reason": "Data-driven comparison: NDX 800% vs 100%, Nvidia $60B quarterly revenue"
    },
    "engagement": {
      "score": 9,
      "reason": "Strong emotional conviction, direct callout of famous contrarians"
    },
    "relevance": {
      "score": 9,
      "reason": "Directly addresses fear sentiment from video title"
    }
  },
  "actionable_takeaway": "Don't conflate extreme fear with bubble conditions. Look at earnings growth and historical comparisons.",
  "transcript": "Full transcript text here (always last key for readability)..."
}

Notes:

video_url: Direct link to the clip's start time in the original video (uses &t=XXXs parameter)
transcript_start: Used when video has a buffer before speech starts (ensures subtitle sync)

Example Session

User: Extract the best clips from https://www.youtube.com/watch?v=abc123

Claude:

Checks dependencies (yt-dlp, ffmpeg)
Gets video info: "AI Future Podcast - 1:45:00"
Downloads transcript with exact timestamps
Analyzes transcript, identifies top segments
For each clip:
- Creates metadata.json (transcript + selection rationale + scores)
- Creates Transcript.txt (human-readable version)
- Downloads video using exact timestamps
- Creates subtitled version with burned-in captions
Returns summary with folder location

User: Get 3 clips about "startup funding" from https://youtube.com/watch?v=xyz789

Claude:

Same setup
Filters transcript for "startup", "funding", "invest", "raise" keywords
Scores segments with topic_match weighted higher
For each of 3 clips: metadata.json → Transcript.txt → Video → Subtitled Video
Returns summary

Source

git clone https://github.com/DidierRLopes/get-y2b-clips/blob/main/.claude/skills/get-y2b-clips/SKILL.mdView on GitHub

Overview

YouTube Nuggets Extractor automatically identifies high-value moments from video transcripts to create the most meaningful clips. It targets highlights, controversial takes, or topic-focused segments, and supports specifying how many clips to pull. The workflow relies on yt-dlp and ffmpeg and uses a suite of helper scripts to fetch, parse, and assemble clips.

How This Skill Works

The skill first checks for core dependencies (yt-dlp and ffmpeg). It then downloads the video and extracts a transcript with extract_transcript.py, followed by targeted transcript curation to ensure clean, punctuated sentences. Using identified timestamps, download_clip.py fetches the chosen segments, which can be bundled with burn_subtitles.py if subtitled clips are needed.

When to Use It

You want highlights or the best moments from a long interview or keynote.
You want controversial takes, sharp insights, or standout comments from a video.
You need clips focused on specific topics or keywords specified by the user.
You’re creating social clips, promos, or teaser reels from longer videos.
You have a YouTube URL and want a predefined number of clips (e.g., 3–5) based on video length.

Quick Start

Step 1: Provide the YouTube URL and optional preferences (number of clips, topic keywords, min/max duration).
Step 2: Run extract_transcript.py to generate transcripts and suggested clip timestamps.
Step 3: Manually curate the transcript for complete sentences and formatting, then run download_clip.py to fetch the clips (optionally burn_subtitles.py for subtitled outputs).

Best Practices

Always run the dependency check first to confirm yt-dlp and ffmpeg are installed.
Provide clear topic focus keywords to steer the nugget selection.
Ensure an explicit YouTube URL is supplied as input.
Follow the Transcript → Video workflow: manually curate the transcript for complete sentences, proper punctuation, and readable paragraphs.
Use the provided step-by-step timestamps to start the video ~2s before the first word and end ~2s after the last word for clean clips.

Example Use Cases

A podcast episode is summarized into 4 clips highlighting the host’s most provocative take and a key data point.
A product founder extracts 3 clips about a controversial feature decision for social media trailer.
An educational channel pulls concise explanations from a long lecture to create bite-sized study clips.
A marketing team builds teaser reels from a panel discussion by focusing on moments with peak audience engagement.
A reviewer compiles the most insightful lines from a tech talk to generate shareable snippets.

Frequently Asked Questions

Add this skill to your agents