
transcribe-video

npx machina-cli add skill rameerez/claude-code-startup-skills/transcribe-video --openclaw

Video Transcription Skill

Generate subtitles and transcripts from $ARGUMENTS (a video or audio file path, optionally followed by a language code like en-US or es-ES) using AWS Transcribe.

Outputs .srt, .vtt, and .txt files next to the source file.

Process

  1. Verify prerequisites - check that ffmpeg and the AWS CLI are installed and configured
  2. Extract audio from the video as MP3 using ffmpeg
  3. Create a temporary S3 bucket and upload the audio
  4. Run an AWS Transcribe job with SRT and VTT subtitle output
  5. Download the results and generate a plain-text transcript
  6. Clean up all AWS resources and local files - delete the S3 bucket, the Transcribe job, and the temp files, so nothing incurs recurring costs.

Prerequisites

  • ffmpeg installed (brew install ffmpeg)
  • AWS CLI installed and configured with valid credentials (brew install awscli && aws configure)
  • AWS credentials with permissions for S3 (s3:*, to create and delete buckets and copy objects) and Transcribe (transcribe:*, to start, poll, and delete jobs)
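A minimal preflight sketch for the prerequisites above, assuming a bash script. `require_cmd` and `check_prereqs` are hypothetical helper names. `aws sts get-caller-identity` is used only as a cheap check that the CLI has working credentials. It does not verify the specific S3 or Transcribe permissions.

```shell
# Hypothetical helper: succeed if a command is on PATH, else report what is missing.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Missing prerequisite: $1" >&2
    return 1
  fi
}

# Hypothetical preflight: tools installed and AWS credentials usable.
check_prereqs() {
  require_cmd ffmpeg || return 1
  require_cmd aws || return 1
  # Fails fast when no credentials are configured; does not check S3/Transcribe permissions
  if ! aws sts get-caller-identity >/dev/null 2>&1; then
    echo "AWS credentials are not configured (run: aws configure)" >&2
    return 1
  fi
}
```

Calling `check_prereqs || exit 1` at the top of a script stops the run before any AWS resources are created.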

Step-by-Step

Step 1: Extract audio

ffmpeg -y -i "input.mp4" -vn -c:a libmp3lame -q:a 2 "/tmp/transcribe-audio.mp3"
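The sketch below wraps the Step 1 command with two guards. First, the input must exist. Second, it must contain at least one audio stream, because a silent screen recording would otherwise fail inside ffmpeg with a less obvious error. `extract_audio` is a hypothetical name, and the sketch assumes `ffprobe` is available (it ships alongside ffmpeg). `-c:a libmp3lame` names the LAME MP3 encoder explicitly; in builds that include LAME, this is the encoder that `mp3` normally resolves to.

```shell
# Hypothetical wrapper around the Step 1 ffmpeg command.
extract_audio() {
  local input="$1" output="${2:-/tmp/transcribe-audio.mp3}"
  if [ ! -f "$input" ]; then
    echo "extract_audio: no such file: $input" >&2
    return 1
  fi
  local streams
  # ffprobe prints one line per audio stream; empty output means no audio track
  streams=$(ffprobe -v error -select_streams a -show_entries stream=index -of csv=p=0 "$input") || return 1
  if [ -z "$streams" ]; then
    echo "extract_audio: no audio stream in $input" >&2
    return 1
  fi
  # -vn drops the video stream; -q:a 2 selects high-quality variable-bitrate MP3
  ffmpeg -y -i "$input" -vn -c:a libmp3lame -q:a 2 "$output"
}
```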

Step 2: Create temp S3 bucket and upload

BUCKET="tmp-transcribe-$(date +%s)"
aws s3 mb "s3://$BUCKET" --region us-east-1
aws s3 cp "/tmp/transcribe-audio.mp3" "s3://$BUCKET/audio.mp3"
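S3 bucket names are global across all AWS accounts, so a name built only from `date +%s` can collide if two runs start in the same second. Here is a sketch, assuming bash: `make_bucket_name` and `valid_bucket_name` are hypothetical helpers. The sketch appends `$RANDOM` (falling back to the process ID outside bash) and checks only the basic naming rules: 3-63 characters; lowercase letters, digits, dots, and hyphens; and a letter or digit at each end.

```shell
# Hypothetical helper: temp bucket name with extra randomness beyond the timestamp.
make_bucket_name() {
  # $RANDOM is a bash feature; fall back to the PID ($$) in other shells
  printf 'tmp-transcribe-%s-%s\n' "$(date +%s)" "${RANDOM:-$$}"
}

# Hypothetical helper: check the basic S3 bucket naming rules (not every rule).
valid_bucket_name() {
  local name="$1"
  # Length must be 3-63 characters
  if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 63 ]; then
    return 1
  fi
  # Lowercase letters, digits, dots, and hyphens; must start and end with a letter or digit
  printf '%s\n' "$name" | grep -Eq '^[a-z0-9][a-z0-9.-]*[a-z0-9]$'
}
```

For example: `BUCKET="$(make_bucket_name)"; valid_bucket_name "$BUCKET" || exit 1`, then run `aws s3 mb` as above.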

Step 3: Start transcription job

JOB_NAME="tmp-job-$(date +%s)"
aws transcribe start-transcription-job \
  --transcription-job-name "$JOB_NAME" \
  --language-code en-US \
  --media-format mp3 \
  --media "MediaFileUri=s3://$BUCKET/audio.mp3" \
  --subtitles "Formats=srt,vtt" \
  --output-bucket-name "$BUCKET" \
  --region us-east-1

Language codes: en-US, es-ES, fr-FR, de-DE, pt-BR, ja-JP, zh-CN, it-IT, ko-KR, etc. Default to en-US if not specified.
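Step 3 hard-codes en-US, but $ARGUMENTS may end with a language code. A minimal parsing sketch follows. It assumes the code, when present, is the last word and has the xx-XX form of the codes listed above. `parse_arguments`, `MEDIA_PATH`, and `LANGUAGE_CODE` are hypothetical names. Paths containing spaces survive because only the last word is inspected.

```shell
# Hypothetical helper: split "$ARGUMENTS" into a media path and a language code.
# Sets the globals MEDIA_PATH and LANGUAGE_CODE; defaults the language to en-US.
parse_arguments() {
  local args="$1" last
  last="${args##* }"   # last space-separated word
  if [ "$last" != "$args" ] && printf '%s\n' "$last" | grep -Eq '^[a-z]{2}-[A-Z]{2}$'; then
    MEDIA_PATH="${args% *}"   # everything before the last space
    LANGUAGE_CODE="$last"
  else
    MEDIA_PATH="$args"
    LANGUAGE_CODE="en-US"
  fi
}
```

After `parse_arguments "$ARGUMENTS"`, pass `--language-code "$LANGUAGE_CODE"` to `start-transcription-job` instead of the fixed en-US.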

Step 4: Poll until complete

while true; do
  STATUS=$(aws transcribe get-transcription-job \
    --transcription-job-name "$JOB_NAME" \
    --region us-east-1 \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text)
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then break; fi
  sleep 5
done
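The loop above never gives up, and it exits the same way on COMPLETED and FAILED. The sketch below adds an attempt limit and distinct return codes. `poll_status`, `job_status`, `job_failure_reason`, and `POLL_INTERVAL` are hypothetical names. The status query is passed in as a command, so it can be swapped for a stub. `TranscriptionJob.FailureReason` is the field Transcribe fills in for failed jobs.

```shell
# Hypothetical helper: poll a status command until it reports a terminal state.
# Usage: poll_status MAX_ATTEMPTS COMMAND [ARGS...]
# Prints the last status seen; returns 0 on COMPLETED, 1 on FAILED,
# and 2 if MAX_ATTEMPTS polls pass without reaching either state.
poll_status() {
  local max_attempts="$1"
  shift
  local interval="${POLL_INTERVAL:-5}"   # seconds between polls (assumed default)
  local attempt=1 status=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # Treat a failed status call as a transient error and keep polling
    status=$("$@") || status="ERROR"
    case "$status" in
      COMPLETED) echo "$status"; return 0 ;;
      FAILED)    echo "$status"; return 1 ;;
    esac
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "$status"
  return 2
}

# The Step 4 status query, wrapped as a command for poll_status.
job_status() {
  aws transcribe get-transcription-job \
    --transcription-job-name "$1" \
    --region us-east-1 \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text
}

# Print why a job failed (FailureReason is only set for FAILED jobs).
job_failure_reason() {
  aws transcribe get-transcription-job \
    --transcription-job-name "$1" \
    --region us-east-1 \
    --query 'TranscriptionJob.FailureReason' \
    --output text
}
```

For example, `poll_status 120 job_status "$JOB_NAME"` gives up after roughly ten minutes at the default 5-second interval. On a return of 1, `job_failure_reason "$JOB_NAME"` explains the failure. In every case, the Step 7 cleanup should still run.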

Step 5: Download subtitle files

Save .srt and .vtt next to the original file:

aws s3 cp "s3://$BUCKET/$JOB_NAME.srt" "/path/to/input.srt"
aws s3 cp "s3://$BUCKET/$JOB_NAME.vtt" "/path/to/input.vtt"
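The "/path/to/input.srt" destinations are derived from the source path by swapping its extension. A sketch of that derivation follows. `output_path` and `INPUT` are hypothetical names. Only the file name's last extension is replaced, so dots in directory names are left alone.

```shell
# Hypothetical helper: print the path next to the source file with a new extension.
output_path() {
  local input="$1" ext="$2" dir base
  case "$input" in
    */*) dir="${input%/*}/" ;;   # keep the directory, including the trailing slash
    *)   dir="" ;;
  esac
  base="${input##*/}"
  # Strip only the last extension, and only if the file name has one
  case "$base" in
    ?*.*) base="${base%.*}" ;;
  esac
  printf '%s%s.%s\n' "$dir" "$base" "$ext"
}
```

For example: `aws s3 cp "s3://$BUCKET/$JOB_NAME.srt" "$(output_path "$INPUT" srt)"`.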

Step 6: Generate plain text transcript

Download the JSON result and extract the full transcript text:

aws s3 cp "s3://$BUCKET/$JOB_NAME.json" "/tmp/transcribe-result.json"

Then extract the .results.transcripts[0].transcript field from the JSON with a JSON tool (such as jq) and save it as a .txt file next to the original file.
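A sketch of the extraction step, assuming either jq or python3 is installed. `extract_transcript` is a hypothetical name. Both branches read the same .results.transcripts[0].transcript field.

```shell
# Hypothetical helper: write the plain-text transcript from the Transcribe JSON.
# Uses jq when installed; otherwise falls back to python3 (assumed available).
extract_transcript() {
  local json="$1" out="$2"
  if command -v jq >/dev/null 2>&1; then
    jq -r '.results.transcripts[0].transcript' "$json" > "$out"
  else
    python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["results"]["transcripts"][0]["transcript"])' "$json" > "$out"
  fi
}
```

For example: `extract_transcript /tmp/transcribe-result.json "/path/to/input.txt"`.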

Step 7: Clean up everything

IMPORTANT: Always clean up to avoid recurring S3 storage costs.

# Delete S3 bucket and all contents
aws s3 rb "s3://$BUCKET" --force --region us-east-1

# Delete the transcription job
aws transcribe delete-transcription-job --transcription-job-name "$JOB_NAME" --region us-east-1

# Delete local temp files (extracted audio and JSON result)
rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
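If an earlier step fails, a script can stop before reaching Step 7 and leave the bucket behind. One way to guard against this, sketched under the assumption of a bash script using the BUCKET and JOB_NAME variables from Steps 2 and 3, is an EXIT trap. `cleanup` is a hypothetical name. Because the variables are read when the trap fires, the trap can be registered at the top of the script, and any unset variable is simply skipped.

```shell
# Sketch: run cleanup whenever the script exits, including early exits under `set -e`.
cleanup() {
  if [ -n "${BUCKET:-}" ]; then
    # Ignore errors so the remaining cleanup still runs
    aws s3 rb "s3://$BUCKET" --force --region us-east-1 || true
  fi
  if [ -n "${JOB_NAME:-}" ]; then
    aws transcribe delete-transcription-job \
      --transcription-job-name "$JOB_NAME" --region us-east-1 || true
  fi
  rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
}
trap cleanup EXIT
```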

Real-World Results (Reference)

From actual transcription runs:

Video              | Duration | Audio size | Transcribe time | Subtitle segments
X/Twitter clip     | 2:40     | 2.5 MB     | ~20 seconds     | 83
Screen recording   | 18:45    | 11.4 MB    | ~60 seconds     | 500+

Key Insights

  1. AWS Transcribe is fast - even 19-minute videos complete in about a minute
  2. Short-form content (tweets, reels) transcribes almost instantly
  3. Cost is negligible - AWS Transcribe charges ~$0.024/min, so a 19-min video costs ~$0.46
  4. Cleanup is critical - always delete the S3 bucket to avoid storage charges
  5. SRT is most compatible - works with most video players and editors; VTT is better for web
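The cost figure in insight 3 is minutes multiplied by the per-minute rate: 19 × $0.024 = $0.456, or about $0.46. The sketch below turns that into a helper. It assumes the ~$0.024/min rate quoted above and a 15-second minimum charge per job; check current AWS pricing for your region and tier. `estimate_cost` and `RATE_PER_MIN` are hypothetical names.

```shell
# Hypothetical helper: estimated Transcribe cost in USD for a duration in seconds.
# Assumes $0.024/minute (override with RATE_PER_MIN) and a 15-second minimum charge.
estimate_cost() {
  awk -v s="$1" -v rate="${RATE_PER_MIN:-0.024}" 'BEGIN {
    if (s < 15) s = 15            # minimum billable duration per job
    printf "%.2f\n", s / 60 * rate
  }'
}
```

For example, `estimate_cost 1125` (the 18:45 screen recording) prints 0.45.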

Output Files

original-video.mp4
original-video.srt          # Subtitles with timestamps (most compatible)
original-video.vtt          # Web-optimized subtitles (for HTML5 <track>)
original-video.txt          # Plain text transcript (no timestamps)

After Transcription

  1. Verify all output files exist: ls -lh /path/to/original-video.{srt,vtt,txt}
  2. Report the number of subtitle segments and total duration
  3. Confirm all AWS resources have been cleaned up (no S3 buckets, no Transcribe jobs remaining)
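For the report in item 2, the segment count and duration can be read from the .srt file. The sketch below assumes standard SRT timing lines such as "00:18:44,120 --> 00:18:45,000". It treats the end time of the last cue as an approximation of the total duration. `count_segments` and `last_timestamp` are hypothetical names.

```shell
# Hypothetical helper: number of subtitle segments (one timing line per cue).
count_segments() {
  grep -c -- ' --> ' "$1"
}

# Hypothetical helper: end time of the last cue as HH:MM:SS (milliseconds dropped).
last_timestamp() {
  grep -- ' --> ' "$1" | tail -n 1 | awk '{ split($3, t, ","); print t[1] }'
}
```

For example: `echo "$(count_segments input.srt) segments, ends at $(last_timestamp input.srt)"`.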

Source

View on GitHub: https://github.com/rameerez/claude-code-startup-skills/blob/main/skills/transcribe-video/SKILL.md

Overview

Generates captions and transcripts from video or audio files using AWS Transcribe. The outputs (SRT and VTT subtitles plus a plain-text transcript) are written next to the source file, making video content more accessible and searchable.

How This Skill Works

ffmpeg and the AWS CLI must be installed and configured. The workflow extracts audio with ffmpeg, uploads it to a temporary S3 bucket, and runs a Transcribe job with SRT and VTT outputs. It then downloads the results, creates a TXT transcript from the JSON output, and cleans up all temporary resources.

When to Use It

  • Add captions to videos for accessibility (subtitles in SRT/VTT).
  • Create searchable transcripts for long recordings and lectures.
  • Extract spoken content from clips to generate notes or summaries.
  • Prepare multilingual captions using language codes like en-US or es-ES.
  • Make marketing or tutorial videos searchable and indexable on platforms.

Quick Start

  1. Ensure ffmpeg and the AWS CLI are installed and configured.
  2. Extract audio from your video, upload it to a temporary S3 bucket, and start a transcription job with SRT and VTT outputs.
  3. Poll for completion, download the SRT and VTT files, generate the TXT transcript from the JSON result, and remove temporary resources.

Best Practices

  • Verify ffmpeg and AWS CLI are installed and credentials configured.
  • Specify the correct language-code to improve transcription accuracy.
  • Name and organize output files consistently next to the source video.
  • Monitor the transcription job status until it reports COMPLETED or FAILED.
  • Clean up the S3 bucket and Transcribe job after downloads to avoid ongoing costs.

Example Use Cases

  • 2:40 Twitter clip: produces SRT, VTT, and TXT transcripts with 83 subtitle segments.
  • 18:45 screen recording: yields 500+ subtitle segments.
  • Lecture or tutorial video: accessible captions and a plain-text transcript created.
  • Marketing reel: captions added for social sharing and accessibility.
  • Product demo: searchable transcript for knowledge base integration.
