What is transcription in case.dev?

Transcription converts audio or video to text and can include speaker diarization to identify who spoke.

How do I enable speaker diarization and monitor progress?

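Start the run with --speaker-labels and, optionally, --speakers-expected N as a hint for the number of speakers. Check progress with casedev transcribe status, or use casedev transcribe watch, which polls until the job finishes and then prints the transcript. For a job that has already completed, casedev transcribe result fetches the transcript.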

transcription

npx machina-cli add skill CaseMark/legal-plugin/transcription --openclaw

case.dev Transcription

Audio and video transcription with speaker diarization, ideal for depositions, hearings, and recorded proceedings. Supports MP3, WAV, M4A, FLAC, OGG, WEBM, and MP4 up to 5GB.

Requires the casedev CLI. See setup skill for installation and auth.

Start a Transcription

The source file must be in a vault:

casedev transcribe run --vault VAULT_ID --object OBJECT_ID --json

The object must be an audio or video file. Non-media files are rejected.
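
Because non-media objects are rejected, it can save a round trip to check a file locally before uploading it. The sketch below is one way to do that in shell. The function names are hypothetical, the extension list is taken from the supported formats listed above, and the 5GB limit is assumed to mean 5 x 1024^3 bytes, which the source does not specify. The CLI's own validation remains authoritative.

```shell
# Pre-upload sanity check (a sketch, not part of the casedev CLI).
# Assumption: "5GB" is read as 5 * 1024^3 bytes.
MAX_BYTES=$((5 * 1024 * 1024 * 1024))

# Returns 0 if the file name ends in one of the listed media extensions.
has_supported_extension() {
  local name ext
  name=$(basename -- "$1")
  case "$name" in
    *.*) ;;            # has an extension, keep going
    *) return 1 ;;     # no extension at all
  esac
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg|webm|mp4) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0 if the file exists and is no larger than MAX_BYTES.
within_size_limit() {
  local size
  [ -f "$1" ] || return 1
  size=$(wc -c < "$1" | tr -d '[:space:]') || return 1
  [ "$size" -le "$MAX_BYTES" ]
}
```

A script could call both functions on a local file and skip the upload when either returns non-zero. An extension check is only a heuristic: a file with a media extension can still contain an unsupported codec.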

Flags:

--speaker-labels — enable speaker diarization
--speakers-expected N — hint for number of speakers
--language — language code (e.g., en, es)
--format — output format
--auto-highlights — extract key phrases

If a focused vault has been set with casedev focus set --vault, it is used when --vault is omitted.
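
To show how the flags combine, here is a minimal sketch of a helper that assembles a diarized run command from its parts. The function name build_run_command is hypothetical. It only prints the command line so it can be reviewed before running, and it leaves out --format and --auto-highlights because the source does not list their accepted values.

```shell
# Prints a `casedev transcribe run` command line with speaker diarization enabled.
# Arguments: VAULT_ID OBJECT_ID [SPEAKERS_EXPECTED] [LANGUAGE]
# This helper is illustrative only; it does not execute anything.
build_run_command() {
  local vault_id="$1" object_id="$2" speakers="${3:-}" language="${4:-}"
  local cmd="casedev transcribe run --vault $vault_id --object $object_id --speaker-labels"
  if [ -n "$speakers" ]; then
    cmd="$cmd --speakers-expected $speakers"   # hint for the number of speakers
  fi
  if [ -n "$language" ]; then
    cmd="$cmd --language $language"            # language code such as en or es
  fi
  printf '%s --json\n' "$cmd"
}
```

For example, build_run_command VAULT_ID OBJECT_ID 4 en prints a command with --speaker-labels, --speakers-expected 4, --language en, and --json.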

Check Status

casedev transcribe status TRANSCRIPTION_ID --json

Returns: ID, status, vault, source/result object IDs, audio duration, word count, confidence.

Statuses: queued -> processing -> completed or failed.

Get Result

casedev transcribe result TRANSCRIPTION_ID --json

Returns the transcript text. If the result is stored as a vault object, the command fetches the text automatically. The command fails if the job has not completed yet.

Watch Until Complete

casedev transcribe watch TRANSCRIPTION_ID --json

Polls until done, then prints the result. Flags:

--interval / -i — poll interval in seconds (default: 3)
--timeout / -t — max wait in seconds (default: 900)
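
For scripts that need custom handling between polls, the same idea can be sketched as a manual loop. This is an illustration of interval and timeout polling, not the CLI's actual implementation. The function name poll_until_terminal is hypothetical, and the caller supplies a command that prints the current status word, because the shape of the status JSON is not documented here.

```shell
# poll_until_terminal STATUS_CMD [INTERVAL] [TIMEOUT]
# STATUS_CMD must print one status word: queued, processing, completed, or failed.
# Defaults mirror the documented watch defaults (3 s interval, 900 s timeout).
# Return codes are this sketch's own choice: 0 completed, 1 failed,
# 2 status command error, 124 timed out.
poll_until_terminal() {
  local status_cmd="$1" interval="${2:-3}" timeout="${3:-900}"
  local waited=0 state
  while [ "$waited" -le "$timeout" ]; do
    state=$($status_cmd) || return 2
    case "$state" in
      completed) echo completed; return 0 ;;
      failed)    echo failed;    return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo timeout
  return 124
}
```

To use it with the real CLI, pass a small wrapper function that runs casedev transcribe status TRANSCRIPTION_ID --json and prints only the status word. Use an interval greater than zero so the timeout counter advances.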

Common Workflow

Transcribe a deposition recording

# 1. Upload to vault
casedev vault object upload ./deposition-2024-01-15.mp3 --vault VAULT_ID --json

# 2. Start transcription with speaker labels (look up OBJECT_ID with: casedev vault object list --vault VAULT_ID)
casedev transcribe run --vault VAULT_ID --object OBJECT_ID --speaker-labels --speakers-expected 4 --json

# 3. Watch until complete
casedev transcribe watch TRANSCRIPTION_ID --json

# 4. Or check status + get result separately
casedev transcribe status TRANSCRIPTION_ID --json
casedev transcribe result TRANSCRIPTION_ID --json

Using the job tracker

# Transcription jobs are auto-tracked
casedev jobs list --type transcribe --json
casedev jobs watch JOB_ID --type transcribe --json

Troubleshooting

"Invalid file type for transcribe": Only audio/video MIME types are accepted. Check the object with casedev vault object list --vault VAULT_ID. Upload audio files, not PDFs.

"Invalid object ID for this vault": Run casedev vault object list --vault VAULT_ID to see valid IDs.

"Transcription is not complete yet": Wait for the job to finish. Use casedev transcribe watch or casedev jobs watch.

Transcription failed: Check casedev transcribe status TRANSCRIPTION_ID --json for the error message. Common causes: corrupted file, unsupported codec, file too large.

Long audio (2+ hours): Raise the watch timeout above the 900-second default, for example casedev transcribe watch TRANSCRIPTION_ID --timeout 3600 --json. Large files take proportionally longer.
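
In an automated pipeline, the error messages above can be mapped to their remedies with a small lookup. The sketch below assumes the quoted messages appear verbatim somewhere in the CLI's error output, which the source does not confirm. The function name troubleshoot_hint is hypothetical.

```shell
# Prints the remedy from the troubleshooting notes for a given error message.
# Matching is by substring; unrecognized messages fall back to a status check.
troubleshoot_hint() {
  case "$1" in
    *"Invalid file type for transcribe"*)
      echo "Only audio/video objects are accepted; check the object with: casedev vault object list --vault VAULT_ID" ;;
    *"Invalid object ID for this vault"*)
      echo "List valid IDs with: casedev vault object list --vault VAULT_ID" ;;
    *"Transcription is not complete yet"*)
      echo "Wait for the job with: casedev transcribe watch TRANSCRIPTION_ID --json" ;;
    *)
      echo "Inspect the job with: casedev transcribe status TRANSCRIPTION_ID --json" ;;
  esac
}
```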

Source

https://github.com/CaseMark/legal-plugin/blob/main/transcription/SKILL.md

Overview

case.dev transcription converts audio and video to text with speaker diarization, ideal for depositions, hearings, and recorded proceedings. It supports MP3, WAV, M4A, FLAC, OGG, WEBM, and MP4 up to 5GB. Requires the casedev CLI to run.

How This Skill Works

Upload your media to a vault, then start a transcription with casedev transcribe run using --vault and --object. Enable speaker diarization with --speaker-labels and estimate the number of speakers with --speakers-expected; optionally set language, output format, and auto-highlights. Use status or watch to monitor progress and result to fetch the transcript.

When to Use It

When you need a verbatim transcript of a deposition or hearing
When speaker labels are required to identify who spoke
When your media is an MP3, WAV, M4A, FLAC, OGG, WEBM, or MP4 file of up to 5GB stored in a vault
When you want an auditable workflow with status and result retrieval
When you need an automated extract of key phrases with auto-highlights

Quick Start

Step 1: Upload the media to a vault (example: casedev vault object upload ./video.mp4 --vault VAULT_ID --json)
Step 2: Start transcription with speaker labels (example: casedev transcribe run --vault VAULT_ID --object OBJECT_ID --speaker-labels --speakers-expected 4 --json)
Step 3: Watch until complete (example: casedev transcribe watch TRANSCRIPTION_ID --json)

Best Practices

Store media in a vault before transcribing for secure access
Enable --speaker-labels and provide --speakers-expected to improve diarization accuracy
Specify the language with --language to improve transcription precision
Use --auto-highlights to capture key phrases for quick review
Use transcribe status or transcribe watch to manage long transcriptions

Example Use Cases

Deposition transcription with four speakers using speaker labels
Hearing recording in English with two speakers
MP4 video transcript with diarization for a court proceeding
Large file transcription (up to 5GB) with a longer transcribe watch --timeout
Retrieve transcript via transcribe result after completion
