What file types are supported?

PDFs and images up to 500MB. Supported image formats are PNG, JPG, TIFF, BMP, and WEBP. The file's content type must be application/pdf or image/*.

How do I access word-level positional data?

After OCR ingestion completes, run casedev ocr words --vault VAULT_ID --object OBJECT_ID --json. Add the optional --page, --word-start, and --word-end flags to narrow the result. The command returns per-page word arrays containing text, word index, and confidence.

What CLI do I need and how do I install it?

You need the case.dev CLI (casedev). Install via the setup skill and authenticate; OCR requires the CLI to submit jobs and fetch results.

npx machina-cli add skill CaseMark/legal-plugin/ocr --openclaw

case.dev OCR

Production-grade document OCR with table extraction and word-level positional data. Processes PDFs and images (PNG, JPG, TIFF, BMP, WEBP) up to 500MB.

Requires the casedev CLI. See setup skill for installation and auth.

Process a Document

Submit a document URL for OCR:

casedev ocr process --document-url "https://example.com/contract.pdf" --json

Flags:

--document-url / --url (required) — publicly accessible URL or presigned vault URL
--document-id — optional identifier to tag the job
--engine — OCR engine override

Returns a job ID and initial status.

Check Job Status

casedev ocr status JOB_ID --json

Returns: ID, status, page count, created/completed timestamps.

Statuses: queued -> processing -> completed or failed.
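The lifecycle above has two terminal states. As a minimal sketch, a script could use a helper like the one below to decide when to stop polling. The function name is hypothetical, and it assumes only the four status names listed above.

```shell
# Hypothetical helper: succeed (exit 0) when a status string is terminal.
# Assumes the status names documented above: queued, processing, completed, failed.
ocr_status_is_terminal() {
  case "$1" in
    completed|failed) return 0 ;;
    *) return 1 ;;  # queued, processing, or anything unrecognized
  esac
}
```

A caller might write `if ocr_status_is_terminal "$status"; then ...; fi` after extracting the status from the JSON output.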

Watch Until Complete

casedev ocr watch JOB_ID --json

Polls until the job finishes. Flags:

--interval / -i — poll interval in seconds (default: 3)
--timeout / -t — max wait in seconds (default: 900)
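For long documents, the two flags above can be set together. The wrapper below is only a sketch: the function name and the chosen values (a 10-second interval and a 1800-second default timeout) are illustrative assumptions, not recommendations from the CLI itself.

```shell
# Hypothetical wrapper: watch a job with a slower poll and a longer timeout.
# $1 = job ID, $2 = timeout in seconds (illustrative default: 1800).
ocr_watch_long() {
  casedev ocr watch "$1" --interval 10 --timeout "${2:-1800}" --json
}
```

Calling `ocr_watch_long JOB_ID 3600` would wait up to an hour before giving up.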

Word-Level Data

Retrieve word-level OCR output for a vault object:

casedev ocr words --vault VAULT_ID --object OBJECT_ID --json

This requires the document to be in a vault and have completed OCR ingestion. The object must be a PDF or image (audio/video files are rejected).

Flags:

--page — specific page number
--word-start — starting word index
--word-end — ending word index

Returns per-page word arrays with text, word index, and confidence scores.
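The flags above can narrow a request to part of a page. The sketch below assumes that --page, --word-start, and --word-end may be combined in one call. This document does not say whether word indexes start at 0 or whether the end index is inclusive, so check that against real output. The function name is hypothetical.

```shell
# Hypothetical helper: fetch a slice of words from one page of a vault object.
# Arguments: vault ID, object ID, page number, first word index, last word index.
# ASSUMPTION: the three narrowing flags can be used together.
ocr_words_slice() {
  casedev ocr words --vault "$1" --object "$2" \
    --page "$3" --word-start "$4" --word-end "$5" --json
}
```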

If --vault is omitted, the command uses the focused vault, if one has been set via casedev focus set --vault.

Common Workflow

OCR a document from a vault

# 1. Upload to vault (triggers automatic ingestion + OCR)
casedev vault object upload ./scanned-contract.pdf --vault VAULT_ID --json

# 2. Check ingestion status
casedev vault object list --vault VAULT_ID --json

# 3. Get word-level data
casedev ocr words --vault VAULT_ID --object OBJECT_ID --json

# 4. Get a specific page
casedev ocr words --vault VAULT_ID --object OBJECT_ID --page 3 --json

OCR an external document

# 1. Submit
casedev ocr process --document-url "https://storage.example.com/doc.pdf" --json

# 2. Watch
casedev ocr watch JOB_ID --json
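The two steps above can be chained so the job ID does not have to be copied by hand. This is a rough sketch under one notable assumption: that the JSON returned by casedev ocr process exposes the job ID as a string field named "id". That field name is not confirmed by this document, and both function names are hypothetical.

```shell
# Hypothetical helper: print the first "id" string value found in JSON on stdin.
# ASSUMPTION: the job ID appears as a field named "id"; adjust to the real output.
json_first_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: submit a document URL, then watch the resulting job.
# $1 = publicly accessible or presigned document URL.
ocr_external() {
  url="$1"
  job_id=$(casedev ocr process --document-url "$url" --json | json_first_id)
  if [ -z "$job_id" ]; then
    echo "could not find a job ID in the response" >&2
    return 1
  fi
  casedev ocr watch "$job_id" --json
}
```

A dedicated JSON tool is a sturdier choice than sed if one is available. The sed pattern is used here only to keep the sketch dependency-free.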

Troubleshooting

"Invalid file type for OCR": OCR only supports PDFs and images (application/pdf, image/*). Check the object's content type with casedev vault object list.

"Invalid object ID for this vault": Run casedev vault object list --vault VAULT_ID to see valid object IDs.

Job stuck in "processing": Increase watch timeout with --timeout 1800. Large documents (100+ pages) take longer.

"OCR job failed": The document may be corrupted or in an unsupported format. Re-upload and retry.

Source

https://github.com/CaseMark/legal-plugin/blob/main/ocr/SKILL.md

Overview

case.dev OCR delivers production-grade text extraction and table parsing with word-level positional data. It processes PDFs and images up to 500MB and returns per-page text, tables, and per-word metadata to enable precise search and data capture.

How This Skill Works

Submit a document URL with casedev ocr process, or upload the file to a vault to trigger automatic ingestion and OCR, then monitor the job until it completes. Once ingestion finishes, retrieve per-page word data (text, word index, and confidence), or access page-level results for structured data extraction.

When to Use It

Digitize and index paper documents to make them searchable
Extract text and tables from contracts or PDFs for data extraction
Obtain word-level positional data for precise highlighting, redaction, or analytics
Process documents stored in a vault or via public URLs for OCR ingestion
Handle large documents up to 500MB across multiple pages

Quick Start

Step 1: Submit the document via casedev ocr process --document-url "https://example.com/doc.pdf" --json or upload to a vault object to trigger ingestion
Step 2: Watch the job with casedev ocr watch JOB_ID --json, or check it once with casedev ocr status JOB_ID --json
Step 3: Retrieve word-level data with casedev ocr words --vault VAULT_ID --object OBJECT_ID --json

Best Practices

Use a publicly accessible URL or presigned vault URL with --document-url / --url when submitting
Prefer retrieving word-level data with casedev ocr words for detailed analysis
Verify input types are PDF or image/* before processing to avoid errors
Check status frequently (status or watch) for long-running jobs and plan timeouts accordingly
When dealing with large or multi-page documents, fetch data one page at a time with --page, or in word ranges with --word-start and --word-end, to keep results manageable
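One of the practices above is verifying the content type before processing. A minimal sketch of that check follows. The function name is hypothetical, and the accepted patterns are the two content types this document names (application/pdf and image/*).

```shell
# Hypothetical helper: succeed (exit 0) when a content type is one OCR accepts.
# Accepted types per this document: application/pdf and image/*.
ocr_content_type_ok() {
  case "$1" in
    application/pdf|image/*) return 0 ;;
    *) return 1 ;;  # e.g. audio/* and video/* are rejected
  esac
}
```

A script could call `ocr_content_type_ok "$content_type" || echo "skip: unsupported type"` before submitting a file.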

Example Use Cases

OCR a scanned contract from a vault to extract terms and table data
Digitize an external PDF invoice and capture line items with positions
Archive multi-page statements with per-page word arrays for search
Extract redaction-ready text from regulatory documents with word-level data
Process external documents and compare text across versions for version control
