
pdf-processing-openai

npx machina-cli add skill lawvable/awesome-legal-skills/pdf-processing-openai --openclaw

PDF Skill

When to use

  • Read or review PDF content where layout and visuals matter.
  • Create PDFs programmatically with reliable formatting.
  • Validate final rendering before delivery.

Workflow

  1. Prefer visual review: render PDF pages to PNGs and inspect them.
    • Use pdftoppm if available.
    • If unavailable, install Poppler or ask the user to review the output locally.
  2. Use reportlab to generate PDFs when creating new documents.
  3. Use pdfplumber (or pypdf) for text extraction and quick checks; do not rely on it for layout fidelity.
  4. After each meaningful update, re-render pages and verify alignment, spacing, and legibility.
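
Steps 2 and 3 of the workflow might look like the following minimal sketch. It assumes reportlab and pdfplumber are installed; the function names `write_simple_pdf` and `extract_text`, and the page size, fonts, and margins, are illustrative choices rather than anything the skill prescribes.

```python
from pathlib import Path


def write_simple_pdf(path, title, lines):
    """Write a one-page PDF with a title and body lines using reportlab."""
    # Imported lazily so this module still loads when reportlab is missing.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    width, height = letter
    margin = 72  # 1 inch in points (illustrative value)

    pdf = canvas.Canvas(str(path), pagesize=letter)
    pdf.setFont("Helvetica-Bold", 16)
    pdf.drawString(margin, height - margin, title)
    pdf.setFont("Helvetica", 11)
    y = height - margin - 28
    for line in lines:
        pdf.drawString(margin, y, line)
        y -= 16  # fixed leading; this sketch does no wrapping or page breaks
    pdf.showPage()
    pdf.save()
    return path


def extract_text(path):
    """Return the text of each page: a quick content check, not a layout check."""
    import pdfplumber

    with pdfplumber.open(str(path)) as pdf:
        # extract_text() can return None for pages without a text layer.
        return [page.extract_text() or "" for page in pdf.pages]
```

Text extraction only confirms that the expected words are present. Alignment and spacing still need the PNG inspection described in steps 1 and 4.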

Temp and output conventions

  • Use tmp/pdfs/ for intermediate files; delete when done.
  • Write final artifacts under output/pdf/ when working in this repo.
  • Keep filenames stable and descriptive.
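
One way to keep these conventions in a single place is a small path helper. This is a sketch only: the helper names (`tmp_path`, `output_path`, `clean_tmp`) and the `root` parameter are hypothetical and not part of the skill.

```python
import shutil
from pathlib import Path

TMP_DIR = Path("tmp/pdfs")      # intermediate files
OUTPUT_DIR = Path("output/pdf")  # final artifacts


def tmp_path(name, root="."):
    """Return a path under tmp/pdfs/, creating the directory if needed."""
    directory = Path(root) / TMP_DIR
    directory.mkdir(parents=True, exist_ok=True)
    return directory / name


def output_path(name, root="."):
    """Return a path under output/pdf/, creating the directory if needed."""
    directory = Path(root) / OUTPUT_DIR
    directory.mkdir(parents=True, exist_ok=True)
    return directory / name


def clean_tmp(root="."):
    """Delete tmp/pdfs/ and everything in it once the work is done."""
    shutil.rmtree(Path(root) / TMP_DIR, ignore_errors=True)
```

For example, `tmp_path("contract-draft-page")` could serve as the pdftoppm output prefix, and `output_path("contract-packet.pdf")` as the final file location.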

Dependencies (install if missing)

Prefer uv for dependency management.

Python packages:

uv pip install reportlab pdfplumber pypdf

If uv is unavailable:

python3 -m pip install reportlab pdfplumber pypdf

System tools (for rendering):

# macOS (Homebrew)
brew install poppler

# Ubuntu/Debian
sudo apt-get install -y poppler-utils

If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
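
A minimal sketch of a pre-flight check that reports which dependencies are missing, using only the standard library. The function name `missing_dependencies` and the message format are assumptions made for illustration.

```python
import importlib.util
import shutil

PYTHON_PACKAGES = ("reportlab", "pdfplumber", "pypdf")
SYSTEM_TOOLS = ("pdftoppm",)  # provided by Poppler


def missing_dependencies(packages=PYTHON_PACKAGES, tools=SYSTEM_TOOLS):
    """Return a list of human-readable labels for anything not installed."""
    missing = []
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(f"python package: {name}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(f"system tool: {tool}")
    return missing


if __name__ == "__main__":
    for item in missing_dependencies():
        print(f"Missing {item}")
```

An empty list means everything was found. Otherwise, each entry can be passed on to the user together with the matching install command from above.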

Environment

No required environment variables.

Rendering command

pdftoppm -png "$INPUT_PDF" "$OUTPUT_PREFIX"
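
The same command can be driven from Python, sketched below under a few assumptions: the helper names `build_render_command` and `render_pages` are hypothetical, and the `dpi` parameter (passed through pdftoppm's `-r` option) is an optional addition to the command shown above.

```python
import shutil
import subprocess
from pathlib import Path


def build_render_command(input_pdf, output_prefix, dpi=150):
    """Build the pdftoppm argument list; -r sets the resolution in DPI."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(input_pdf), str(output_prefix)]


def render_pages(input_pdf, output_prefix, dpi=150):
    """Render every page to PNG and return the generated files in order."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install Poppler to render pages")
    prefix = Path(output_prefix)
    prefix.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdftoppm exits with an error.
    subprocess.run(build_render_command(input_pdf, prefix, dpi), check=True)
    # pdftoppm names its output <prefix>-<page number>.png.
    return sorted(prefix.parent.glob(prefix.name + "-*.png"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.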

Quality expectations

  • Maintain polished visual design: consistent typography, spacing, margins, and section hierarchy.
  • Avoid rendering issues: clipped text, overlapping elements, broken tables, black squares, or unreadable glyphs.
  • Charts, tables, and images must be sharp, aligned, and clearly labeled.
  • Use ASCII hyphens only. Avoid U+2011 (non-breaking hyphen) and other Unicode dashes.
  • Citations and references must be human-readable; never leave tool tokens or placeholder strings.
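
The ASCII-hyphen rule lends itself to an automated check before text goes into a PDF. The sketch below is one possible approach: the set of code points treated as "Unicode dashes" and the function names are assumptions, so extend the set if other dash-like characters appear in your content.

```python
import unicodedata

# Dash-like code points other than the ASCII hyphen-minus (U+002D).
UNICODE_DASHES = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2015",  # horizontal bar
    "\u2212",  # minus sign
}


def find_unicode_dashes(text):
    """Return (index, code point label, character name) for each non-ASCII dash."""
    hits = []
    for index, ch in enumerate(text):
        if ch in UNICODE_DASHES:
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


def to_ascii_hyphens(text):
    """Replace every listed Unicode dash with an ASCII hyphen."""
    return "".join("-" if ch in UNICODE_DASHES else ch for ch in text)
```

Running `find_unicode_dashes` on source text before generation, and again on text extracted from the finished PDF, catches dashes introduced at either stage.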

Final checks

  • Do not deliver until the latest PNG inspection shows zero visual or formatting defects.
  • Confirm headers/footers, page numbering, and section transitions look polished.
  • Keep intermediate files organized or remove them after final approval.

Source

https://github.com/lawvable/awesome-legal-skills/blob/main/skills/pdf-processing-openai/SKILL.md

Overview

PDF Skill provides a workflow-focused toolkit for reading or extracting content from PDFs, creating new PDFs with professional formatting, and validating visual fidelity before delivery. It emphasizes rendering checks, layout accuracy, and typography control for documents such as reports and contracts.

How This Skill Works

Use pdftoppm to render PDF pages as PNGs for visual QA, employ reportlab to generate new PDFs with precise typography and formatting, and rely on pdfplumber or pypdf for text extraction and quick content checks (not for layout fidelity). After each meaningful update, re-render pages to ensure alignment, spacing, and legibility.

When to Use It

  • Reading or reviewing PDF content where layout and visuals matter
  • Programmatically creating PDFs with professional formatting
  • Validating final rendering before client delivery
  • Extracting text from PDFs for quick checks or data extraction (not for layout fidelity)
  • Rendering PDFs to PNGs for visual QA during the review process

Quick Start

  1. Install dependencies and rendering tools (uv pip install reportlab pdfplumber pypdf; ensure Poppler is installed for pdftoppm).
  2. Render and review PDFs by converting pages to PNGs with pdftoppm and visually inspecting them for layout issues.
  3. Create or extract content with reportlab/pdfplumber/pypdf, and re-render after each meaningful update.

Best Practices

  • Render pages to PNGs with pdftoppm before QA or approval
  • Prefer reportlab for generating new PDFs to ensure consistent typography and margins
  • Use pdfplumber or pypdf for text extraction and quick checks; do not rely on them for layout fidelity
  • Keep intermediates in tmp/pdfs and final artifacts under output/pdf with descriptive filenames
  • Maintain polished design: use ASCII hyphens only, and confirm consistent headers/footers and clear page numbering during checks

Example Use Cases

  • Generate a contract packet PDF with header/footer, page numbers, and precise margins
  • Produce a quarterly report PDF with charts and tables rendered through a consistent typographic system
  • Create a product specification PDF with uniform typography across sections and titled subsections
  • Export standardized invoices programmatically to a well-formatted PDF template
  • Assemble a multi-section whitepaper with consistent typography, spacing, and clear section transitions
