pdf-processing-openai
Scannednpx machina-cli add skill lawvable/awesome-legal-skills/pdf-processing-openai --openclawPDF Skill
When to use
- Read or review PDF content where layout and visuals matter.
- Create PDFs programmatically with reliable formatting.
- Validate final rendering before delivery.
Workflow
- Prefer visual review: render PDF pages to PNGs and inspect them.
- Use
pdftoppmif available. - If unavailable, install Poppler or ask the user to review the output locally.
- Use
- Use
reportlabto generate PDFs when creating new documents. - Use
pdfplumber(orpypdf) for text extraction and quick checks; do not rely on it for layout fidelity. - After each meaningful update, re-render pages and verify alignment, spacing, and legibility.
Temp and output conventions
- Use
tmp/pdfs/for intermediate files; delete when done. - Write final artifacts under
output/pdf/when working in this repo. - Keep filenames stable and descriptive.
Dependencies (install if missing)
Prefer uv for dependency management.
Python packages:
uv pip install reportlab pdfplumber pypdf
If uv is unavailable:
python3 -m pip install reportlab pdfplumber pypdf
System tools (for rendering):
# macOS (Homebrew)
brew install poppler
# Ubuntu/Debian
sudo apt-get install -y poppler-utils
If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
Environment
No required environment variables.
Rendering command
pdftoppm -png $INPUT_PDF $OUTPUT_PREFIX
Quality expectations
- Maintain polished visual design: consistent typography, spacing, margins, and section hierarchy.
- Avoid rendering issues: clipped text, overlapping elements, broken tables, black squares, or unreadable glyphs.
- Charts, tables, and images must be sharp, aligned, and clearly labeled.
- Use ASCII hyphens only. Avoid U+2011 (non-breaking hyphen) and other Unicode dashes.
- Citations and references must be human-readable; never leave tool tokens or placeholder strings.
Final checks
- Do not deliver until the latest PNG inspection shows zero visual or formatting defects.
- Confirm headers/footers, page numbering, and section transitions look polished.
- Keep intermediate files organized or remove them after final approval.
Source
git clone https://github.com/lawvable/awesome-legal-skills/blob/main/skills/pdf-processing-openai/SKILL.mdView on GitHub Overview
PDF Skill provides a workflow-focused toolkit for reading or extracting content from PDFs, creating new PDFs with professional formatting, and validating visual fidelity before delivery. It emphasizes rendering checks, layout accuracy, and typography control for reports, contracts, and layouts.
How This Skill Works
Use pdftoppm to render PDF pages as PNGs for visual QA, employ reportlab to generate new PDFs with precise typography and formatting, and rely on pdfplumber or pypdf for text extraction and quick content checks (not for layout fidelity). After each meaningful update, re-render pages to ensure alignment, spacing, and legibility.
When to Use It
- Reading or reviewing PDF content where layout and visuals matter
- Programmatically creating PDFs with professional formatting
- Validating final rendering before client delivery
- Extracting text from PDFs for quick checks or data extraction (not for layout fidelity)
- Rendering PDFs to PNGs for visual QA during the review process
Quick Start
- Step 1: Install dependencies and rendering tools (uv pip install reportlab pdfplumber pypdf; ensure Poppler is installed for pdftoppm)
- Step 2: Render and review PDFs by converting pages to PNGs using pdftoppm and visually inspecting for layout issues
- Step 3: Create or extract content with reportlab/pdfplumber/pypdf and re-render after meaningful updates
Best Practices
- Render pages to PNGs with pdftoppm before QA or approval
- Prefer reportlab for generating new PDFs to ensure consistent typography and margins
- Use pdfplumber or pypdf for text extraction and quick checks; do not rely on them for layout fidelity
- Keep intermediates in tmp/pdfs and final artifacts under output/pdf with descriptive filenames
- Maintain polished design: ASCII hyphens only, consistent headers/footers, and clear page numbering during checks
Example Use Cases
- Generate a contract packet PDF with header/footer, page numbers, and precise margins
- Produce a quarterly report PDF with charts and tables rendered through a consistent typographic system
- Create a product specification PDF with uniform typography across sections and titled subsections
- Export standardized invoices programmatically to a well-formatted PDF template
- Assemble a multi-section whitepaper with consistent typography, spacing, and clear section transitions