
pdf-handling

npx machina-cli add skill belumume/claude-skills/pdf-handling --openclaw
Files (1): SKILL.md (314 B)

PDF Extraction

Standard: python "$CLAUDE_PLUGIN_DIR/scripts/pdf_extract.py" "file.pdf"
Unified: python "$CLAUDE_PLUGIN_DIR/scripts/pdf_extract_unified.py" "file.pdf"

Read the extracted .txt or _unified.md, not the PDF.

Source

git clone https://github.com/belumume/claude-skills
Skill file: https://github.com/belumume/claude-skills/blob/main/plugins/pdf-guard/skills/pdf-handling/SKILL.md

Overview

pdf-handling converts PDFs into readable text and images by running dedicated extraction scripts. It ensures you work with .txt or _unified.md outputs rather than raw PDFs, simplifying downstream processing. This standardizes input for NLP pipelines and documentation.

How This Skill Works

Use either Standard or Unified extraction scripts to convert a PDF into text and image assets. The Standard script is pdf_extract.py; the Unified script is pdf_extract_unified.py. After extraction, read the resulting .txt or _unified.md files, not the original PDF.

When to Use It

  • You need textual content from a PDF for analysis or summarization.
  • You want a consistent input format (.txt or _unified.md) for your pipeline.
  • You need to extract embedded images alongside text for context.
  • You are preparing literature reviews or knowledge bases from PDFs.
  • You want to avoid processing raw PDFs directly in your agent workflow.

Quick Start

  1. Run the appropriate extractor on your PDF (Standard or Unified).
  2. Read the produced .txt or _unified.md file, not the PDF.
  3. Use the extracted text/images for further processing.
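
The steps above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: the helper names `build_extract_command` and `run_extraction` are hypothetical, reading `CLAUDE_PLUGIN_DIR` from the environment is an assumption about how that variable is provided, and `sys.executable` stands in for the `python` command shown in the skill.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_extract_command(pdf_path, unified=False, plugin_dir=None):
    """Return the argv list for the Standard or Unified extractor (hypothetical helper)."""
    # Assumption: CLAUDE_PLUGIN_DIR is set in the environment unless a
    # directory is passed explicitly. A missing variable raises KeyError.
    plugin_dir = plugin_dir or os.environ["CLAUDE_PLUGIN_DIR"]
    script = "pdf_extract_unified.py" if unified else "pdf_extract.py"
    return [sys.executable, str(Path(plugin_dir) / "scripts" / script), str(pdf_path)]


def run_extraction(pdf_path, unified=False, plugin_dir=None):
    """Run the extractor; raises CalledProcessError if it exits non-zero."""
    cmd = build_extract_command(pdf_path, unified=unified, plugin_dir=plugin_dir)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Calling `run_extraction("file.pdf", unified=True)` would correspond to the Unified command; building the argument list separately keeps paths with spaces safe without shell quoting.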

Best Practices

  • Choose the correct script (Standard or Unified) for your workflow.
  • Verify extracted text encoding and review _unified.md for structure.
  • Always read the generated .txt or _unified.md instead of the PDF.
  • Keep a reproducible extraction step with the PDF source and output.
  • Check for any extraction errors and re-run if needed.
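
The encoding and error checks above could be sketched as follows. This is an illustration under stated assumptions, not the skill's documented behavior: the output naming (`file.pdf` to `file.txt` or `file_unified.md`, written next to the PDF) is inferred from the suffixes mentioned here, UTF-8 is assumed as the output encoding, and both helper names are hypothetical.

```python
from pathlib import Path


def expected_output(pdf_path, unified=False):
    """Guess the output path next to the PDF (assumed naming, not confirmed)."""
    pdf = Path(pdf_path)
    name = f"{pdf.stem}_unified.md" if unified else f"{pdf.stem}.txt"
    return pdf.with_name(name)


def check_extracted_text(out_path):
    """Return the text if the file exists, decodes as UTF-8, and is not blank."""
    out = Path(out_path)
    if not out.is_file():
        raise FileNotFoundError(f"no extraction output at {out}; re-run the extractor")
    # Strict decoding raises UnicodeDecodeError rather than silently
    # replacing bad bytes (UTF-8 is an assumption about the output).
    text = out.read_text(encoding="utf-8", errors="strict")
    if not text.strip():
        raise ValueError(f"{out} is blank; extraction may have failed")
    return text
```

Failing loudly on a missing, undecodable, or blank file gives a clear signal for when to re-run the extractor instead of passing empty text downstream.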

Example Use Cases

  • A researcher converts a batch of journal PDFs to .txt for NLP analysis.
  • A team processes monthly financial reports into _unified.md for dashboards.
  • A knowledge-base tool ingests product manuals via extracted text.
  • An academic archive stores PDFs alongside their text transcripts.
  • A content team summarizes e-books by extracting text for topic modeling.
