pdf-processor

npx machina-cli add skill intent-solutions-io/create-agent-skill-md/valid-skill --openclaw
Files (1)
SKILL.md
772 B

PDF Processor

Overview

Process PDF documents to extract text, tables, and metadata.

Instructions

  1. Validate the PDF file exists and is readable
  2. Extract text content using an appropriate parser
  3. Identify and extract tabular data
  4. Return structured output
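The first step can be sketched with the standard library alone. This is a minimal sketch, not the skill's actual implementation: the function name `validate_pdf` is hypothetical, and it assumes the `%PDF-` header marker sits at the very start of the file, which is the normal case.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # header marker that PDF files normally start with


def validate_pdf(path: str) -> Path:
    """Return the path if it points at a readable file that looks like a PDF.

    Hypothetical helper: raises FileNotFoundError if the file is missing,
    and ValueError if the header marker is absent. An unreadable file
    surfaces as the OSError raised by open().
    """
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such file: {pdf_path}")
    # Actually opening the file is the most direct readability check.
    with pdf_path.open("rb") as handle:
        header = handle.read(len(PDF_MAGIC))
    if header != PDF_MAGIC:
        raise ValueError(f"Not a PDF (missing %PDF- header): {pdf_path}")
    return pdf_path
```

A header check only confirms that the file claims to be a PDF. A damaged body would still have to be caught by the parser in the extraction step.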

Examples

Input: "Extract the text from report.pdf"
Output: Plain text content of the document

Input: "Get all tables from financial-report.pdf"
Output: CSV-formatted table data
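To illustrate what "CSV-formatted table data" could look like, here is a small sketch that serializes one extracted table with Python's built-in `csv` module. The `table_to_csv` helper and the sample rows are assumptions for illustration only; the skill does not specify how tables are represented before serialization.

```python
import csv
import io


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialize one extracted table (a list of rows) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, shaped the way a table parser might return them.
rows = [["Quarter", "Revenue"], ["Q1", "1,200"], ["Q2", "1,350"]]
print(table_to_csv(rows))
# Quarter,Revenue
# Q1,"1,200"
# Q2,"1,350"
```

Using `csv.writer` rather than joining cells with commas means values that contain commas or quotes are escaped correctly.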

Error Handling

If the PDF is encrypted, inform the user that a password may be required. If the file is corrupted, return an appropriate error message.
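One way to structure this error handling is to wrap the extraction call and translate known failures into messages for the user. The sketch below is an assumption about how that might look: `EncryptedPdfError`, `CorruptedPdfError`, and `extract_or_explain` are hypothetical names, and a real PDF library would raise its own exception types that would need mapping in the same way.

```python
from typing import Callable


class EncryptedPdfError(Exception):
    """Hypothetical: raised by the parser when the PDF needs a password."""


class CorruptedPdfError(Exception):
    """Hypothetical: raised by the parser when the PDF cannot be parsed."""


def extract_or_explain(extract: Callable[[str], str], path: str) -> dict:
    """Run an extractor and turn known failures into user-facing messages."""
    try:
        return {"ok": True, "text": extract(path)}
    except EncryptedPdfError:
        return {
            "ok": False,
            "error": f"{path} is encrypted; a password may be required.",
        }
    except CorruptedPdfError as exc:
        return {
            "ok": False,
            "error": f"{path} appears to be corrupted: {exc}",
        }
```

Unexpected exceptions are deliberately not caught here, so programming errors stay visible instead of being reported as a bad file.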

Source

https://github.com/intent-solutions-io/create-agent-skill-md/blob/main/examples/valid-skill/SKILL.md

Overview

PDF Processor processes PDF documents to extract text, tables, and metadata. It validates that the file exists and is readable, uses an appropriate parser to pull content, and returns a structured output suited for document analysis.

How This Skill Works

The skill first validates that the PDF exists and is readable. A parser then extracts the document's text content and identifies tabular data. Finally, the skill returns a structured result that includes text, tables, and metadata.
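The shape of that structured result is not specified by the skill. As a sketch under that caveat, it could be modeled as a small dataclass that keeps the three parts separate and serializes to JSON. The `ExtractionResult` class, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExtractionResult:
    """Hypothetical container for one processed PDF."""

    text: str = ""
    tables: list[str] = field(default_factory=list)  # one CSV string per table
    metadata: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the result so downstream tools can consume it."""
        return json.dumps(asdict(self), indent=2)


# Illustrative values only; a real run would fill these from the parser.
result = ExtractionResult(
    text="Annual report",
    tables=["Quarter,Revenue\nQ1,100\n"],
    metadata={"author": "A. Writer"},
)
print(result.to_json())
```

Keeping text, tables, and metadata in separate fields lets a consumer take only the part it needs, for example indexing `text` while loading `tables` into a spreadsheet.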

When to Use It

  • You need the plain text content of a report for quick review or indexing.
  • You require all tables from a financial or research PDF exported as CSV-formatted data.
  • You want to extract metadata (author, creation date) for asset management or cataloging.
  • You need to validate a PDF's readability and content before downstream processing in a workflow.
  • You process multiple PDFs in batch to ensure consistent extraction of text, tables, and metadata.

Quick Start

  1. Validate the PDF file exists and is readable.
  2. Extract text with an appropriate parser and identify tabular data.
  3. Return structured output containing text, tables, and metadata.

Best Practices

  • Validate the PDF file exists and is readable before processing.
  • Choose the appropriate parser for text and table extraction; ensure tabular data is captured accurately.
  • Return a structured output that separates text, tables (CSV), and metadata for downstream use.
  • Handle encrypted PDFs by reporting that a password may be required, and retry once the user has supplied it.
  • Handle corrupted PDFs by returning a clear error message and logging details for debugging.

Example Use Cases

  • Extract plain text from a report PDF for quick review.
  • Export all tables from a financial report to CSV for analysis.
  • Capture metadata such as author and creation date for asset management.
  • Process a batch of PDFs to ensure consistent extraction results.
  • Return structured output combining text, tables, and metadata for a knowledge base.
