How does pdf-processor handle encrypted PDFs?

If a PDF is encrypted, the skill informs the user that a password may be required to access the content and invites them to supply one so it can retry.

What happens if the PDF is corrupted?

The skill returns an appropriate error message and stops processing, so the failure does not propagate downstream.

What outputs does it produce?

It returns a structured result containing plain text content, CSV-formatted table data where tables exist, and any available metadata.

pdf-processor

npx machina-cli add skill intent-solutions-io/create-agent-skill-md/valid-skill --openclaw

Files (1)

SKILL.md

772 B

PDF Processor

Overview

Process PDF documents to extract text, tables, and metadata.

Instructions

Validate the PDF file exists and is readable
Extract text content using appropriate parser
Identify and extract tabular data
Return structured output

Examples

Input: "Extract the text from report.pdf"
Output: Plain text content of the document

Input: "Get all tables from financial-report.pdf"
Output: CSV-formatted table data

Error Handling

If the PDF is encrypted, inform the user that a password may be required. If the file is corrupted, return an appropriate error message.

Source

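git clone https://github.com/intent-solutions-io/create-agent-skill-md.git

The skill file is at examples/valid-skill/SKILL.md on the main branch: https://github.com/intent-solutions-io/create-agent-skill-md/blob/main/examples/valid-skill/SKILL.md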

Overview

PDF Processor processes PDF documents to extract text, tables, and metadata. It validates that the file exists and is readable, uses an appropriate parser to pull content, and returns a structured output suited for document analysis.

How This Skill Works

The skill first validates that the PDF exists and is readable. A suitable parser then extracts the text content and identifies tabular data, and the skill returns a structured result that includes text, tables, and metadata.
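One way to picture the structured result is a small container with one field per output type. The sketch below is illustrative only: `ExtractionResult`, its field names, and `table_to_csv` are assumptions rather than part of the skill, and the sample values are placeholders.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionResult:
    """Hypothetical container; the field names are assumptions."""
    text: str = ""
    tables_csv: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def table_to_csv(rows: List[List[str]]) -> str:
    """Serialize one table (a list of rows) as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder values, used only to show the shape of the result.
result = ExtractionResult(
    text="Quarterly summary",
    tables_csv=[table_to_csv([["Quarter", "Revenue"], ["Q1", "1,200"]])],
    metadata={"author": "A. Author"},
)
```

Keeping each table as its own CSV string means a document with no tables simply yields an empty list, which matches the "where tables exist" behavior described in this listing.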

When to Use It

You need the plain text content of a report for quick review or indexing.
You require all tables from a financial or research PDF exported as CSV-formatted data.
You want to extract metadata (author, creation date) for asset management or cataloging.
You need to validate a PDF's readability and content before downstream processing in a workflow.
You process multiple PDFs in a batch and need consistent extraction of text, tables, and metadata.

Quick Start

Step 1: Validate the PDF file exists and is readable.
Step 2: Extract text with an appropriate parser and identify tabular data.
Step 3: Return structured output containing text, tables, and metadata.
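Step 1 could look something like the following, using only the Python standard library. The helper name `readability_problem` is hypothetical, and checking for the `%PDF-` header is an assumed, deliberately shallow test; it does not prove the file will parse.

```python
import os
import tempfile
from typing import Optional


def readability_problem(path: str) -> Optional[str]:
    """Return a description of the problem, or None if the file looks usable."""
    if not os.path.isfile(path):
        return f"{path}: file does not exist"
    if not os.access(path, os.R_OK):
        return f"{path}: file is not readable"
    with open(path, "rb") as handle:
        header = handle.read(5)
    # PDF files begin with the marker "%PDF-" followed by a version number.
    if header != b"%PDF-":
        return f"{path}: missing %PDF- header"
    return None


# Demonstration with throwaway files (contents are placeholders, not real PDFs).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pdf")
    bad = os.path.join(tmp, "bad.pdf")
    missing = os.path.join(tmp, "missing.pdf")
    with open(good, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    with open(bad, "wb") as handle:
        handle.write(b"plain text")
    checks = {
        "good": readability_problem(good),
        "bad": readability_problem(bad),
        "missing": readability_problem(missing),
    }
```

Returning a message instead of raising keeps the check easy to report back to the user before any parsing is attempted.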

Best Practices

Validate the PDF file exists and is readable before processing.
Choose the appropriate parser for text and table extraction; ensure tabular data is captured accurately.
Return a structured output that separates text, tables (CSV), and metadata for downstream use.
Handle encrypted PDFs by reporting that a password may be required and retrying once the user supplies one.
Handle corrupted PDFs by returning a clear error message and logging details for debugging.
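The last two practices can be sketched as a single retry wrapper. Everything named here is an assumption made for illustration: the two exception classes, the `extract` callable standing in for whatever parser is used, and the `ask_password` callable standing in for the prompt to the user.

```python
from typing import Callable, Dict, Optional


class EncryptedPDFError(Exception):
    """Hypothetical: raised by the parser when a password is needed."""


class CorruptedPDFError(Exception):
    """Hypothetical: raised by the parser when the file cannot be read."""


def process_with_retry(
    path: str,
    extract: Callable[[str, Optional[str]], str],
    ask_password: Callable[[], Optional[str]],
) -> Dict[str, object]:
    """Try once without a password, then once more with a user-supplied one."""
    password: Optional[str] = None
    for attempt in (1, 2):
        try:
            return {"ok": True, "text": extract(path, password)}
        except CorruptedPDFError as exc:
            # Corruption is not retried: report it and stop.
            return {"ok": False, "error": f"{path} appears to be corrupted: {exc}"}
        except EncryptedPDFError:
            if attempt == 2:
                break  # the supplied password did not work
            password = ask_password()
            if password is None:
                return {
                    "ok": False,
                    "error": f"{path} is encrypted; a password may be required.",
                }
    return {"ok": False, "error": f"{path}: the supplied password was not accepted."}
```

Limiting the flow to one retry is a design choice for this sketch; a real integration might allow several attempts or log the failure details before returning.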

Example Use Cases

Extract plain text from a report PDF for quick review.
Export all tables from a financial report to CSV for analysis.
Capture metadata such as author and creation date for asset management.
Process a batch of PDFs to ensure consistent extraction results.
Return structured output combining text, tables, and metadata for a knowledge base.
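For the batch use case, a minimal sketch might isolate each file so that one failure does not abort the run. The names `process_batch` and `process_one` are hypothetical, and `process_one` stands in for whatever single-file extraction routine is actually used.

```python
from typing import Any, Callable, Dict, Iterable


def process_batch(
    paths: Iterable[str],
    process_one: Callable[[str], Any],
) -> Dict[str, Dict[str, Any]]:
    """Apply the same extraction to every path and record each outcome."""
    results: Dict[str, Dict[str, Any]] = {}
    for path in paths:
        try:
            results[path] = {"ok": True, "result": process_one(path)}
        except Exception as exc:
            # Broad catch is deliberate here: one bad file should not stop the batch.
            results[path] = {"ok": False, "error": str(exc)}
    return results
```

Because every file goes through the same callable and produces the same result shape, downstream code can treat successes and failures uniformly.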
