What formats does document-converter handle?

Import converts PDF, DOCX, and PPTX files to Markdown, with an optional OCR fallback. Export converts Markdown to PDF or DOCX, with Quarto available for premium reports.

Do I need to install dependencies?

Yes. Install the system packages (poppler-utils, tesseract-ocr, pandoc, texlive-xetex, texlive-fonts-extra) and the Python packages (pypandoc, pdfminer.six, pdf2image, pytesseract, python-pptx, Pillow) listed in SKILL.md.

How do I apply branding or a cover page?

Use compile_report.py for basic covers, or switch to the Quarto templates under assets/quarto-templates/ for premium branding and cover pages.

document-converter

npx machina-cli add skill pablodiegoo/Data-Pro-Skill/document-converter --openclaw

Document Converter

Skill for importing external documents (PDF/DOCX/PPTX) to Markdown and exporting analysis results to professional reports (PDF/DOCX).

1. IMPORT: External Docs → Markdown

Uses markdowner.py with optional OCR fallback.

python3 .agent/skills/document-converter/scripts/markdowner.py input.pdf [--ocr]
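When the import step is driven from another Python script, the command above can be assembled programmatically. The following is a minimal sketch: the helper name `build_import_command` is hypothetical, and only the script path and the `--ocr` flag come from this document.

```python
# Script path as given in this document.
MARKDOWNER = ".agent/skills/document-converter/scripts/markdowner.py"


def build_import_command(source, ocr=False):
    """Return the argv list that imports one document to Markdown.

    `build_import_command` is a hypothetical helper, not part of the skill.
    """
    command = ["python3", MARKDOWNER, str(source)]
    if ocr:
        # --ocr is the optional flag shown in the command above.
        command.append("--ocr")
    return command


# To actually run the import (requires the skill to be installed):
#   import subprocess
#   subprocess.run(build_import_command("input.pdf", ocr=True), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.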

2. EXPORT: Markdown → Final Report

Uses compile_report.py for standard reports or Quarto for premium reports.

# Standard PDF
python3 .agent/skills/document-converter/scripts/compile_report.py report.md --format pdf
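The export command can be wrapped the same way. This sketch assumes that `--format docx` is accepted alongside `--format pdf`; the document only shows the `pdf` value explicitly, so treat `docx` as an assumption to verify against SKILL.md. The helper name `build_export_command` is hypothetical.

```python
# Script path as given in this document.
COMPILE_REPORT = ".agent/skills/document-converter/scripts/compile_report.py"

# "pdf" appears in the documented command; "docx" is an assumption based on
# the statement that export supports PDF and DOCX.
SUPPORTED_FORMATS = ("pdf", "docx")


def build_export_command(markdown_path, output_format="pdf"):
    """Return the argv list that exports a Markdown file to a report."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {output_format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return ["python3", COMPILE_REPORT, str(markdown_path), "--format", output_format]


# To actually run the export (requires the skill to be installed):
#   import subprocess
#   subprocess.run(build_export_command("report.md"), check=True)
```

Validating the format before launching the script surfaces typos immediately instead of after a LaTeX or pandoc run has started.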

Detailed Guides & Reference

Premium Quarto Reports: See quarto_reports.md
Troubleshooting & Setup: See troubleshooting.md

Assets

Quarto Templates: See assets/quarto-templates/ for base structure.

Dependencies

System Packages

sudo apt install poppler-utils tesseract-ocr pandoc texlive-xetex texlive-fonts-extra

Python Packages

pip install pypandoc pdfminer.six pdf2image pytesseract python-pptx Pillow
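Before running a conversion, it can help to confirm that the packages above are actually present. The sketch below checks for one representative executable per system package and for the import name of each Python package. The mapping from package to executable or module name reflects how these packages are commonly installed and is not taken from SKILL.md; texlive-fonts-extra ships fonts rather than a command, so it is not checked here.

```python
import importlib.util
import shutil

# Executable to look for on PATH -> apt package that usually provides it.
SYSTEM_TOOLS = {
    "pdftoppm": "poppler-utils",
    "tesseract": "tesseract-ocr",
    "pandoc": "pandoc",
    "xelatex": "texlive-xetex",
}

# Import name -> pip package name (the two often differ).
PYTHON_MODULES = {
    "pypandoc": "pypandoc",
    "pdfminer": "pdfminer.six",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "pptx": "python-pptx",
    "PIL": "Pillow",
}


def missing_dependencies():
    """Return (missing apt packages, missing pip packages) as sorted lists."""
    missing_system = sorted(
        package for tool, package in SYSTEM_TOOLS.items() if shutil.which(tool) is None
    )
    missing_python = sorted(
        package
        for module, package in PYTHON_MODULES.items()
        if importlib.util.find_spec(module) is None
    )
    return missing_system, missing_python


system_missing, python_missing = missing_dependencies()
if system_missing:
    print("sudo apt install " + " ".join(system_missing))
if python_missing:
    print("pip install " + " ".join(python_missing))
```

The script prints only the install commands for what is missing, so an empty output means every checked dependency was found.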

File Structure

.agent/skills/document-converter/
├── SKILL.md
├── assets/          # Templates and branding
├── references/      # Report manuals
│   ├── quarto_reports.md
│   └── troubleshooting.md
└── scripts/
    ├── markdowner.py      # Import engine
    └── compile_report.py  # Export engine

Source

git clone https://github.com/pablodiegoo/Data-Pro-Skill
SKILL.md: https://github.com/pablodiegoo/Data-Pro-Skill/blob/main/src/datapro/data/skills/document-converter/SKILL.md

Overview

Converts PDF, DOCX, and PPTX files into Markdown, with OCR fallback for image-based pages. It can also export Markdown as professional PDF or DOCX reports, using standard templates or premium Quarto reports with cover pages and branding.

How This Skill Works

markdowner.py extracts content into Markdown, with an optional --ocr flag for scanned documents. For export, compile_report.py produces standard PDFs, while a Quarto-based pipeline produces premium reports and applies templates and branding during rendering.
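The OCR fallback idea can be illustrated with a short sketch. This is not the code in markdowner.py, whose internals are not shown here; it is one plausible way to combine the listed libraries. The function names and the `min_chars` threshold are hypothetical. The third-party calls used are `pdfminer.high_level.extract_text`, `pdf2image.convert_from_path`, and `pytesseract.image_to_string`.

```python
def extract_with_ocr_fallback(path, extract_text, run_ocr, min_chars=20):
    """Try the PDF text layer first; fall back to OCR if it yields too little.

    `extract_text` and `run_ocr` are callables taking a path and returning a
    string. `min_chars` is an illustrative threshold, not a documented value.
    Returns (text, method) where method is "text-layer" or "ocr".
    """
    text = extract_text(path) or ""
    if len(text.strip()) >= min_chars:
        return text, "text-layer"
    return run_ocr(path), "ocr"


def pdfminer_text(path):
    """Extract the embedded text layer with pdfminer.six."""
    from pdfminer.high_level import extract_text  # imported lazily

    return extract_text(path)


def tesseract_text(path):
    """Rasterize each page with pdf2image, then OCR it with pytesseract."""
    from pdf2image import convert_from_path  # needs poppler-utils
    import pytesseract  # needs tesseract-ocr

    pages = convert_from_path(path)
    return "\n\n".join(pytesseract.image_to_string(page) for page in pages)


# With the dependencies installed:
#   text, method = extract_with_ocr_fallback("input.pdf", pdfminer_text, tesseract_text)
```

Passing the extractors in as arguments keeps the fallback decision separate from the libraries, which makes it easy to test with stand-in functions or to swap in a different OCR engine.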

When to Use It

Convert external PDFs, DOCX, or PPTX into clean Markdown for analysis or content reuse
Generate a professional PDF or DOCX report from a Markdown analysis result
Create branded reports with cover pages and themes via standard templates or premium Quarto reports
Process scanned documents that require OCR to extract text
Reuse Markdown content across formats for multiple projects

Quick Start

Step 1: Import a document: python3 .agent/skills/document-converter/scripts/markdowner.py input.pdf [--ocr]
Step 2: Edit the resulting Markdown as needed
Step 3: Export: python3 .agent/skills/document-converter/scripts/compile_report.py report.md --format pdf

Best Practices

Pass --ocr for image-based or scanned documents, where plain text extraction recovers little or no text
Start from a clean Markdown source and keep metadata consistent for easier exports
Use compile_report.py for basic PDFs and the Quarto templates for premium branding
Keep branding assets up to date in assets/quarto-templates/ for consistent visuals
Verify dependencies (system packages and Python libraries) before running conversions

Example Use Cases

Convert a client brochure from PDF to Markdown for content extraction and republishing
Generate a formal project report PDF from a Markdown analysis result
Produce a premium branded Quarto report with a custom cover page from Markdown
Turn a PPTX slide deck into Markdown to repurpose content into a report
Create a branded report from Markdown using templates and a cover page

Frequently Asked Questions
