
pdf-to-markdown-converter

npx machina-cli add skill maaarcooo/claude-skills/pdf-to-markdown-converter --openclaw

PDF to Markdown Converter

Faithfully convert PDF content to Markdown using visual PDF understanding or image OCR.

Process

  1. Read every page of the PDF using visual PDF understanding or image OCR
  2. Transcribe all content faithfully — this is conversion, not summarisation
  3. Preserve document structure: headings, lists, tables, formatting
  4. Write the output to a Markdown file named after the PDF

Conversion Rules

Complete transcription: Include ALL text content. Do not summarise or omit.

Preserve structure:

  • Headings → #, ##, ### (match hierarchy from source)
  • Bold text → **bold**
  • Italic text → *italic*
  • Bullet lists → - or *
  • Numbered lists → 1., 2., etc.
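The mapping above can be sketched as a small Python helper. This is only an illustration: the function name `to_markdown` and its `kind` labels are hypothetical and are not part of the skill, which describes the rules in prose only.

```python
def to_markdown(kind: str, text: str, level: int = 1, index: int = 1) -> str:
    """Render one structural element as Markdown (hypothetical helper)."""
    if kind == "heading":
        # Markdown supports heading levels 1 to 6, so clamp the source level.
        return "#" * max(1, min(level, 6)) + " " + text
    if kind == "bold":
        return f"**{text}**"
    if kind == "italic":
        return f"*{text}*"
    if kind == "bullet":
        return f"- {text}"
    if kind == "numbered":
        return f"{index}. {text}"
    # Anything else is treated as plain text and passed through unchanged.
    return text
```

For example, `to_markdown("heading", "Methods", level=2)` gives `## Methods`.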

Tables: Convert to markdown table format:

| Column 1 | Column 2 |
|----------|----------|
| Data     | Data     |
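
As a rough sketch of the table rule, the following Python function builds a Markdown table from a header and rows. The name `markdown_table` is an assumption made for this example. Because the skill forbids omissions, the sketch raises an error rather than silently truncating a row that is wider than the header.

```python
def markdown_table(header: list[str], rows: list[list[str]]) -> str:
    """Build a Markdown table from a header and rows (hypothetical helper)."""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so cell text cannot break the layout.
        return str(value).replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        if len(row) > len(header):
            # Refuse to drop data: the conversion must not omit content.
            raise ValueError("row has more cells than the header")
        # Pad short rows with empty cells so every row has the same width.
        padded = list(row) + [""] * (len(header) - len(row))
        lines.append("| " + " | ".join(cell(c) for c in padded) + " |")
    return "\n".join(lines)
```

Column alignment padding is left out, since Markdown renderers do not require it.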

Equations/formulas: Use LaTeX notation:

  • Inline: $E = mc^2$
  • Block: $$F = ma$$

Diagrams/images: Describe in a blockquote with [DIAGRAM] prefix:

> [DIAGRAM]: Description of what the diagram shows, including labels and key information.
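
A minimal Python sketch of this format follows. The helper name `diagram_blockquote` is hypothetical. It prefixes the first line with `[DIAGRAM]:` and keeps any further lines inside the same blockquote.

```python
def diagram_blockquote(description: str) -> str:
    """Format a diagram description as a [DIAGRAM] blockquote (hypothetical helper)."""
    lines = description.strip().splitlines() or [""]
    first, rest = lines[0], lines[1:]
    # Every continuation line needs its own ">" to stay in the blockquote.
    return "\n".join(["> [DIAGRAM]: " + first] + ["> " + line for line in rest])
```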

Remove repeated footers: Omit recurring footer content such as brand names, website links, copyright notices, page numbers, and other boilerplate that appears on multiple pages.
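
One possible way to detect recurring footers is sketched below in Python. It assumes the text is already available as a list of lines per page. The function name, the `min_fraction` threshold, and the digit normalization (so that "Page 1" and "Page 2" count as the same line) are all assumptions of this example, not behavior specified by the skill.

```python
import re
from collections import Counter


def strip_repeated_lines(pages: list[list[str]], min_fraction: float = 0.5) -> list[list[str]]:
    """Drop lines that recur on many pages (hypothetical helper)."""

    def key(line: str) -> str:
        # Replace digit runs so page numbers do not hide a repeated footer.
        return re.sub(r"\d+", "#", line.strip())

    counts: Counter = Counter()
    for page in pages:
        # Count each distinct line at most once per page.
        counts.update({key(line) for line in page if line.strip()})

    # A line must appear on at least two pages to be treated as boilerplate.
    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {k for k, n in counts.items() if n >= threshold}
    return [[line for line in page if key(line) not in repeated] for page in pages]
```

Digit normalization can also match real content that differs only in its numbers, so the removed lines are worth reviewing before they are discarded.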

Page breaks: Optionally insert --- between major sections if helpful for navigation.

Output

Markdown file named to match the source PDF (e.g., Topic Name.md).
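
The naming rule can be illustrated with Python's standard `pathlib`. The function name `markdown_path` is hypothetical, and the sketch assumes the output is written next to the source PDF.

```python
from pathlib import Path


def markdown_path(pdf_path: str) -> Path:
    """Return the output path: same folder and name as the PDF, with a .md suffix."""
    return Path(pdf_path).with_suffix(".md")
```

For instance, `markdown_path("Topic Name.pdf").name` is `Topic Name.md`.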

Quality Checklist

  • Every page processed
  • All text content included (no omissions)
  • Headings hierarchy preserved
  • Tables correctly formatted
  • Equations in LaTeX notation
  • Diagrams described with key details
  • Document structure matches original
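
Some checklist items can be spot-checked automatically. As one example, the Python sketch below flags table rows whose column count differs from the first row of the same table. The function name is hypothetical, and the check is deliberately simple: it does not account for escaped pipes inside cells.

```python
def malformed_table_rows(markdown: str) -> list[int]:
    """Return 1-based line numbers of table rows with an unexpected column count."""
    bad = []
    expected = None  # column count of the current table, or None outside a table
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            columns = len(stripped[1:-1].split("|"))
            if expected is None:
                expected = columns  # first row of a table sets the expectation
            elif columns != expected:
                bad.append(number)
        else:
            expected = None  # any non-table line ends the current table
    return bad
```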

Source

View on GitHub: https://github.com/maaarcooo/claude-skills/blob/main/archive/pdf-to-markdown-converter/SKILL.md

Overview

pdf-to-markdown-converter faithfully translates PDF content into Markdown, preserving headings, lists, tables, and equations. It uses visual PDF understanding or image OCR to extract all content and outputs a Markdown file named after the source PDF.

How This Skill Works

It reads every page of the PDF using visual PDF understanding or image OCR, then transcribes the content faithfully without summarisation. It preserves document structure (headings, lists, tables, formatting), converts formatting to Markdown with LaTeX for equations, and writes the result to a .md file named after the PDF.

When to Use It

  • When you need to convert a PDF to Markdown for web publishing or documentation.
  • When you want to extract PDF content as Markdown for data pipelines or content reuse.
  • When you need to transcribe a PDF, including structure, tables, and equations, without summary.
  • When preparing PDF content for further processing by NLP or search indexing.
  • When dealing with image-based or scanned PDFs and you require OCR-based Markdown output.

Quick Start

  1. Provide the PDF file you want converted (or its path) to the converter.
  2. Run the pdf-to-markdown-converter and specify the output format as Markdown.
  3. Retrieve the Markdown file named after the original PDF (e.g., Document.md).

Best Practices

  • Process every page to ensure nothing is omitted.
  • Verify the transcription includes all text, tables, and equations (no summarisation).
  • Preserve the original document hierarchy by mapping headings to the correct Markdown levels.
  • Use proper Markdown for tables and ensure LaTeX notation is retained for equations.
  • Remove recurring footers and boilerplate; name the output file after the source PDF.

Example Use Cases

  • Converting a research paper PDF into Markdown for a GitHub repository.
  • Transcribing a product specification PDF into Markdown for a living spec catalog.
  • Migrating a scanned user manual into Markdown for an internal knowledge base.
  • Extracting tables and equations from an academic thesis PDF to publish as Markdown.
  • Preparing a legal document PDF for automated processing in a workflow.
