Which output formats can doc-processor generate?

It writes Markdown and CSV directly as plaintext, and produces PDF, DOCX, and HTML by converting a Markdown draft with pandoc or another supported tool run through bash.

What if pandoc or other tools are missing?

Tell the user which tools to install and, where possible, provide the drafted Markdown as a fallback.
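A minimal sketch of that check, assuming the agent can run a Python script through bash. The helper name find_missing_tools is hypothetical and not part of doc-processor; it relies on shutil.which, which returns None when an executable is not on PATH.

```python
import shutil


def find_missing_tools(tools):
    """Return the subset of tools that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(["pandoc", "pdftotext"])
    if missing:
        # Illustrative wording; the skill only says to inform the user.
        print("Please install: " + ", ".join(missing))
        print("Falling back to the drafted Markdown file.")
    else:
        print("All conversion tools are available.")
```

The printed messages are placeholders; what matters is that the list of missing tools is known before any conversion is attempted.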

Where is the generated file saved and how is the path shared?

The file is saved to an output path, typically under /tmp or a location the user specifies, and the exact path is reported once generation finishes.

doc-processor

npx machina-cli add skill next-open-ai/openclawx/doc-processor --openclaw

Document Processor Skill

Use this skill when the user asks you to read specific technical documents, summarize reports, or generate structured files (like a structured markdown report, a CSV of data, or an HTML presentation).

Workflow

Reading Documents:
- If the file is plaintext (txt, md, csv, json), use the read tool directly.
- If it's a binary document (pdf, docx), use the bash tool to check whether the matching converter is installed (pdftotext for pdf, pandoc for docx), then convert the file to text in a temporary directory (/tmp/) before reading it.
Generating Documents:
- Understand the required structure and content from the user.
- Draft the content in a plaintext format (e.g., Markdown) using the write tool.
- If the user requested a specific format like PDF or HTML, use bash to run pandoc output.md -o output.pdf or similar commands.
- If necessary tools (like pandoc) are missing, politely tell the user to install them, or provide the drafted Markdown as a fallback.
- Notify the user of the path to the newly generated document.
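The reading step can be sketched in Python as below. This is an illustration under assumptions, not the skill's own code: extract_text is a hypothetical helper, and it assumes pdftotext (from poppler-utils) and pandoc are on PATH whenever a PDF or DOCX is passed in.

```python
import os
import subprocess
import tempfile

# Extensions the skill treats as plaintext and reads directly.
PLAINTEXT_EXTS = {".txt", ".md", ".csv", ".json"}


def extract_text(path):
    """Return the text content of path, converting PDF/DOCX to plaintext first."""
    ext = os.path.splitext(path)[1].lower()
    if ext in PLAINTEXT_EXTS:
        with open(path, encoding="utf-8") as f:
            return f.read()
    if ext not in (".pdf", ".docx"):
        raise ValueError(f"unsupported format: {ext}")

    # Scratch directory for the converted text (usually under /tmp on Linux).
    out_dir = tempfile.mkdtemp(prefix="doc-processor-")
    out_path = os.path.join(out_dir, os.path.basename(path) + ".txt")
    if ext == ".pdf":
        cmd = ["pdftotext", path, out_path]  # PDF -> plain text
    else:
        cmd = ["pandoc", path, "-t", "plain", "-o", out_path]  # DOCX -> plain text
    subprocess.run(cmd, check=True)  # check=True raises on a non-zero exit
    with open(out_path, encoding="utf-8") as f:
        return f.read()
```

Unsupported extensions are rejected before any scratch directory is created, so a bad request leaves nothing behind in /tmp.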

Source

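git clone https://github.com/next-open-ai/openclawx

The skill file is presets/workspaces/doc-assistant/skills/doc-processor/SKILL.md in that repository (https://github.com/next-open-ai/openclawx/blob/main/presets/workspaces/doc-assistant/skills/doc-processor/SKILL.md).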

Overview

doc-processor reads, parses, and generates documents in formats such as Markdown, PDF, DOCX, CSV, and HTML. It runs conversion tools like pandoc, or Python scripts, through bash to convert and format content, covering the whole path from reading a source document to writing the output file.

How This Skill Works

Reading flow: plaintext files (txt, md, csv, json) are read directly with the read tool; binary formats are first converted to text in /tmp, using pdftotext for PDF and pandoc for DOCX.

Generating flow: the content is drafted in Markdown with the write tool. If the user asked for a specific target format (PDF, HTML), pandoc is invoked through bash to produce the final file, and its path is reported back.

When to Use It

You need to extract and summarize text from a PDF or DOCX document.
You want to generate a structured Markdown report from data and notes.
You require a CSV export of tabular data derived from a document or dataset.
You need to convert a Markdown draft to PDF or HTML for distribution.
You want to convert or reformat an existing document into another supported format.

Quick Start

Step 1: Provide the source file and your target format (e.g., PDF, HTML, or CSV).
Step 2: Use read to fetch content (or convert binary to text in /tmp) and draft the report with write in Markdown.
Step 3: Run pandoc via bash to generate the final file and receive the path to the output (e.g., /tmp/output.pdf).
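A sketch of Step 3, assuming the Markdown draft already exists; render is a hypothetical helper, not part of the skill. pandoc picks the output format from the extension passed to -o, and PDF output additionally needs a PDF engine (LaTeX by default). When pandoc is missing, the function returns the Markdown draft unchanged, which matches the skill's fallback.

```python
import os
import shutil
import subprocess


def render(md_path, fmt, out_dir="/tmp"):
    """Convert a Markdown draft to fmt (e.g. "pdf", "html", "docx").

    Returns the path of the generated file, or md_path itself when no
    conversion is needed or pandoc is not installed (Markdown fallback).
    """
    if fmt == "md" or shutil.which("pandoc") is None:
        return md_path
    stem = os.path.splitext(os.path.basename(md_path))[0]
    out_path = os.path.join(out_dir, f"{stem}.{fmt}")
    # pandoc infers the output format from the -o extension; PDF output
    # also needs a PDF engine (LaTeX by default) to be installed.
    subprocess.run(["pandoc", md_path, "-o", out_path], check=True)
    return out_path
```

With pandoc installed, render("/tmp/output.md", "pdf") would return /tmp/output.pdf on success; the caller then reports whichever path comes back to the user.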

Best Practices

Clearly confirm the input file type and the desired output format before starting.
Draft content in Markdown first, then convert to the target format as needed.
Check that required tools (pandoc, pdftotext) are installed; if missing, inform the user and provide a Markdown fallback.
Use a temporary directory like /tmp for intermediate conversions and clean up afterward.
Validate the final document and provide the user with the exact path to the generated file.
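One way to combine the temporary-directory and validation practices, sketched in Python. produce and its build callback are hypothetical names. tempfile.TemporaryDirectory deletes the scratch directory (under /tmp on most Linux systems) when the with block exits, so the finished file is copied out first and then checked for existence and non-zero size.

```python
import os
import shutil
import tempfile


def produce(build, final_path):
    """Run build(scratch_dir) in a throwaway directory and return a validated path.

    build is a hypothetical callback: it writes intermediate files into
    scratch_dir and returns the path of the finished document.
    """
    with tempfile.TemporaryDirectory(prefix="doc-processor-") as scratch:
        built = build(scratch)
        # Copy the result out before the scratch directory is deleted.
        shutil.copyfile(built, final_path)
    # Intermediate files are gone now; check the final file before reporting it.
    if not os.path.isfile(final_path) or os.path.getsize(final_path) == 0:
        raise RuntimeError(f"generation failed: {final_path} is missing or empty")
    return final_path
```

Raising on an empty file keeps the agent from reporting a path to a document that conversion silently failed to fill.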

Example Use Cases

Convert a PDF user guide to HTML for a web wiki: extract the text with pdftotext, then render it with pandoc via bash.
Generate a CSV summary from a Markdown table extracted from a report.
Produce a PDF report from a Markdown draft for stakeholder distribution.
Extract text from a DOCX resume for indexing and search optimization.
Create a Markdown version of a JSON specification for developer documentation.
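For the CSV use case, a small script is one option, since pandoc can read CSV but, to our knowledge, has no CSV writer. The sketch below is hypothetical (markdown_table_to_csv is not part of the skill) and assumes a single pipe table with no escaped | characters inside cells.

```python
import csv
import io


def markdown_table_to_csv(markdown):
    """Convert a simple pipe-delimited Markdown table into CSV text.

    Assumes one table, no escaped pipes inside cells, and a header
    separator row such as |---|:---:|.
    """
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row (every cell made only of '-' and ':').
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

The csv module handles quoting, so a cell such as Bo, Jr. comes out as "Bo, Jr." instead of being split into two columns.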
