How are figures detected and added?

Uses detect-figures.ts to auto-detect figures/tables within the PDF and extract via extract-figure.ts.

Can I export to PowerPoint or PDF?

Yes, slides can be merged into PPTX with merge-to-pptx.ts or into PDF with merge-to-pdf.ts.

What inputs can I feed this skill?

Input can be a PDF paper or a Markdown/content file; you can also paste content directly if you omit a path.

paper-slide-deck

Scanned

npx machina-cli add skill Ronitnair/research-skills/paper-slide-deck --openclaw

Files (1)

SKILL.md

18.2 KB

Paper Slide Deck Generator

Transform academic papers and content into professional slide deck images with automatic figure extraction.

Usage

/paper-slide-deck path/to/paper.pdf
/paper-slide-deck path/to/paper.pdf --style academic-paper
/paper-slide-deck path/to/content.md --style sketch-notes
/paper-slide-deck path/to/content.md --audience executives
/paper-slide-deck path/to/content.md --lang zh
/paper-slide-deck path/to/content.md --slides 10
/paper-slide-deck path/to/content.md --outline-only
/paper-slide-deck  # Then paste content

Script Directory

Important: All scripts are located in the scripts/ subdirectory of this skill.

Agent Execution Instructions:

Determine this SKILL.md file's directory path as SKILL_DIR
Script path = ${SKILL_DIR}/scripts/<script-name>.ts
Replace all ${SKILL_DIR} in this document with the actual path

Script Reference:

Script	Purpose
`scripts/generate-slides.py`	Generate AI slides via Gemini API (Python)
`scripts/merge-to-pptx.ts`	Merge slides into PowerPoint
`scripts/merge-to-pdf.ts`	Merge slides into PDF
`scripts/detect-figures.ts`	Auto-detect figures/tables in PDF
`scripts/extract-figure.ts`	Extract figure from PDF page (uses PyMuPDF fallback)
`scripts/apply-template.ts`	Apply figure container template

Options

Option	Description
`--style <name>`	Visual style (see Style Gallery)
`--audience <type>`	Target audience: beginners, intermediate, experts, executives, general
`--lang <code>`	Output language (en, zh, ja, etc.)
`--slides <number>`	Target slide count
`--outline-only`	Generate outline only, skip image generation

Style Gallery

Style	Description	Best For
`academic-paper`	Clean professional, precise charts	Conference talks, thesis defense
`blueprint` (Default)	Technical schematics, grid texture	Architecture, system design
`chalkboard`	Black chalkboard, colorful chalk	Education, tutorials, classroom
`notion`	SaaS dashboard, card-based layouts	Product demos, SaaS, B2B
`bold-editorial`	Magazine cover, bold typography, dark	Product launches, keynotes
`corporate`	Navy/gold, structured layouts	Investor decks, proposals
`dark-atmospheric`	Cinematic dark mode, glowing accents	Entertainment, gaming
`editorial-infographic`	Magazine explainers, flat illustrations	Tech explainers, research
`fantasy-animation`	Ghibli/Disney style, hand-drawn	Educational, storytelling
`intuition-machine`	Technical briefing, bilingual labels	Technical docs, academic
`minimal`	Ultra-clean, maximum whitespace	Executive briefings, premium
`pixel-art`	Retro 8-bit, chunky pixels	Gaming, developer talks
`scientific`	Academic diagrams, precise labeling	Biology, chemistry, medical
`sketch-notes`	Hand-drawn, warm & friendly	Educational, tutorials
`vector-illustration`	Flat vector, retro & cute	Creative, children's content
`vintage`	Aged-paper, historical styling	Historical, heritage, biography
`watercolor`	Hand-painted textures, natural warmth	Lifestyle, wellness, travel

Auto Style Selection

Content Signals	Selected Style
paper, thesis, defense, conference, ieee, acm, icml, neurips, cvpr, acl, aaai, iclr	`academic-paper`
tutorial, learn, education, guide, intro, beginner	`sketch-notes`
classroom, teaching, school, chalkboard, blackboard	`chalkboard`
architecture, system, data, analysis, technical	`blueprint`
creative, children, kids, cute, illustration	`vector-illustration`
briefing, bilingual, infographic, concept	`intuition-machine`
executive, minimal, clean, simple, elegant	`minimal`
saas, product, dashboard, metrics, productivity	`notion`
investor, quarterly, business, corporate, proposal	`corporate`
launch, marketing, keynote, bold, impact, magazine	`bold-editorial`
entertainment, music, gaming, creative, atmospheric	`dark-atmospheric`
explainer, journalism, science communication	`editorial-infographic`
story, fantasy, animation, magical, whimsical	`fantasy-animation`
gaming, retro, pixel, developer, nostalgia	`pixel-art`
biology, chemistry, medical, pathway, scientific	`scientific`
history, heritage, vintage, expedition, historical	`vintage`
lifestyle, wellness, travel, artistic, natural	`watercolor`
Default	`blueprint`

Layout Gallery

Optional layout hints for individual slides. Specify in outline's // LAYOUT section.

Slide-Specific Layouts

Layout	Description	Best For
`title-hero`	Large centered title + subtitle	Cover slides, section breaks
`quote-callout`	Featured quote with attribution	Testimonials, key insights
`key-stat`	Single large number as focal point	Impact statistics, metrics
`split-screen`	Half image, half text	Feature highlights, comparisons
`icon-grid`	Grid of icons with labels	Features, capabilities, benefits
`two-columns`	Content in balanced columns	Paired information, dual points
`three-columns`	Content in three columns	Triple comparisons, categories
`image-caption`	Full-bleed image + text overlay	Visual storytelling, emotional
`agenda`	Numbered list with highlights	Session overview, roadmap
`bullet-list`	Structured bullet points	Simple content, lists

Infographic-Derived Layouts

Layout	Description	Best For
`linear-progression`	Sequential flow left-to-right	Timelines, step-by-step
`binary-comparison`	Side-by-side A vs B	Before/after, pros-cons
`comparison-matrix`	Multi-factor grid	Feature comparisons
`hierarchical-layers`	Pyramid or stacked levels	Priority, importance
`hub-spoke`	Central node with radiating items	Concept maps, ecosystems
`bento-grid`	Varied-size tiles	Overview, summary
`funnel`	Narrowing stages	Conversion, filtering
`dashboard`	Metrics with charts/numbers	KPIs, data display
`venn-diagram`	Overlapping circles	Relationships, intersections
`circular-flow`	Continuous cycle	Recurring processes
`winding-roadmap`	Curved path with milestones	Journey, timeline
`tree-branching`	Parent-child hierarchy	Org charts, taxonomies
`iceberg`	Visible vs hidden layers	Surface vs depth
`bridge`	Gap with connection	Problem-solution

Academic-Specific Layouts

Layout	Description	Best For
`paper-title`	Title, authors, affiliations, venue	Conference paper cover
`outline-agenda`	Numbered section list with highlights	Talk structure overview
`methods-diagram`	Central architecture/pipeline diagram	Methods, system design
`results-chart`	Chart area + data annotations	Quantitative results
`equation-focus`	Centered equation + variable definitions	Mathematical derivations
`qualitative-grid`	2x2 or 3x2 image comparison grid	Visual results, ablations
`references-list`	Numbered citation list	Key references slide
`contributions`	Numbered contribution points	Contributions summary

Usage: Add Layout: <name> in slide's // LAYOUT section to guide visual composition.

Design Philosophy

This deck is designed for reading and sharing, not live presentation:

Each slide must be self-explanatory without verbal commentary
Structure content for logical flow when scrolling
Include all necessary context within each slide
Optimize for social media sharing and offline reading

File Management

Output Directory

Each session creates an independent directory named by content slug:

slide-deck/{topic-slug}/
├── source-{slug}.{ext}    # Source files (text, images, etc.)
├── outline.md
├── outline-{style}.md     # Style variant outlines
├── prompts/
│   └── 01-slide-cover.md, 02-slide-{slug}.md, ...
├── 01-slide-cover.png, 02-slide-{slug}.png, ...
├── {topic-slug}.pptx
└── {topic-slug}.pdf

Slug Generation:

Extract main topic from content (2-4 words, kebab-case)
Example: "Introduction to Machine Learning" → intro-machine-learning

Conflict Resolution

If slide-deck/{topic-slug}/ already exists:

Append timestamp: {topic-slug}-YYYYMMDD-HHMMSS
Example: intro-ml exists → intro-ml-20260118-143052

Source Files

Copy all sources with naming source-{slug}.{ext}:

source-article.md (main text content)
source-diagram.png (image from conversation)
source-data.xlsx (additional file)

Multiple sources supported: text, images, files from conversation.

Workflow

Step 1: Analyze Content

Save source content (if pasted, save as source.md)
Follow references/analysis-framework.md for deep content analysis
Determine style (use --style or auto-select from signals)
Detect languages (source vs. user preference)
Plan slide count (--slides or dynamic)
For academic papers (PDF with figures): Run automatic figure detection:
```
npx -y bun ${SKILL_DIR}/scripts/detect-figures.ts --pdf source-paper.pdf --output figures.json
```
This outputs a JSON file with all detected figures/tables, their page numbers, and captions.

Step 2: Generate Outline Variants

Generate 3 style variant outlines based on content analysis
Follow references/outline-template.md for structure
Auto-populate IMAGE_SOURCE for academic papers:
- Read figures.json from Step 1
- Map figures to slides using rules in references/analysis-framework.md Section 8
- Automatically add // IMAGE_SOURCE blocks to appropriate slides:
  - Architecture/pipeline figures → Methods slides (Source: extract)
  - Results tables → Quantitative results slides (Source: extract)
  - Comparison images → Qualitative results slides (Source: extract)
  - Conceptual/simple diagrams → Leave for AI generation (Source: generate or omit)
Save as outline-{style}.md for each variant

Step 3: User Confirmation

Single AskUserQuestion with all applicable options:

Question	When to Ask
Style variant	Always (3 options + custom)
Language	Only if source ≠ user language

After selection:

Copy selected outline-{style}.md to outline.md
Regenerate in different language if requested
User may edit outline.md for fine-tuning

If --outline-only, stop here.

Step 4: Generate Prompts

Read references/base-prompt.md
Combine with style instructions from outline
Add slide-specific content
If Layout: specified in outline, include layout guidance in prompt:
- Reference layout characteristics for image composition
- Example: Layout: hub-spoke → "Central concept in middle with related items radiating outward"
Save to prompts/ directory

Step 5: Image Generation Method Selection

Before generating images, ask user to choose generation method:

Use AskUserQuestion with options:

Option	Label	Description
1	Gemini API (Recommended)	Official Google API via Python. Requires GOOGLE_API_KEY env var.
2	Gemini Web (Browser-based)	⚠️ Uses reverse-engineered web API. No API key needed but may break.

Based on selection:

Option 1: Gemini API (Python)

Verify API key: Check GOOGLE_API_KEY or GEMINI_API_KEY environment variable

Run generation script:

python ${SKILL_DIR}/scripts/generate-slides.py <slide-deck-dir> --model gemini-3-pro-image-preview

Script Features:

Auto-installs google-genai package if missing
Retry logic with exponential backoff (3 retries)
Skips already-generated slides (> 10KB)
Supports custom model via --model flag
Outputs to slides/ subdirectory

Troubleshooting:

If server disconnection errors occur, script auto-retries
For persistent failures, re-run the script (it skips completed slides)
Check API quota if many failures occur

Option 2: Gemini Web Skill

Consent Check: Read consent file at:
- Windows: $APPDATA/baoyu-skills/gemini-web/consent.json
- macOS: ~/Library/Application Support/baoyu-skills/gemini-web/consent.json
- Linux: ~/.local/share/baoyu-skills/gemini-web/consent.json

If no consent or version mismatch, display disclaimer and ask:

⚠️ DISCLAIMER: This uses a reverse-engineered Gemini Web API (NOT official).
Risks: May break anytime, no support, possible account risk.

For each slide, run:

npx -y bun ${GEMINI_WEB_SKILL_DIR}/scripts/main.ts \
  --promptfiles prompts/01-slide-cover.md \
  --image 01-slide-cover.png \
  --sessionId slides-{topic-slug}-{timestamp}

Where GEMINI_WEB_SKILL_DIR = path to baoyu-danger-gemini-web skill directory.

Proxy support: If user is in restricted network, prepend:

HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890

Step 5.5: Process IMAGE_SOURCE (Automatic Figure Extraction)

For academic presentations, IMAGE_SOURCE metadata was auto-populated in Step 2 based on figure detection from Step 1.

Automatic Execution:

Parse outline to identify slides with Source: extract
Create figures directory: mkdir -p figures

For each extract slide, automatically:

Read the Figure number, Page, and Caption from metadata

Run figure extraction script:

npx -y bun ${SKILL_DIR}/scripts/extract-figure.ts \
  --pdf source-paper.pdf \
  --page <page-number> \
  --output figures/figure-<N>.png

Run template application script:

npx -y bun ${SKILL_DIR}/scripts/apply-template.ts \
  --figure figures/figure-<N>.png \
  --title "<slide-headline>" \
  --caption "Figure <N>: <caption-text>" \
  --output <NN>-slide-<slug>.png

Report: "Extracted: Figure N → slide NN"

For slides with Source: generate (or no IMAGE_SOURCE):
- Proceed to Step 6 for AI generation

Note: Source PDF must be saved as source-paper.pdf in output directory.

Troubleshooting:

If figure detection missed a figure: manually add // IMAGE_SOURCE block to outline
If wrong figure mapped: edit the Figure: and Page: values in outline
If extraction fails: check PDF page number (1-indexed)

PyMuPDF Fallback for Page Extraction: If extract-figure.ts fails with "Image or Canvas expected" error (common with complex PDFs), use PyMuPDF:

import fitz
doc = fitz.open("source-paper.pdf")
page = doc[page_num - 1]  # 0-indexed
mat = fitz.Matrix(3, 3)  # 3x scale for high resolution
pix = page.get_pixmap(matrix=mat)
pix.save(f"extracted/page-{page_num}.png")

Then apply template using apply-template.ts.

Step 6: Generate Images

Use selected method from Step 5
Skip slides already processed in Step 5.5 (those with Source: extract)
Generate session ID: slides-{topic-slug}-{timestamp}
Generate each remaining slide with same session ID
Report progress: "Generated X/N"
Auto-retry once on generation failure

Step 7: Merge to PPTX and PDF

npx -y bun ${SKILL_DIR}/scripts/merge-to-pptx.ts <slide-deck-dir>
npx -y bun ${SKILL_DIR}/scripts/merge-to-pdf.ts <slide-deck-dir>

Step 8: Output Summary

Slide Deck Complete!

Topic: [topic]
Style: [style name]
Location: [directory path]
Slides: N total

- 01-slide-cover.png ✓ Cover
- 02-slide-intro.png ✓ Content
- ...
- {NN}-slide-back-cover.png ✓ Back Cover

Outline: outline.md
PPTX: {topic-slug}.pptx
PDF: {topic-slug}.pdf

Slide Modification

See references/modification-guide.md for:

Edit single slide workflow
Add new slide (with renumbering)
Delete slide (with renumbering)
File naming conventions

Image Generation Dependencies

Gemini API (Option 1 - Recommended)

Requires:

GOOGLE_API_KEY or GEMINI_API_KEY environment variable
Python 3.8+ with pip
google-genai package (auto-installed by script)

Model: gemini-3-pro-image-preview (default)

Gemini Web Skill (Option 2)

Requires:

baoyu-danger-gemini-web skill installed at .claude/skills/baoyu-danger-gemini-web
Google Chrome browser with logged-in Google account
User consent for reverse-engineered API disclaimer

PDF Figure Extraction

Requires:

Primary: pdfjs-dist npm package (use legacy build for Node.js)
Fallback: pymupdf Python package (more reliable for complex PDFs)
canvas npm package for apply-template.ts

References

File	Content
`references/analysis-framework.md`	Deep content analysis for presentations
`references/outline-template.md`	Outline structure and STYLE_INSTRUCTIONS format
`references/modification-guide.md`	Edit, add, delete slide workflows
`references/content-rules.md`	Content and style guidelines
`references/base-prompt.md`	Base prompt for image generation
`references/figure-container-template.md`	Visual specs for extracted figure containers
`references/styles/<style>.md`	Full style specifications

Notes

Image Generation

Nano Banana Pro API: Recommended. Stable, reliable, requires API key
Gemini Web: No API key needed, but uses reverse-engineered API with account risk
Generation time: 10-30 seconds per slide
Auto-retry once on generation failure
Maintain style consistency via session ID

Content Guidelines

Use stylized alternatives for sensitive public figures
Both methods use the same underlying Gemini model for image generation

Extension Support

Custom styles and configurations via EXTEND.md.

Check paths (priority order):

.paper-skills/paper-slide-deck/EXTEND.md (project)
~/.paper-skills/paper-slide-deck/EXTEND.md (user)

If found, load before Step 1. Extension content overrides defaults.

Source

git clone https://github.com/Ronitnair/research-skills/blob/main/paper-slide-deck/SKILL.mdView on GitHub

Overview

Transforms academic papers and content into professional slide decks as images. It automatically detects figures from PDFs, creates detailed outlines with style instructions, and renders each slide image. Ideal for turning papers into presentations, conference talks, or content-to-slides workflows.

How This Skill Works

Feed paper.pdf or content.md; the tool auto-detects figures with detect-figures.ts and extracts them as needed. It then generates slide images via an AI pipeline (Gemini API) using the selected style, audience, language, and slide count, and can merge results into PPTX or PDF. An outline-only option lets you craft structure first.

When to Use It

User asks to create slides or a deck from a research paper.
Prepare a conference or thesis-defense presentation from a paper.
Convert content.md notes into a styled slide deck (e.g., for executives).
Produce an outline-first deck to validate structure before full slide generation.
Generate decks in different languages using --lang (e.g., en, zh).

Quick Start

Step 1: Run the CLI with a PDF or content file, e.g., /paper-slide-deck path/to/paper.pdf.
Step 2: Choose --style, --audience, --lang, and optional flags like --slides or --outline-only.
Step 3: Export the final deck to PPTX or PDF (use merge-to-pptx.ts or merge-to-pdf.ts).

Best Practices

Choose a matching style from the Style Gallery (e.g., academic-paper for conferences, sketch-notes for tutorials).
Use --outline-only first to validate structure before image generation.
Review auto-detected figures and captions; add alt text for accessibility.
Specify audience (--audience) to tailor tone and depth.
Limit slides with --slides to control pacing and avoid overload.

Example Use Cases

Convert a NeurIPS paper into a 12-slide AI conference talk with auto-detected figures.
Turn a journal article into a 15-slide corporate deck using the corporate style.
Generate an 8-slide executive briefing from content.md in English or Chinese.
Create chalkboard-style slides from classroom notes for a tutorial.
Produce slides in zh for a bilingual research presentation.

Frequently Asked Questions

Add this skill to your agents