paper-analyzer
npx machina-cli add skill proyecto26/sherlock-ai-plugin/paper-analyzer --openclaw
Academic Paper Analyzer – In-Depth Analysis of Academic Papers
Core Capabilities
- MinerU Cloud API for high-precision PDF parsing
- Automatic extraction of images, tables, and LaTeX formulas
- Multiple writing styles: storytelling / academic / concise
- Optional formula explanations: insert formula images with detailed symbol explanations
- Optional code analysis: combine explanations with GitHub open-source code
- Output Markdown + HTML (base64-embedded images)
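The last capability, HTML with base64-embedded images, can be illustrated with a minimal sketch. It assumes the HTML contains ordinary `<img src="...">` tags that point at local files. The helper name `embed_images` and the regex-based approach are illustrative assumptions, not taken from the skill's scripts.

```python
import base64
import mimetypes
import re
from pathlib import Path


def embed_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> paths with base64 data URIs.

    Hypothetical helper: the skill's real implementation is not shown
    in this document and may work differently.
    """

    def to_data_uri(match: re.Match) -> str:
        src = match.group(2)
        # Leave remote and already-embedded images untouched.
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = base_dir / src
        # Keep the original tag if the file cannot be found.
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', to_data_uri, html)
```

Embedding images this way makes the HTML file self-contained, at the cost of a larger file, since base64 encoding adds roughly one third to each image's size.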
Prerequisites
MinerU API Token
- Visit https://mineru.net and register an account
- Obtain an API Token
- Set an environment variable (recommended):
export MINERU_TOKEN="your_token_here"
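A script can pick the token up from the environment roughly as sketched below. The helper name `resolve_token` and the rule that an explicitly supplied token takes precedence over `MINERU_TOKEN` are assumptions for illustration; the actual behavior of the skill's scripts is not documented here.

```python
import os
from typing import Mapping, Optional


def resolve_token(explicit: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> str:
    """Return the MinerU API token.

    Assumption: a token passed explicitly wins over the environment variable.
    """
    token = explicit or env.get("MINERU_TOKEN", "")
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise SystemExit("No token found: set MINERU_TOKEN or pass a token explicitly")
    return token
```

Keeping the token in an environment variable avoids leaving it in shell history or in scripts that get committed.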
Dependency Installation
pip install requests markdown
Workflow
Step 1: PDF Parsing (Using MinerU API)
python scripts/mineru_api.py <pdf_path> <output_dir>
Or pass the token directly:
python scripts/mineru_api.py paper.pdf ./output YOUR_TOKEN
Output:
- output_dir/*.md – Markdown files (including formulas and tables)
- output_dir/images/ – High-quality extracted images
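To confirm that parsing produced the expected files, the output directory can be inspected with a few lines of Python. This is a minimal sketch that relies only on the layout described above (Markdown files at the top level, images under `images/`); the function name `list_outputs` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def list_outputs(output_dir: str) -> Tuple[List[Path], List[Path]]:
    """Return (markdown_files, image_files) found under output_dir."""
    root = Path(output_dir)
    markdown_files = sorted(root.glob("*.md"))
    images_dir = root / "images"
    # The images directory may be absent if the paper has no figures.
    if images_dir.is_dir():
        image_files = sorted(p for p in images_dir.iterdir() if p.is_file())
    else:
        image_files = []
    return markdown_files, image_files
```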
Step 2: Extract Paper Metadata
python scripts/extract_paper_info.py <output_dir>/*.md paper_info.json
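The fields written to paper_info.json are not documented here. Purely as an illustration of what metadata extraction from the parsed Markdown could look like, the sketch below pulls a title from the first top-level heading and collects GitHub repository URLs, which matter later for the code option. The function name `sketch_paper_info` and both output keys are assumptions, not the script's real schema.

```python
import re
from typing import Dict, List, Optional, Union

# Matches https://github.com/<owner>/<repo>; deeper paths are cut off at the repo.
GITHUB_URL = re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+")


def sketch_paper_info(markdown_text: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Illustrative only: the real paper_info.json schema may differ."""
    title = None
    for line in markdown_text.splitlines():
        if line.startswith("# "):
            title = line[2:].strip()
            break
    # A URL that ends a sentence picks up a trailing dot, so strip it.
    repos = sorted({url.rstrip(".") for url in GITHUB_URL.findall(markdown_text)})
    return {"title": title, "github_urls": repos}
```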
Step 3: Style Selection (Ask the User)
Before generating the article, you must ask the user to choose the following options:
1. Writing Style (Required)
| Style | Characteristics | Use Cases |
|---|---|---|
| storytelling | Starts from intuition, uses metaphors and examples, narrative-driven | Blogs, tech columns, popular science |
| academic | Professional terminology, rigorous expression, preserves original concepts | Academic reports, surveys, research group sharing |
| concise | Straight to the point, tables and lists, high information density | Quick reads, paper overviews, technical research |
2. Formula Option (Optional)
| Option | Description |
|---|---|
| with-formulas | Insert formula images and explain symbol meanings in detail |
| no-formulas (default) | Pure text description, no formula images |
3. Code Option (Optional, only if the paper has a GitHub repository)
| Option | Description |
|---|---|
| with-code | Clone the repository, include key source code, and explain it alongside the paper |
| no-code (default) | No code analysis |
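The three choices and their defaults can be captured in a small data structure. The sketch below encodes only what the tables state: style is required, formulas default to no-formulas, code defaults to no-code, and with-code is valid only when the paper has a GitHub repository. The names `ArticleOptions` and `validate_options` are hypothetical.

```python
from dataclasses import dataclass

STYLES = ("storytelling", "academic", "concise")
FORMULA_OPTIONS = ("with-formulas", "no-formulas")
CODE_OPTIONS = ("with-code", "no-code")


@dataclass(frozen=True)
class ArticleOptions:
    style: str                     # required, so it has no default
    formulas: str = "no-formulas"  # default from the formula table
    code: str = "no-code"          # default from the code table


def validate_options(options: ArticleOptions, has_github_repo: bool) -> ArticleOptions:
    """Raise ValueError if the chosen options contradict the tables above."""
    if options.style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}")
    if options.formulas not in FORMULA_OPTIONS:
        raise ValueError(f"formulas must be one of {FORMULA_OPTIONS}")
    if options.code not in CODE_OPTIONS:
        raise ValueError(f"code must be one of {CODE_OPTIONS}")
    if options.code == "with-code" and not has_github_repo:
        raise ValueError("with-code requires a GitHub repository for the paper")
    return options
```

For example, `validate_options(ArticleOptions(style="concise"), has_github_repo=False)` passes, because both optional features fall back to their defaults.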
Step 4: Intelligent Article Generation
(...)
API Limits
- Maximum file size: 200MB
- Maximum pages per file: 600
- Supports PDF, DOC, PPT, images, and more
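Checking a file against these limits before uploading avoids a wasted API call. The following is a minimal sketch; it assumes "200MB" means 200 × 1024 × 1024 bytes, which the source does not specify, and the function name `check_limits` is hypothetical.

```python
from typing import List

MAX_FILE_BYTES = 200 * 1024 * 1024  # assumption: "200MB" interpreted as 200 MiB
MAX_PAGES = 600


def check_limits(size_bytes: int, page_count: int) -> List[str]:
    """Return a list of limit violations; an empty list means the file is acceptable."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"file has {page_count} pages, limit is {MAX_PAGES}")
    return problems
```

The file size can be read with `os.path.getsize`. Obtaining the page count requires a PDF library, which is outside the scope of this sketch.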
Source
https://github.com/proyecto26/sherlock-ai-plugin/blob/main/skills/paper-analyzer/SKILL.md
Overview
paper-analyzer converts academic papers into in-depth technical articles using the MinerU Cloud API for high-precision PDF parsing. It automatically extracts images, tables, and LaTeX formulas, supports optional formula explanations and GitHub code analysis, and outputs Markdown and HTML formats.
How This Skill Works
The tool parses PDFs via the MinerU Cloud API to extract content such as images, tables, and LaTeX formulas. After parsing, you choose a writing style (storytelling, academic, or concise) and optional features (with-formulas and/or with-code); it then generates article-ready Markdown and HTML with embedded visuals.
When to Use It
- You need a storytelling piece for a tech blog that explains a paper intuitively.
- You must produce an academic report preserving professional terminology and concepts.
- You want a concise, high-density overview for quick internal review.
- You have a GitHub repo linked to the paper and want integrated code analysis.
- You require a math-heavy article with formula images and detailed symbol explanations.
Quick Start
- Step 1: Set up your MinerU API token and install dependencies (pip install requests markdown).
- Step 2: Run the MinerU parsing script on your PDF to generate Markdown and extract images.
- Step 3: Choose a writing style and optional features, then generate the Markdown/HTML article.
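Steps 1 and 2 can also be driven from Python instead of the shell. The sketch below assumes it is run from the skill's root directory (so that `scripts/` resolves) and that `MINERU_TOKEN` is already set; the function names are hypothetical. Because `subprocess` does not expand wildcards when no shell is involved, the `*.md` pattern is expanded explicitly.

```python
import glob
import os
import subprocess
import sys
from typing import List


def parse_command(pdf_path: str, output_dir: str) -> List[str]:
    """Argument list for the PDF parsing step."""
    return [sys.executable, "scripts/mineru_api.py", pdf_path, output_dir]


def extract_command(output_dir: str, info_path: str = "paper_info.json") -> List[str]:
    """Argument list for the metadata step, with *.md expanded by hand."""
    md_files = sorted(glob.glob(os.path.join(output_dir, "*.md")))
    return [sys.executable, "scripts/extract_paper_info.py", *md_files, info_path]


def run_pipeline(pdf_path: str, output_dir: str) -> None:
    """Run both steps in order; check=True stops on the first failure."""
    subprocess.run(parse_command(pdf_path, output_dir), check=True)
    # Build the second command only now, after parsing has written the *.md files.
    subprocess.run(extract_command(output_dir), check=True)
```

Calling `run_pipeline("paper.pdf", "./output")` then mirrors the two shell commands from the workflow; Step 3 still requires asking the user for a style.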
Best Practices
- Define the target writing style before generation to guide tone and structure.
- Provide the paper in a clean PDF to improve parsing accuracy (images, tables, formulas).
- Use with-formulas when equations are central to the paper's contributions.
- If code analysis is needed, attach the related GitHub repo and choose with-code.
- Review the Markdown/HTML output to verify embedded images render correctly.
Example Use Cases
- Story-driven blog post explaining a machine learning paper with formula explanations.
- Academic survey preserving methods and terminology for conference proceedings.
- Concise quick-read overview for a research group briefing.
- HTML article with embedded images for a conference handout or slides.
- Code-enhanced article linking to a GitHub repo and analyzing its implementation.