PDFlow
Transform PDFs into structured data with AI-powered extraction + CLI + MCP + Web.
claude mcp add --transport stdio traves-theberge-pdflow node path/to/server.js \
  --env GEMINI_API_KEY="Your Google Gemini API key (required for AI extraction)"
How to use
PDFlow is an AI-powered PDF extraction tool that you can use through a web UI, a command-line interface (CLI), or an MCP server that integrates with other agents. As an MCP server, it accepts prompts and returns structured output (JSON, Markdown, MDX, and other formats) derived from PDF content. It converts PDFs to multiple formats, streams results in real time, and exposes a REST API for integration with AI agents. To get started, install and run PDFlow, then configure the MCP integration so that your agents can request structured data from PDFs through the provided CLI commands and MCP prompts.
With PDFlow’s MCP capabilities, you can generate prompts for AI agents, request page-by-page extractions, or ask for outputs aggregated across documents. The MCP workflow can generate configuration for different clients (e.g., Claude, Cursor, Claude Code) and can be wired into automation tasks that depend on consistently structured data from PDFs. The included REST API gives programmatic access to uploading, processing, and exporting results, which enables automated pipelines and custom tooling built on PDFlow’s extraction capabilities.
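Many MCP clients, including Claude Desktop and Cursor, read stdio servers from a JSON file with an `mcpServers` object. The sketch below builds such an entry for PDFlow, reusing the server name, the `node` command, the placeholder path `path/to/server.js`, and the `GEMINI_API_KEY` variable from the `claude mcp add` command at the top of this page. The function name `buildPdflowMcpConfig` is hypothetical, and the configuration that PDFlow itself generates may differ.

```typescript
// Shape of one stdio server entry inside an `mcpServers` config object.
interface StdioServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Build a client config for a PDFlow stdio server.
// `serverPath` and `apiKey` come from the caller; nothing is read from disk
// or from the environment. The function name is illustrative only.
function buildPdflowMcpConfig(
  serverPath: string,
  apiKey: string
): { mcpServers: Record<string, StdioServerEntry> } {
  return {
    mcpServers: {
      "traves-theberge-pdflow": {
        command: "node",
        args: [serverPath],
        env: { GEMINI_API_KEY: apiKey },
      },
    },
  };
}

// "path/to/server.js" is the same placeholder used in the command above.
const config = buildPdflowMcpConfig("path/to/server.js", "your-api-key-here");
console.log(JSON.stringify(config, null, 2));
```

Paste the printed JSON into the client's MCP configuration file, replacing the placeholder path and key with real values.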
How to install
Prerequisites:
- Node.js 20+ (required for Next.js 16-based deployments)
- npm or yarn
- pdftocairo (poppler-utils) installed on the host
- Google Gemini API key for AI extraction
Option A: Docker (recommended for a consistent environment)
- Install Docker and Docker Compose.
- Set your API key in the environment: export GEMINI_API_KEY="your-api-key-here"
- Build the images, then start the containers:
  USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose build
  USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose up -d
- Open http://localhost:3535 to access the PDFlow UI.
Option B: Local Development
- Clone the repository and install dependencies:
  git clone https://github.com/traves-theberge/pdflow.git
  cd pdflow
  npm install
- Start the development server: npm run dev
- Open the web app at http://localhost:3001 and enter your Gemini API key when prompted.
Installing the prerequisites
- Ensure poppler-utils (which provides pdftocairo) is installed. On Ubuntu/Debian: sudo apt-get install poppler-utils. On macOS: brew install poppler. On Windows: install a Poppler distribution for Windows and add it to PATH.
- Ensure you have a Gemini API key and set it in the environment or via the app settings.
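The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of PDFlow: the names `HostInfo` and `findMissingPrerequisites` are hypothetical, and the host details are passed in explicitly so that the check itself has no side effects.

```typescript
// Facts about the host, gathered by the caller.
interface HostInfo {
  nodeVersion: string;              // e.g. the value of process.versions.node
  pdftocairoOnPath: boolean;        // whether `pdftocairo` was found on PATH
  geminiApiKey: string | undefined; // e.g. process.env.GEMINI_API_KEY
}

// Return a readable list of unmet prerequisites (empty when all are met).
function findMissingPrerequisites(host: HostInfo): string[] {
  const problems: string[] = [];

  // Accept "20.11.1" as well as "v20.11.1" and compare the major version.
  const match = /^v?(\d+)/.exec(host.nodeVersion);
  const major = match ? Number(match[1]) : NaN;
  if (isNaN(major) || major < 20) {
    problems.push("Node.js 20+ is required (found: " + host.nodeVersion + ")");
  }

  if (!host.pdftocairoOnPath) {
    problems.push("pdftocairo not found; install poppler-utils (poppler on macOS)");
  }

  if (!host.geminiApiKey || host.geminiApiKey.trim() === "") {
    problems.push("GEMINI_API_KEY is not set");
  }

  return problems;
}

// Example: an old Node.js release and no API key configured.
const missing = findMissingPrerequisites({
  nodeVersion: "18.19.0",
  pdftocairoOnPath: true,
  geminiApiKey: undefined,
});
console.log(missing.length === 0 ? "All prerequisites met" : missing.join("\n"));
```

In a real Node.js script, `nodeVersion` would typically come from `process.versions.node` and `geminiApiKey` from `process.env.GEMINI_API_KEY`. For the web UI the key can instead be entered in the app settings, so the last check matters mainly for CLI and MCP use.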
Additional notes
Tips and common issues:
- If you’re running in Docker, ensure UID/GID forwarding is correctly set to avoid file permission issues when mounting volumes.
- The Gemini API key is stored in browser session storage for the web UI and is never sent to PDFlow servers; use the environment variable GEMINI_API_KEY for CLI and MCP flows where applicable.
- PDFlow can export results in Markdown, MDX, JSON, XML, YAML, HTML, or CSV. Use the CLI or MCP config to specify the desired output format.
- When using the MCP feature, you can generate tool configurations for different clients (VS Code, Claude Desktop, Cursor, Claude Code) and switch between the dev server and custom URLs as needed.
- If you encounter token or authentication issues with Gemini, validate your API key using the CLI command npm run pdflow -- validate-key and ensure the key has access to the Gemini API.
- For production deployments, consider using the Docker deployment guide and ensure appropriate resource limits are configured in your orchestrator.
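The export formats listed in the tips above can be handled in a wrapper script with a small lookup table. This sketch is illustrative only: the lowercase format identifiers and the helper names `isOutputFormat` and `outputFileName` are assumptions, not PDFlow's actual CLI or MCP option values.

```typescript
// Conventional file extensions for the export formats named above.
// The lowercase keys are assumed identifiers, not confirmed PDFlow options.
const OUTPUT_EXTENSIONS = {
  markdown: ".md",
  mdx: ".mdx",
  json: ".json",
  xml: ".xml",
  yaml: ".yaml",
  html: ".html",
  csv: ".csv",
} as const;

type OutputFormat = keyof typeof OUTPUT_EXTENSIONS;

// Type guard: true only for one of the seven formats in the table.
function isOutputFormat(value: string): value is OutputFormat {
  return Object.prototype.hasOwnProperty.call(OUTPUT_EXTENSIONS, value);
}

// Derive an output file name from a PDF name and a requested format.
// Throws for any format that is not in the table.
function outputFileName(pdfName: string, format: string): string {
  const normalized = format.toLowerCase();
  if (!isOutputFormat(normalized)) {
    throw new Error("Unsupported output format: " + format);
  }
  const base = pdfName.replace(/\.pdf$/i, "");
  return base + OUTPUT_EXTENSIONS[normalized];
}

console.log(outputFileName("report.pdf", "JSON"));
```

Rejecting unknown formats before any processing starts keeps a typo in a pipeline configuration from surfacing only after an extraction has already run.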