pdf

Production-ready MCP server for PDF processing with intelligent caching. Extract text, search, and analyze PDFs with AI agents.

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio jztan-pdf-mcp pdf-mcp \
  --env PDF_MCP_CACHE_DIR="$HOME/.cache/pdf-mcp" \
  --env PDF_MCP_CACHE_TTL="24"

How to use

pdf-mcp exposes eight tools for reading, searching, extracting, and caching content from PDF documents, so AI agents can work with PDFs without re-extracting content on every request:

pdf_info: returns document metadata and structure
pdf_read_pages: reads specific page ranges in chunks
pdf_read_all: reads an entire document (subject to safety limits)
pdf_search: locates relevant sections before any content is loaded
pdf_get_toc: retrieves the table of contents
pdf_extract_images: extracts images as base64-encoded PNGs
pdf_cache_stats: reports on the contents of the cache
pdf_cache_clear: invalidates or removes stale cache entries

The server persists results in a SQLite cache that survives restarts, which speeds up repeated queries across conversations. PDFs can be supplied as local paths or as HTTP(S) URLs.
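As a rough illustration of how an agent-side client might invoke these tools, the sketch below builds MCP tools/call requests (JSON-RPC 2.0) for three of them. The tool names come from the list above; the argument names (path, query, pages) and the file name report.pdf are assumptions made for illustration and are not confirmed by this page, so check the schemas the server advertises before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and values below are hypothetical placeholders.
requests = [
    tool_call("pdf_info", {"path": "report.pdf"}),
    tool_call("pdf_search", {"path": "report.pdf", "query": "revenue"}),
    tool_call("pdf_read_pages", {"path": "report.pdf", "pages": "3-5"}),
]

for request in requests:
    print(json.dumps(request))
```

In practice an MCP client library or the host application (such as Claude Code) constructs these messages for you; the sketch only shows the shape of the calls.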

How to install

Prerequisites:

Python 3.10+ installed on your system
Access to the internet to install dependencies

Step-by-step:

(Optional) Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate      # Unix/macOS
  venv\Scripts\activate.bat     # Windows
Install the pdf-mcp package from PyPI (sufficient for most users):
  pip install pdf-mcp
(Optional) Install from source to get the latest changes:
  git clone https://github.com/jztan/pdf-mcp.git
  cd pdf-mcp
  pip install -e ".[dev]"       # includes dev dependencies
Run the MCP server: pdf-mcp
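Before starting the server, it can help to confirm that the prerequisites are met. The following is a minimal sketch, assuming the pdf-mcp command is installed on your PATH (for example inside the activated virtual environment); the check_prerequisites helper is hypothetical and not part of the package.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10), command="pdf-mcp"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_version:
        required = ".".join(str(part) for part in min_version)
        problems.append(f"Python {required}+ is required, found {sys.version.split()[0]}")
    if shutil.which(command) is None:
        problems.append(f"'{command}' was not found on PATH; is the package installed?")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("All prerequisites satisfied.")
```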

Optional: set up a dedicated cache directory if you want to customize the cache location (see PDF_MCP_CACHE_DIR under Additional notes).

Additional notes

Environment variables you can configure:

PDF_MCP_CACHE_DIR: path to the SQLite cache directory (default: ~/.cache/pdf-mcp)
PDF_MCP_CACHE_TTL: cache time-to-live in hours (default: 24)
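To make the defaults concrete, the sketch below shows one way such settings could be resolved from the environment. It is an illustration only and not the server's actual code: the load_cache_settings function is hypothetical, and treating the TTL as a positive number of hours is an assumption based on the descriptions above.

```python
import os
from pathlib import Path


def load_cache_settings(env=None):
    """Resolve the cache directory and TTL (in hours), falling back to the documented defaults."""
    env = os.environ if env is None else env
    # expanduser() turns a leading "~" into the current user's home directory.
    cache_dir = Path(env.get("PDF_MCP_CACHE_DIR", "~/.cache/pdf-mcp")).expanduser()
    ttl_hours = float(env.get("PDF_MCP_CACHE_TTL", "24"))
    if ttl_hours <= 0:
        raise ValueError("PDF_MCP_CACHE_TTL must be a positive number of hours")
    return cache_dir, ttl_hours


if __name__ == "__main__":
    directory, ttl = load_cache_settings()
    print(f"cache directory: {directory}, TTL: {ttl} hours")
```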

Notes and tips:

The server caches metadata, page text, images, and the table of contents to speed up subsequent requests.
Cache invalidation occurs automatically when a PDF file’s modification time changes. You can also manually clear the cache with pdf_cache_clear.
The server reads PDFs from HTTP/HTTPS URLs as well as from local files.
The included tools are designed to be used in sequence to minimize data loading, for example: pdf_info, followed by pdf_search, then pdf_read_pages for requested ranges.
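If cached entries expire too soon or linger too long, adjust PDF_MCP_CACHE_TTL; to move the cache elsewhere, set PDF_MCP_CACHE_DIR. Restart the server after changing either variable. To reclaim space, clear entries with pdf_cache_clear.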
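The caching behaviour described in these notes (results persisted in SQLite, invalidation when a file's modification time changes, and a time-to-live) can be sketched as follows. This is a simplified illustration under stated assumptions, not the server's implementation: the PageTextCache class, the table layout, and the column names are all hypothetical, and only local files are handled.

```python
import os
import sqlite3
import time


class PageTextCache:
    """Toy page-text cache keyed by (path, page), validated by mtime and TTL."""

    def __init__(self, db_path=":memory:", ttl_hours=24.0):
        self.ttl_seconds = ttl_hours * 3600
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS page_text ("
            "path TEXT, page INTEGER, mtime REAL, stored_at REAL, text TEXT, "
            "PRIMARY KEY (path, page))"
        )

    def get(self, path, page):
        """Return cached text, or None if missing, stale, or expired."""
        row = self.db.execute(
            "SELECT mtime, stored_at, text FROM page_text WHERE path = ? AND page = ?",
            (path, page),
        ).fetchone()
        if row is None:
            return None
        mtime, stored_at, text = row
        stale = mtime != os.path.getmtime(path)  # file changed since it was cached
        expired = time.time() - stored_at > self.ttl_seconds
        return None if stale or expired else text

    def put(self, path, page, text):
        """Store text together with the file's current modification time."""
        self.db.execute(
            "INSERT OR REPLACE INTO page_text VALUES (?, ?, ?, ?, ?)",
            (path, page, os.path.getmtime(path), time.time(), text),
        )
        self.db.commit()
```

A caller would try get() first and fall back to extracting the page and calling put() on a miss. Passing a file path instead of ":memory:" as db_path is what lets entries survive a restart.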