mcp-pdf-extraction

MCP server to extract contents from a PDF file

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio xraywu-mcp-pdf-extraction-server -- python -m pdf_extraction

How to use

This MCP server provides a single tool, extract-pdf-contents, implemented by the pdf_extraction package. It lets Claude Code users extract text from local PDF files, with optional OCR for scanned documents.

The tool accepts two arguments:

pdf_path (required): path to the local PDF file
pages (optional): comma-separated page numbers; negative numbers index from the end, so -1 is the last page

With OCR enabled, the server also extracts text from images inside the PDF, not only the embedded text. Once the server is connected, you can ask Claude for specific pages or the full document, and Claude relays the results back to you through the MCP interface.

This fork ensures compatibility with Claude Code by providing a module entry point, so you can run the server with python -m pdf_extraction.
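To make the pages argument concrete, here is a minimal sketch of how a spec such as "1,3,-1" could be resolved into page indices. The function name resolve_pages is hypothetical, and the rules it applies (positive numbers are 1-based, out-of-range entries raise an error) are assumptions; the server's own parsing may differ.

```python
from typing import Optional


def resolve_pages(pages: Optional[str], page_count: int) -> list:
    """Turn a spec like "1,3,-1" into zero-based page indices.

    Assumptions (not confirmed by the server's source): positive numbers
    are 1-based, negative numbers count from the end, 0 is invalid.
    """
    if pages is None or not pages.strip():
        return list(range(page_count))  # no selection means every page

    indices = []
    for part in pages.split(","):
        number = int(part.strip())
        if number == 0:
            raise ValueError("page numbers start at 1 (or -1 from the end)")
        # Positive: shift to zero-based. Negative: count back from the end.
        index = number - 1 if number > 0 else page_count + number
        if not 0 <= index < page_count:
            raise ValueError(f"page {number} is out of range for {page_count} pages")
        indices.append(index)
    return indices


if __name__ == "__main__":
    print(resolve_pages("1,3,-1", page_count=10))  # [0, 2, 9]
```

In a 10-page document, "1,3,-1" selects the first, third, and last pages under these assumptions.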

How to install

Prerequisites:

Python 3.11 or higher
pip (or conda)
Claude Code CLI installed (the claude command)

Step 1: Clone the repository and install in editable mode

# Clone this fork
git clone https://github.com/lh/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server

# Install in development mode
pip install -e .

Step 2: Verify the installation (optional)

# Show the installed package's metadata and install location
python -m pip show pdf_extraction
# Or locate the pdf-extraction console script, if one was installed
which pdf-extraction

Step 3: Run the server locally (optional for testing)

python -m pdf_extraction
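
Assuming the stdio transport from the install command, a server started this way waits for MCP messages on standard input, so a silent terminal is expected. As a rough illustration of what a client such as Claude Code sends on your behalf, the sketch below builds a tools/call request for extract-pdf-contents. The helper name build_tool_call and the file path are placeholders, and passing pages as a string is an assumption based on the comma-separated format. A real client must also complete the MCP initialize handshake before this message is accepted.

```python
import json


def build_tool_call(pdf_path, pages=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request for extract-pdf-contents."""
    arguments = {"pdf_path": pdf_path}
    if pages is not None:
        arguments["pages"] = pages  # assumed to be a string such as "1,-1"
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract-pdf-contents", "arguments": arguments},
    }


if __name__ == "__main__":
    # Placeholder path; replace with a real local PDF.
    request = build_tool_call("/path/to/document.pdf", pages="1,-1")
    print(json.dumps(request, indent=2))
```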

Step 4: If using Claude Code CLI, add the MCP server

claude mcp add pdf-extraction -- /path/to/python -m pdf_extraction
claude mcp list

Notes:

Ensure your Python environment matches the one where the package was installed.
If you encounter import errors, verify your virtual environment activation and Python version (3.11+).
The repository includes __main__.py, which allows running the package as a module with python -m pdf_extraction.
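
One way to act on these notes is to ask the interpreter itself what it sees. The sketch below, run with the Python you intend to register, reports the interpreter path (a candidate for /path/to/python), whether the version meets the 3.11 requirement, and whether pdf_extraction is importable. The function name describe_environment is hypothetical.

```python
import importlib.util
import sys


def describe_environment(module_name="pdf_extraction"):
    """Report which interpreter is running and whether it can find the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,  # full path to this interpreter
        "version_ok": sys.version_info >= (3, 11),
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If module_found is False, the package was most likely installed into a different environment than the one running the script.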

Additional notes

Tips and common issues:

Claude Code CLI integration: Use the full path to the Python executable when adding the MCP server to Claude Code, e.g., claude mcp add pdf-extraction -- /path/to/python -m pdf_extraction.
OCR requires dependencies like pytesseract and Pillow; ensure those are installed if you need OCR capabilities.
If the server fails to connect in Claude, restart your Claude session and re-add the MCP server to ensure the correct command path is used.
To troubleshoot import errors, confirm you are in the same Python environment where the package was installed, and run python -m pdf_extraction to test the module directly.
This fork adds __main__.py to support python -m pdf_extraction; the core functionality is unchanged: the pdf_extraction package exposes the extract-pdf-contents tool.
Dependencies include mcp>=1.2.0, pypdf2, pytesseract, Pillow, pydantic, and pymupdf; install these as needed for full feature support (OCR and PDF parsing).
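
A quick way to see which of these dependencies are missing is to probe for them without importing anything. The sketch below is illustrative: the import names in the mapping (for example PIL for Pillow and fitz for pymupdf) are assumptions about how each distribution is imported, and missing_dependencies is a hypothetical helper. Because pytesseract is a wrapper around the Tesseract OCR engine, the script also checks for a tesseract executable on PATH.

```python
import importlib.util
import shutil

# Distribution name -> import name (assumed; adjust if your versions differ).
DEPENDENCIES = {
    "mcp": "mcp",
    "pypdf2": "PyPDF2",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
    "pydantic": "pydantic",
    "pymupdf": "fitz",
}


def missing_dependencies(dependencies=DEPENDENCIES):
    """Return the distributions whose import name cannot be found."""
    return sorted(
        dist
        for dist, module in dependencies.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing packages:", ", ".join(missing) or "none")
    print("tesseract binary:", shutil.which("tesseract") or "not found on PATH")
```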
