
mcp-pdf-extraction

MCP server to extract contents from a PDF file

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio xraywu-mcp-pdf-extraction-server python -m pdf_extraction

How to use

This MCP server provides a single tool, extract-pdf-contents, implemented by the pdf_extraction package. It lets Claude Code users extract text from local PDF files, with optional OCR for scanned documents. The tool takes two arguments:

  • pdf_path (required): path to the local PDF file
  • pages (optional): comma-separated page numbers; negative values index from the end, so -1 is the last page

With OCR enabled, the server also extracts text from images inside the PDF, not only the embedded text. Once the server is connected, you can ask Claude to extract specific pages or the full document, and Claude relays the results back through the MCP interface. This fork stays compatible with Claude Code by providing a module entry point, so the server can be started with python -m pdf_extraction.
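The page selection rule for the pages argument can be illustrated with a minimal sketch. It assumes positive page numbers are 1-based and that out-of-range values are skipped; the description above states neither, and parse_pages is a hypothetical helper, not the server's actual code.

```python
from typing import Optional


def parse_pages(pages: Optional[str], total_pages: int) -> list[int]:
    """Turn a string such as "1,3,-1" into zero-based page indices.

    Assumptions (not confirmed by the server's documentation):
    positive numbers are 1-based, negatives count from the end,
    and out-of-range entries are skipped.
    """
    if not pages:
        # No selection means the whole document.
        return list(range(total_pages))

    indices = []
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue
        number = int(part)  # raises ValueError on non-numeric input
        if number < 0:
            index = total_pages + number
        elif number > 0:
            index = number - 1
        else:
            continue  # 0 is neither a 1-based page nor a negative index
        if 0 <= index < total_pages:
            indices.append(index)
    return indices


print(parse_pages("1,3,-1", total_pages=10))  # prints [0, 2, 9]
```

Under these assumptions, a request for pages "1,3,-1" on a ten-page file selects the first, third, and last pages.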

How to install

Prerequisites:

  • Python 3.11 or higher
  • pip (or conda)
  • Claude Code CLI installed (the claude command)

Step 1: Clone the repository and install in editable mode

# Clone this fork
git clone https://github.com/lh/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server

# Install in development mode
pip install -e .

Step 2: Verify the installed command (optional)

# Show the installed package's metadata, including its install location
python -m pip show pdf_extraction
# Or locate the pdf-extraction console script, if one was installed
which pdf-extraction

Step 3: Run the server locally (optional for testing)

python -m pdf_extraction

Step 4: If using Claude Code CLI, add the MCP server

claude mcp add pdf-extraction /path/to/python -m pdf_extraction
claude mcp list

Notes:

  • Ensure your Python environment matches the one where the package was installed.
  • If you encounter import errors, verify your virtual environment activation and Python version (3.11+).
  • The repository includes __main__.py, which is what allows the package to run as a module with python -m pdf_extraction.
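One way to avoid an environment mismatch is to let the interpreter that has the package installed report its own path. The sketch below only prints a command in the form used in Step 4; it does not run the Claude Code CLI. The server name pdf-extraction is taken from that step, and build_add_command is a hypothetical helper.

```python
import shlex
import sys


def build_add_command(name: str = "pdf-extraction") -> str:
    """Build the `claude mcp add` line from Step 4 using this interpreter's full path."""
    parts = ["claude", "mcp", "add", name, sys.executable, "-m", "pdf_extraction"]
    return shlex.join(parts)


# The prerequisites call for Python 3.11 or higher.
if sys.version_info < (3, 11):
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} is older than 3.11")

print(build_add_command())
```

Run it with the same python that ran pip install -e . (for example, inside the activated virtual environment), then paste the printed line into your terminal.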

Additional notes

Tips and common issues:

  • Claude Code CLI integration: Use the full path to the Python executable when adding the MCP server to Claude Code, e.g., claude mcp add pdf-extraction /path/to/python -m pdf_extraction.
  • OCR requires pytesseract and Pillow; install both if you need OCR. pytesseract is a wrapper, so the Tesseract OCR engine itself must also be installed on the system.
  • If the server fails to connect in Claude, restart your Claude session and re-add the MCP server to ensure the correct command path is used.
  • For troubleshooting import errors, confirm you are operating in the same Python environment where the package was installed and consider using python -m pdf_extraction to test running the module directly.
  • This fork adds __main__.py to support python -m pdf_extraction usage; the core functionality is unchanged, with the pdf_extraction package exposing the extract-pdf-contents tool.
  • Dependencies include mcp>=1.2.0, pypdf2, pytesseract, Pillow, pydantic, and pymupdf; install these as needed for full feature support (OCR and PDF parsing).
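When chasing import errors, it can help to check which of the listed dependencies are importable in the active environment. This sketch assumes the usual import names for each distribution (PyPDF2 for pypdf2, PIL for Pillow, fitz for pymupdf); missing_modules is a hypothetical helper and not part of the server.

```python
from importlib.util import find_spec

# Distribution name -> import name (assumed; verify against the installed versions).
DEPENDENCIES = {
    "mcp": "mcp",
    "pypdf2": "PyPDF2",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
    "pydantic": "pydantic",
    "pymupdf": "fitz",
}


def missing_modules(modules: dict[str, str]) -> list[str]:
    """Return the distribution names whose import name cannot be found."""
    return [dist for dist, module in modules.items() if find_spec(module) is None]


missing = missing_modules(DEPENDENCIES)
if missing:
    print("Not importable here:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```

The check only covers Python packages and does not confirm installed versions, such as the mcp>=1.2.0 requirement.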
