mcp-pdf-extraction
MCP server to extract contents from a PDF file
claude mcp add --transport stdio xraywu-mcp-pdf-extraction-server python -m pdf_extraction
How to use
This MCP server exposes a single tool, extract-pdf-contents, implemented by the pdf_extraction package. It lets Claude Code users extract text from local PDF files, with optional OCR for scanned documents. The tool takes a required pdf_path argument (the path to a local PDF file) and an optional pages argument: a comma-separated list of page numbers, where negative values count from the end (-1 is the last page). When OCR is used, the server also extracts text from images inside the PDF, not only the embedded text. Once the server is connected, you can ask Claude to extract specific pages or the whole document, and the results come back through the MCP interface. This fork adds a module entry point so that the server can be started with python -m pdf_extraction, which keeps it compatible with Claude Code.
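To make the pages argument concrete, here is a minimal sketch of how a spec such as "1,3,-1" could be resolved into page indexes. The helper name resolve_pages is hypothetical, and two behaviours are assumptions rather than facts about this server: positive page numbers are treated as 1-based, and out-of-range values raise an error. The server's own parsing may differ.

```python
def resolve_pages(pages, total_pages):
    """Turn a spec such as "1,3,-1" into sorted zero-based page indexes.

    Assumptions (not confirmed by the server's source): positive numbers
    are 1-based, negative numbers count from the end, duplicates collapse.
    """
    if not pages:
        return list(range(total_pages))  # no spec given: select every page

    indexes = set()
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        number = int(part)
        if number == 0:
            raise ValueError("page numbers start at 1")
        # 1 -> index 0; -1 -> index total_pages - 1
        index = number - 1 if number > 0 else total_pages + number
        if not 0 <= index < total_pages:
            raise ValueError(f"page {number} is outside a {total_pages}-page document")
        indexes.add(index)
    return sorted(indexes)
```

Under these assumptions, resolve_pages("1,3,-1", 10) selects the first, third, and last page of a ten-page document.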
How to install
Prerequisites:
- Python 3.11 or higher
- pip (or conda)
- Claude Code CLI installed (the claude command)
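The prerequisites above can be checked from Python before installing. This is only a sketch: the function name check_prerequisites is made up for illustration, and it merely tests the interpreter version and whether the pip and claude commands are on your PATH.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 11)):
    """Report which prerequisites appear to be satisfied in this environment."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python": sys.version_info[:2] >= min_version,
        # shutil.which returns None when the command is not on PATH.
        "pip": shutil.which("pip") is not None,
        "claude": shutil.which("claude") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing pip entry is not necessarily a problem if you install with conda instead.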
Step 1: Clone the repository and install in editable mode
# Clone this fork
git clone https://github.com/lh/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server
# Install in development mode
pip install -e .
Step 2: Verify the installed command (optional)
# Show the installed package's metadata and install location
python -m pip show pdf_extraction
# Or locate the pdf-extraction command, if one was installed on your PATH
which pdf-extraction
Step 3: Run the server locally (optional for testing)
python -m pdf_extraction
Step 4: If using Claude Code CLI, add the MCP server
claude mcp add pdf-extraction /path/to/python -m pdf_extraction
claude mcp list
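Because the add command should point at the interpreter where the package is installed, one way to avoid typing the path by hand is to print the command from inside that environment. The sketch below simply mirrors the command form shown in Step 4. build_add_command is a hypothetical helper, and it assumes that the Python running the script is the one that has pdf_extraction installed.

```python
import shlex
import sys


def build_add_command(server_name="pdf-extraction", python_path=sys.executable):
    """Return the Step 4 command as an argument list.

    python_path defaults to the interpreter running this script, which is
    assumed to be the environment where pdf_extraction was installed.
    """
    return ["claude", "mcp", "add", server_name, python_path, "-m", "pdf_extraction"]


if __name__ == "__main__":
    # shlex.join quotes the path safely if it contains spaces.
    print(shlex.join(build_add_command()))
```

Run it with the virtual environment activated, then paste the printed line into your shell.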
Notes:
- Ensure your Python environment matches the one where the package was installed.
- If you encounter import errors, verify your virtual environment activation and Python version (3.11+).
- The repository includes __main__.py, which is what allows the package to be run as a module with python -m pdf_extraction.
Additional notes
Tips and common issues:
- Claude Code CLI integration: Use the full path to the Python executable when adding the MCP server to Claude Code, e.g., claude mcp add pdf-extraction /path/to/python -m pdf_extraction.
- OCR requires dependencies like pytesseract and Pillow; ensure those are installed if you need OCR capabilities.
- If the server fails to connect in Claude, restart your Claude session and re-add the MCP server to ensure the correct command path is used.
- To troubleshoot import errors, confirm that you are in the same Python environment where the package was installed, and run python -m pdf_extraction to test the module directly.
- This fork adds __main__.py to support python -m pdf_extraction; the core functionality is unchanged, and the pdf_extraction package still exposes the extract-pdf-contents tool.
- Dependencies include mcp>=1.2.0, pypdf2, pytesseract, Pillow, pydantic, and pymupdf; install these as needed for full feature support (OCR and PDF parsing).
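A quick way to see which of the dependencies listed above are importable in the current environment is sketched below. The mapping from package name to import name is an assumption based on how these packages are commonly imported (for example, Pillow as PIL and pymupdf as fitz), and missing_packages is a hypothetical helper, not part of this server.

```python
from importlib.util import find_spec

# Package name -> top-level import name (assumed; adjust if your versions differ).
IMPORT_NAMES = {
    "mcp": "mcp",
    "pypdf2": "PyPDF2",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
    "pydantic": "pydantic",
    "pymupdf": "fitz",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in import_names.items()
        if find_spec(module) is None  # None means the module is not installed
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Note that pytesseract is a wrapper around the Tesseract OCR engine, so OCR also needs the tesseract binary installed on the system, which this check does not cover.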