pdffigures
Extract figures and tables from PDF documents using this FastAPI-based service. The Figure Extractor API and MCP Server provides a straightforward HTTP interface for PDFFigures 2.0, a robust figure extraction system developed by the Allen Institute for AI.
claude mcp add --transport stdio vlln-pdffigures-mcp-server python figure_extractor.py \
  --env PORT="Port for MCP server (default 5001)" \
  --env FASTMCP_URL="URL of the MCP gateway or controller (if applicable)"
How to use
This MCP server wraps the PDF figure extractor service and exposes it as an MCP tool named extract_figures_from_pdf. It runs a FastAPI-based API that accepts a PDF via a URL or a file upload and returns structured JSON describing the extracted figures and tables, including render URLs. By default the server is reachable at the MCP URL http://localhost:5001/mcp. You can call the extract_figures_from_pdf tool over HTTP by POSTing to this endpoint with either a pdf_url form field or a file field. The FastMCP integration lets AI agents or workflows invoke the extraction programmatically as part of larger data processing or RAG pipelines.
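The POST described above can be sketched in Python using only the standard library. This is a minimal sketch, assuming the endpoint accepts a URL-encoded form body with a pdf_url field as the text states. The helper names build_extract_request and extract_figures and the example PDF URL are hypothetical, and the exact request format should be checked against the running server.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Default MCP URL given in the text above.
MCP_URL = "http://localhost:5001/mcp"


def build_extract_request(pdf_url: str, endpoint: str = MCP_URL) -> urllib.request.Request:
    """Build a POST request carrying the PDF location in a pdf_url form field.

    Hypothetical helper: assumes the endpoint accepts a URL-encoded form body.
    """
    body = urllib.parse.urlencode({"pdf_url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_figures(pdf_url: str, endpoint: str = MCP_URL, timeout: float = 120.0) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_extract_request(pdf_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, extract_figures("https://example.org/paper.pdf") would return the decoded JSON. Uploading a local file through the file field would need a multipart/form-data body instead, which this sketch does not cover.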
How to install
Prerequisites:
- Python 3.8+ installed on your system
- Access to the repository containing the figure_extractor.py script
- Optional: Docker, if you prefer containerized deployment
Install and run (manual Python execution):
- Clone the repository and navigate to the project directory:
  git clone https://github.com/Huang-lab/figure-extractor.git
  cd figure-extractor
- Install dependencies:
  python -m pip install --upgrade pip
  pip install -r requirements.txt
- Run the MCP server script (which serves the extract_figures_from_pdf tool):
  python figure_extractor.py
- By default, the MCP endpoint will be available at http://localhost:5001/mcp if you follow the repo's setup. If you need to adjust the port, set the PORT environment variable or modify the script invocation accordingly.
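A client that has to follow a port change can derive the endpoint from the same PORT variable. The snippet below is a small sketch under that assumption: mcp_url is a hypothetical client-side helper, and nothing here confirms how figure_extractor.py itself reads PORT.

```python
import os


def mcp_url(host: str = "localhost", default_port: int = 5001) -> str:
    """Return the MCP endpoint URL, using PORT from the environment if set."""
    # Fall back to the documented default port when PORT is not defined.
    port = int(os.environ.get("PORT", default_port))
    return f"http://{host}:{port}/mcp"
```

With PORT unset, mcp_url() gives the documented default http://localhost:5001/mcp. With PORT=8080 exported, it gives http://localhost:8080/mcp.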
Docker deployment (optional):
- Build the Docker image:
  docker build -t pdf-extraction .
- Run the container exposing port 5001:
  docker run -p 5001:5001 pdf-extraction
- Open http://localhost:5001/docs to view API docs and test the endpoint.
Additional notes
Tips and common considerations:
- The MCP tool extract_figures_from_pdf accepts either a PDF URL (pdf_url) or a local file upload (file) and returns a JSON payload with extracted figures and tables, including renderURL fields for image previews.
- Default MCP URL is http://localhost:5001/mcp. If you change the port, update your MCP orchestration accordingly.
- When running via Docker, ensure port mappings (-p 5001:5001) align with the internal server port.
- If you encounter CORS or authentication issues, ensure proper network access between the MCP controller and the server and consider using a reverse proxy if necessary.
- Environment variables can be used to customize endpoints or integration with your MCP gateway; consult the mcp_config for required fields in your environment.
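Since the returned payload carries renderURL fields, a client may want to gather them for preview. The following sketch walks any decoded JSON value and collects the strings stored under a renderURL key, so it does not depend on the exact response layout. The function name collect_render_urls and the sample payload are illustrative assumptions, not the service's documented schema.

```python
from typing import Any, List


def collect_render_urls(payload: Any) -> List[str]:
    """Recursively gather every string stored under a 'renderURL' key."""
    found: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "renderURL" and isinstance(value, str):
                found.append(value)
            else:
                # Descend into nested objects and arrays.
                found.extend(collect_render_urls(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(collect_render_urls(item))
    return found


# Made-up payload for illustration only; the real response may be shaped differently.
sample = {
    "figures": [{"renderURL": "/outputs/fig1.png"}],
    "tables": [{"renderURL": "/outputs/table1.png"}],
}
render_urls = collect_render_urls(sample)
```

Searching by key rather than by fixed path keeps the helper usable even if figures and tables are nested differently in the actual response.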