
pdffigures

Extract figures and tables from PDF documents using this FastAPI-based service. The Figure Extractor API and MCP Server provides a straightforward HTTP interface for PDFFigures 2.0, a robust figure extraction system developed by the Allen Institute for AI.

Installation
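Run this command in your terminal to add the MCP server to Claude Code. The quoted --env values are descriptive placeholders; replace them with real values before running.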
claude mcp add --transport stdio \
  --env PORT="Port for MCP server (default 5001)" \
  --env FASTMCP_URL="URL of the MCP gateway or controller (if applicable)" \
  vlln-pdffigures-mcp-server -- python figure_extractor.py

How to use

This MCP server wraps the PDF figure extractor service and exposes it as an MCP tool named extract_figures_from_pdf. It runs a FastAPI-based API that accepts a PDF by URL or by file upload and returns structured JSON describing the extracted figures and tables, including render URLs.

The server is reachable at the default MCP URL, http://localhost:5001/mcp. You can call the extract_figures_from_pdf tool over HTTP by POSTing to this endpoint with either a pdf_url form field or a file field. The FastMCP integration lets AI agents and workflows invoke the extraction programmatically as part of larger data processing or RAG pipelines.
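As a rough illustration of the HTTP call described above, the following sketch builds a POST request carrying a pdf_url form field, using only the Python standard library. It assumes the endpoint accepts a URL-encoded form body (the server may instead expect multipart form data) and that the response is JSON; the function names build_extract_request and extract_figures are hypothetical helpers, not part of the project.

```python
import json
import urllib.parse
import urllib.request

# Default endpoint named in the text; change it if the server uses another port.
MCP_URL = "http://localhost:5001/mcp"


def build_extract_request(pdf_url, endpoint=MCP_URL):
    """Build (but do not send) a POST request with a pdf_url form field.

    Assumption: the server accepts application/x-www-form-urlencoded bodies.
    """
    body = urllib.parse.urlencode({"pdf_url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_figures(pdf_url, endpoint=MCP_URL, timeout=120.0):
    """Send the request and decode the JSON payload (needs a running server)."""
    request = build_extract_request(pdf_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, extract_figures("https://example.com/paper.pdf") would return the decoded payload; the example URL is a placeholder.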

How to install

Prerequisites:

  • Python 3.8+ installed on your system
  • Access to the repository containing the figure_extractor.py script
  • Optional Docker if you prefer containerized deployment

Install and run (manual Python execution):

  1. Clone the repository and navigate to the project directory (git clone creates a directory named after the repository):
       git clone https://github.com/Huang-lab/figure-extractor.git
       cd figure-extractor

  2. Install dependencies:
       python -m pip install --upgrade pip
       pip install -r requirements.txt

  3. Run the MCP server script (which serves the extract_figures_from_pdf tool):
       python figure_extractor.py

  4. By default, the MCP endpoint is available at http://localhost:5001/mcp. To use a different port, set the PORT environment variable before starting the script, or adjust the script invocation accordingly.
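To show how a client might track the port setting from step 4, here is a minimal sketch that derives the MCP URL from the PORT environment variable and falls back to 5001. How figure_extractor.py itself reads PORT is not shown in this document, so treat this as a client-side convenience; resolve_mcp_url is a hypothetical helper name.

```python
import os

DEFAULT_PORT = 5001  # default port stated in the text


def resolve_mcp_url(env=None, host="localhost"):
    """Return the MCP URL for the port in PORT, or the default when unset."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    port = int(raw) if raw else DEFAULT_PORT
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return f"http://{host}:{port}/mcp"


if __name__ == "__main__":
    print(resolve_mcp_url())
```

Passing a mapping such as {"PORT": "8080"} instead of the real environment makes the helper easy to check without changing shell state.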

Docker deployment (optional):

  1. Build the Docker image:
       docker build -t pdf-extraction .

  2. Run the container exposing port 5001:
       docker run -p 5001:5001 pdf-extraction

  3. Open http://localhost:5001/docs to view API docs and test the endpoint.
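A container can take a few seconds to start serving, so a script that follows the Docker steps above may want to wait until the docs page responds. The sketch below polls http://localhost:5001/docs with the standard library; wait_for_docs and its opener parameter are hypothetical names introduced here for illustration, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def wait_for_docs(url="http://localhost:5001/docs", timeout_s=30.0,
                  interval_s=1.0, opener=urllib.request.urlopen):
    """Poll the docs URL until it returns HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not reachable yet; retry after a short pause.
            pass
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_docs() else "not reachable")
```

The opener argument exists only so the polling logic can be exercised without a live server.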

Additional notes

Tips and common considerations:

  • The MCP tool extract_figures_from_pdf accepts either a PDF URL (pdf_url) or a local file upload (file) and returns a JSON payload with extracted figures and tables, including renderURL fields for image previews.
  • Default MCP URL is http://localhost:5001/mcp. If you change the port, update your MCP orchestration accordingly.
  • When running via Docker, ensure port mappings (-p 5001:5001) align with the internal server port.
  • If you encounter CORS or authentication issues, ensure proper network access between the MCP controller and the server and consider using a reverse proxy if necessary.
  • Environment variables can be used to customize endpoints or integration with your MCP gateway; consult the mcp_config for required fields in your environment.
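The exact shape of the JSON payload is not documented here, so the following sketch avoids assuming one: it walks any decoded payload and collects every string stored under a renderURL key, the field name mentioned in the first tip. The sample payload in the usage block is invented for illustration and is not real server output.

```python
def collect_render_urls(payload):
    """Return all string values stored under a 'renderURL' key, in document order."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "renderURL" and isinstance(value, str):
                    found.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    return found


if __name__ == "__main__":
    # Hypothetical payload; the real schema may differ.
    sample = {
        "figures": [{"name": "1", "renderURL": "fig1.png"}],
        "tables": [{"name": "1", "renderURL": "table1.png"}],
    }
    print(collect_render_urls(sample))
```

Because the walk is schema-agnostic, it keeps working if figures and tables are nested differently than in the sample.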
