exstruct
Converts Excel workbooks to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and lets AI agents read and write Excel files autonomously through MCP integration.
claude mcp add --transport stdio harumiweb-exstruct uvx --from 'exstruct[mcp]' exstruct-mcp --root C:\data --log-file C:\logs\exstruct-mcp.log --on-conflict rename
How to use
The ExStruct MCP server exposes a set of extraction and utility tools for processing Excel workbooks into structured JSON (and optionally YAML/TOON). The server runs via uvx, which manages dependencies and isolation, so you can launch exstruct-mcp without a local Python environment. Available tools:
- exstruct_extract (the core extractor)
- exstruct_capture_sheet_images (COM-only image capture of sheet visuals)
- exstruct_make
- exstruct_patch
- exstruct_read_json_chunk
- exstruct_read_range
- exstruct_read_cells
- exstruct_read_formulas
- exstruct_validate_input
These tools let you pull cells, formulas, shapes, charts, and structural metadata from Excel workbooks and emit outputs suitable for downstream RAG/LLM pipelines. When COM is unavailable or not desired, the server can fall back to openpyxl-based extraction for cells and table candidates, with a reduced feature set.
To use the MCP server, start exstruct-mcp with the uvx command, then send requests for the specific tools you need. For example, exstruct_extract reads an input workbook and returns a structured JSON representation of the data, including cells, table candidates, and print areas depending on the mode. You can tune behavior with options such as mode (light, standard, verbose) and alpha_col for column key formatting, and you can enable formulas_map or other optional outputs. If you need image captures or page-break data, enable the corresponding tools and verify that your environment supports COM (on Windows) or has the necessary dependencies for non-COM scenarios.
In typical workflows, you run the MCP server alongside your application and query it over standard I/O streams from an MCP client that calls the tools the server exposes. The tools are designed to integrate into data pipelines where Excel content is preprocessed into JSON-like structures that a language model or downstream processors can consume.
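As a rough illustration of what a client sends over standard I/O, the sketch below builds a JSON-RPC 2.0 tools/call request, which is the message shape MCP uses for tool invocation. The argument names (xlsx_path, mode) and the file path are assumptions for illustration only and are not confirmed by this page; check the tool's published input schema before relying on them. A real session also needs the MCP initialize handshake first, which is omitted here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request in the shape MCP uses."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# NOTE: "xlsx_path" and the path value are hypothetical; only the tool name
# and the mode values (light, standard, verbose) come from the text above.
request = build_tool_call(
    1,
    "exstruct_extract",
    {"xlsx_path": "C:\\data\\book.xlsx", "mode": "light"},
)

# The stdio transport carries one JSON message per line, so the serialized
# request must not contain embedded newlines.
line = json.dumps(request)
print(line)
```

In practice an MCP client library or an agent host would normally build and send these messages for you; the sketch only shows what travels over the pipe.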
How to install
Prerequisites:
- uv (which provides the uvx command) for Option A, or Python with pip for Option B
- Optional: Windows Excel for full COM-based extraction
- Internet access for dependency installation
Option A: Using uvx (recommended, no local Python install required)
- Install uv, which provides the uvx command, if you don't have it already (follow the uv installation instructions).
- Start the MCP server using the provided command (adjust paths as needed):
uvx --from 'exstruct[mcp]' exstruct-mcp --root C:\data --log-file C:\logs\exstruct-mcp.log --on-conflict rename
Option B: Traditional Python installation (requires pip)
- Install the MCP package with extras:
pip install 'exstruct[mcp]'
- Run the MCP server (example assumes a CLI entrypoint named exstruct-mcp):
exstruct-mcp --root C:\data --log-file C:\logs\exstruct-mcp.log --on-conflict rename
Notes:
- If you need YAML/TOON outputs or rendering, install extras accordingly (e.g., exstruct[mcp,yaml,toon,render] if supported by the build).
- On non-Windows platforms, some features (like COM-based extraction) may be unavailable; use mode=light for openpyxl-based extraction.
- Ensure the MCP server's environment variables align with your security and logging requirements (see the Additional notes section for details).
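The two launch options above differ only in the command prefix, which the following sketch makes explicit. It is a minimal helper, assuming you want to start the server from Python; the function name build_server_command is hypothetical, while the flags (--root, --log-file, --on-conflict) are the ones shown in the commands above.

```python
import shutil


def build_server_command(root, log_file, on_conflict="rename", prefer_uvx=True):
    """Return the argv list for launching exstruct-mcp (hypothetical helper)."""
    flags = ["--root", root, "--log-file", log_file, "--on-conflict", on_conflict]
    # Option A: run through uvx when it is on PATH and preferred.
    if prefer_uvx and shutil.which("uvx"):
        return ["uvx", "--from", "exstruct[mcp]", "exstruct-mcp", *flags]
    # Option B: fall back to the entrypoint installed by pip.
    return ["exstruct-mcp", *flags]


cmd = build_server_command("C:\\data", "C:\\logs\\exstruct-mcp.log")
print(cmd)
```

The returned list can be passed to subprocess.Popen with stdin and stdout set to pipes, which is how a stdio-based client would typically attach to the server. Passing the arguments as a list also avoids shell quoting issues with the brackets in exstruct[mcp].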
Additional notes
Tips and considerations:
- COM availability drives feature coverage. On Windows with Excel, you can enable richer extraction (charts, shapes, smartart). On other platforms, the server falls back to cells/table candidates.
- Use mode=light for a lightweight extraction when COM or advanced features are not feasible.
- If you rely on alpha_col formatting (A, B, ... keys), pass --alpha-col to the CLI or configure the server accordingly; by default, exstruct_extract emits A/B-style keys in modern configurations.
- For image capture, exstruct_capture_sheet_images is COM-only and accepts targeted sheets/ranges; if out_dir is omitted, it will create a workbook_images directory under the MCP root.
- Timeouts and subprocess behavior can be tuned via EXSTRUCT_RENDER_SUBPROCESS and related environment variables, for example to choose between in-process and subprocess rendering.
- Logging typically goes to stderr; you can redirect to a file with --log-file to preserve stdout for structured responses.
- If you encounter issues with preprocessing or formula maps, check the availability of pyyaml (for YAML output) and python-toon (for TOON outputs), and ensure versions are compatible with your Python environment.
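The platform guidance in these notes can be reduced to a small decision, sketched below. This is only an illustration: the function name choose_mode is hypothetical, and picking standard as the richer mode on Windows with Excel is an assumption (the text only says that richer extraction is available there and that light is the safe choice elsewhere).

```python
import sys


def choose_mode(platform=sys.platform, excel_available=False):
    """Pick an extraction mode from the platform (hypothetical helper)."""
    # COM-based extraction needs Windows with Excel installed.
    if platform == "win32" and excel_available:
        return "standard"  # assumption: any richer mode could be used here
    # Everywhere else, use the openpyxl-based lightweight extraction.
    return "light"


print(choose_mode("linux"))
print(choose_mode("win32", excel_available=True))
```

Detecting whether Excel is actually installed is left to the caller, since this page does not describe how the server itself performs that check.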