pyspark
MCP Server for PySpark, inspired by LakeSail
claude mcp add --transport stdio semyonsinchenko-pyspark-mcp-server python -m pyspark_mcp
How to use
The PySpark MCP Server exposes a Model Context Protocol (MCP) interface for interacting with Apache Spark. It lets AI systems query Spark for logical and physical query plans, table and catalog information, and other query-plan details, assisting with tasks such as query optimization and data discovery. The server bundles a suite of MCP tools that extract specific insights from the current Spark session, such as the analyzed or optimized plan for a SQL query, table schemas, and catalog listings. Run the server with the pyspark-mcp CLI, then connect to it from MCP clients or Claude Code integrations as described in the Quick Start.
How to install
Prerequisites:
- Python 3.11 or newer installed on your system
- Internet access to install packages
Install the PySpark MCP Server package:
pip install pyspark-mcp
Run the MCP server:
pyspark-mcp --master "local[*]" --host 127.0.0.1 --port 8090
Notes:
- The CLI automatically configures spark-submit. You can pass standard spark-submit options such as --conf, --jars, --packages, --executor-memory, etc., and they will be forwarded to Spark.
- You can integrate the server with Claude Code or other MCP clients by registering the HTTP transport endpoint (e.g., http://127.0.0.1:8090/mcp).
Optional: Run a dry-run to preview the spark-submit configuration without executing:
pyspark-mcp --master "local[*]" --dry-run
Additional notes
Tips and common considerations:
- Ensure your PySpark version is compatible with your local Spark installation and, when not running in local mode, with the connected Spark cluster.
- The server exposes a variety of MCP tools (e.g., get version, get analyzed/optimized plan, estimate size, read table schemas, list catalogs/databases/tables). Use these to feed AI systems for plan analysis and data discovery.
- When deploying with Claude Code or in other environments, make sure the port (8090 by default) is open and use the correct host address in the MCP transport URL.
- If you encounter connection issues, verify that the Spark session starts correctly with the provided master URL and that firewall rules allow HTTP access to the host/port.
- You can pass additional Spark options via --conf, --jars, --packages, etc., and these will be applied to the underlying spark-submit invocation.