pyspark
MCP Server for PySpark, inspired by LakeSail
claude mcp add --transport stdio semyonsinchenko-pyspark-mcp-server python -m pyspark_mcp
How to use
The PySpark MCP Server exposes a Model Context Protocol (MCP) interface for interacting with Apache Spark. It lets AI systems query Spark for logical and physical query plans, table and catalog information, and other query-plan details, assisting with tasks such as query optimization and data discovery. The server bundles a suite of MCP tools that extract specific insights from the current Spark session, such as the analyzed or optimized plan for a SQL query, table schemas, and catalog listings. Run the server with the pyspark-mcp CLI, then connect to it from MCP clients or Claude Code integrations as described in the Quick Start.
How to install
Prerequisites:
- Python 3.11 or newer installed on your system
- Internet access to install packages
Install the PySpark MCP Server package:
pip install pyspark-mcp
Run the MCP server:
pyspark-mcp --master "local[*]" --host 127.0.0.1 --port 8090
Notes:
- The CLI automatically configures spark-submit. You can pass standard spark-submit options such as --conf, --jars, --packages, --executor-memory, etc., and they will be forwarded to Spark.
- You can integrate the server with Claude Code or other MCP clients by registering the HTTP transport endpoint (e.g., http://127.0.0.1:8090/mcp).
Optional: Run a dry-run to preview the spark-submit configuration without executing:
pyspark-mcp --master "local[*]" --dry-run
Additional notes
Tips and common considerations:
- Ensure your PySpark version is compatible with your local Spark installation and, when not running in local mode, with the connected Spark cluster.
- The server exposes a variety of MCP tools (e.g., get version, get analyzed/optimized plan, estimate size, read table schemas, list catalogs/databases/tables). Use these to feed AI systems for plan analysis and data discovery.
- When deploying with Claude Code or in other environments, make sure the port (8090 by default) is open and use the correct host address in the MCP transport URL.
- If you encounter connection issues, verify that the Spark session starts correctly with the provided master URL and that firewall rules allow HTTP access to the host/port.
- You can pass additional Spark options via --conf, --jars, --packages, etc., and these will be applied to the underlying spark-submit invocation.