
pyspark-mcp

MCP Server for PySpark, inspired by LakeSail

Installation
Run this command in your terminal to add the MCP server to Claude Code:

claude mcp add --transport stdio semyonsinchenko-pyspark-mcp-server python -m pyspark_mcp

How to use

The PySpark MCP Server exposes a Model Context Protocol (MCP) interface for interacting with Apache Spark. It lets AI systems query Spark for logical and physical query plans, table and catalog information, and other query-plan details, assisting with tasks like query optimization and data discovery. The server bundles a suite of MCP tools that extract specific insights from the current Spark session, such as the analyzed or optimized plan for a SQL query, table schemas, and catalog listings. Run the server with the pyspark-mcp CLI and connect to it using MCP clients or Claude Code integrations as described in the Quick Start.
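Under the hood, MCP clients exchange JSON-RPC 2.0 messages with the server. As a rough illustration (the request shapes below follow the generic MCP specification; the tool name `get_optimized_plan` and its arguments are hypothetical, chosen only to show the `tools/call` shape), listing tools and invoking one look like:

```python
import json

# JSON-RPC 2.0 request asking an MCP server to enumerate its tools.
list_tools = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# Invoking a tool. "get_optimized_plan" and the "query" argument are
# hypothetical names used here only to illustrate the request shape.
call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_optimized_plan",
        "arguments": {"query": "SELECT * FROM sales WHERE amount > 100"},
    },
}

payload = json.dumps(call_tool)
print(payload)
```

MCP client libraries build and send these messages for you; you rarely need to construct them by hand, but the shapes are useful when reading server logs or debugging a transport.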

How to install

Prerequisites:

  • Python 3.11 or newer installed on your system
  • Internet access to install packages

Install the PySpark MCP Server package:

pip install pyspark-mcp

Run the MCP server:

pyspark-mcp --master "local[*]" --host 127.0.0.1 --port 8090

Notes:

  • The CLI automatically configures spark-submit. You can pass standard spark-submit options such as --conf, --jars, --packages, --executor-memory, etc., and they will be forwarded to Spark.
  • You can integrate the server with Claude Code or other MCP clients by registering the HTTP transport endpoint (e.g., http://127.0.0.1:8090/mcp).
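If you prefer configuring Claude Code through a project-level .mcp.json file rather than the CLI, an HTTP registration might look like the following sketch (the server name is arbitrary, and the URL assumes the default host/port shown above):

```json
{
  "mcpServers": {
    "pyspark-mcp": {
      "type": "http",
      "url": "http://127.0.0.1:8090/mcp"
    }
  }
}
```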

Optional: Run a dry-run to preview the spark-submit configuration without executing:

pyspark-mcp --master "local[*]" --dry-run

Additional notes

Tips and common considerations:

  • Ensure your Spark installation is compatible with your PySpark version and the connected Spark cluster if not running in local mode.
  • The server exposes a variety of MCP tools (e.g., get version, get analyzed/optimized plan, estimate size, read table schemas, list catalogs/databases/tables). Use these to feed AI systems for plan analysis and data discovery.
  • When deploying with Claude Code or in other environments, remember to open the port (default 8090) and use the correct host address in the MCP transport URL.
  • If you encounter connection issues, verify that the Spark session starts correctly with the provided master URL and that firewall rules allow HTTP access to the host/port.
  • You can pass additional Spark options via --conf, --jars, --packages, etc., and these will be applied to the underlying spark-submit invocation.
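The connectivity check described above can be scripted. A minimal sketch using only the Python standard library (the endpoint URL is the default from the Quick Start; adjust host and port to match your deployment):

```python
import urllib.error
import urllib.request

def is_endpoint_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False if the
    connection is refused, unreachable, or times out. Any HTTP status
    (even 4xx/5xx) counts as reachable: it proves a server is listening."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, host unreachable, or timed out

if __name__ == "__main__":
    # Default endpoint from the Quick Start; change if you used other flags.
    print(is_endpoint_reachable("http://127.0.0.1:8090/mcp"))
```

If this returns False, check that the Spark session started with your master URL and that firewall rules permit HTTP access to the host/port.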
