mcp-apache-spark-history
MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.
```shell
claude mcp add --transport stdio kubeflow-mcp-apache-spark-history-server \
  python -m spark_history_mcp.core.main \
  --env SHS_MCP_CONFIG="Path to your Spark History Server MCP config (config.yaml). Optional if default config.yaml is used."
```
How to use
This MCP server provides a Python-based MCP bridge that connects to your Apache Spark History Server and exposes a suite of analysis tools for AI agents. The server enables querying Spark data, analyzing job performance, and comparing multiple jobs across applications. Tools include retrieving application metadata, listing jobs and stages, fetching executor and resource usage details, and examining environment and configuration data. With these capabilities, an AI agent can answer natural-language questions like which jobs were the slowest, which stages consumed the most I/O, or how memory usage trended across a set of runs. The server is designed to be invoked via MCP clients using standardized tool calls and can be integrated into LangChain, Inspector, or other MCP-compatible clients.
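As a sketch of what such a standardized tool call looks like on the wire, an MCP client sends a JSON-RPC `tools/call` request to the server. The tool name `get_application` and its `app_id` argument below are illustrative assumptions, not confirmed names from this server's tool list:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_application",
    "arguments": { "app_id": "spark-0123456789abcdef" }
  }
}
```

The server replies with a JSON-RPC result containing the tool's output, which the agent can feed into its next query.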
How to install
Prerequisites:
- Python 3.12+ installed on your system
- Access to a Spark History Server (SHS) instance
- Optional: uv package manager if you prefer the uv runner (see README for details)
Install and run with Python:
- Install the MCP package from PyPI:

  ```shell
  python -m venv spark-mcp
  source spark-mcp/bin/activate   # activate the virtual environment (Linux/macOS)
  # On Windows: spark-mcp\Scripts\activate
  pip install mcp-apache-spark-history-server
  ```

- Run the MCP server:

  ```shell
  python -m spark_history_mcp.core.main
  ```

- (Optional) To run locally with uv instead of the Python module, install uv and run the package via uv as described in the project docs:

  ```shell
  uvx --from mcp-apache-spark-history-server spark-mcp
  ```

- To clone and run directly from source:

  ```shell
  git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
  cd mcp-apache-spark-history-server
  python -m venv venv
  source venv/bin/activate   # or venv\Scripts\activate on Windows
  pip install -e .
  python -m spark_history_mcp.core.main
  ```
Additional notes
Configuration:
- The MCP server reads config.yaml for Spark History Server connections and MCP settings. You can point to a local or remote SHS instance and adjust authentication as needed.
- The default config path is ./config.yaml unless overridden by the SHS_MCP_CONFIG environment variable.
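- As a rough sketch, a minimal config.yaml might look like the following. The exact schema keys (`servers`, `url`, `default`, `auth`) are assumptions modeled on typical MCP server configs; consult the project README for the authoritative schema:

  ```yaml
  servers:
    local:
      default: true
      url: "http://localhost:18080"   # base URL of your Spark History Server
      # auth:                          # optional, if your SHS requires it
      #   username: "user"
      #   password: "pass"
  ```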
- The server exposes 18 specialized tools categorized under Application Information, Job Analysis, Stage Analysis, Executor & Resource Analysis, Configuration & Environment, and SQL & Query Analysis. Clients can compose queries across these tools to build powerful, multi-step analyses.
- If you encounter issues starting the MCP server, ensure Python 3.12+ is in use, the SHS is reachable at the configured URL, and any required dependencies are installed in your virtual environment.
- Inspector (optional) requires Node.js 22.7.5+; start it with the provided start-inspector script if you want an interactive testing UI.
Related MCP Servers
web-eval-agent
An MCP server that autonomously evaluates web applications.
mcp-neo4j
Neo4j Labs Model Context Protocol servers
Gitingest
mcp server for gitingest
fhir
FHIR MCP Server – helping you expose any FHIR Server or API as an MCP Server.
unitree-go2
The Unitree Go2 MCP Server is a server built on the Model Context Protocol (MCP) that enables users to control the Unitree Go2 robot using natural language commands interpreted by an LLM.
sympy
An MCP server for symbolic manipulation of mathematical expressions