mcp-apache-spark-history
MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.
```shell
claude mcp add --transport stdio kubeflow-mcp-apache-spark-history-server \
  python -m spark_history_mcp.core.main \
  --env SHS_MCP_CONFIG="Path to your Spark History Server MCP config (config.yaml). Optional if default config.yaml is used."
```
How to use
This MCP server provides a Python-based MCP bridge that connects to your Apache Spark History Server and exposes a suite of analysis tools for AI agents. The server enables querying Spark data, analyzing job performance, and comparing multiple jobs across applications. Tools include retrieving application metadata, listing jobs and stages, fetching executor and resource usage details, and examining environment and configuration data. With these capabilities, an AI agent can answer natural-language questions like which jobs were the slowest, which stages consumed the most I/O, or how memory usage trended across a set of runs. The server is designed to be invoked via MCP clients using standardized tool calls and can be integrated into LangChain, Inspector, or other MCP-compatible clients.
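As a sketch of what such a standardized tool call looks like on the wire, an MCP client sends a JSON-RPC `tools/call` request to the server. The tool name `get_application` and its `app_id` argument below are illustrative assumptions, not confirmed names from this server's tool list:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_application",
    "arguments": { "app_id": "spark-0123456789abcdef" }
  }
}
```

The server replies with a JSON-RPC result containing the tool's output, which the agent can feed into its next query.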
How to install
Prerequisites:
- Python 3.12+ installed on your system
- Access to a Spark History Server (SHS) instance
- Optional: uv package manager if you prefer the uv runner (see README for details)
Install and run with Python:
- Install the MCP package from PyPI:

  ```shell
  python -m venv spark-mcp
  source spark-mcp/bin/activate   # activate the virtual environment (Linux/macOS)
  # On Windows: spark-mcp\Scripts\activate
  pip install mcp-apache-spark-history-server
  ```

- Run the MCP server:

  ```shell
  python -m spark_history_mcp.core.main
  ```

- (Optional) To run locally with uv instead of the Python module, install uv and run the package via uv as described in the project docs:

  ```shell
  uvx --from mcp-apache-spark-history-server spark-mcp
  ```

- To clone and run directly from source:

  ```shell
  git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
  cd mcp-apache-spark-history-server
  python -m venv venv
  source venv/bin/activate   # or venv\Scripts\activate on Windows
  pip install -e .
  python -m spark_history_mcp.core.main
  ```
Additional notes
Configuration:
- The MCP server reads config.yaml for Spark History Server connections and MCP settings. You can point to a local or remote SHS instance and adjust authentication as needed.
- The default config path is ./config.yaml unless overridden by the SHS_MCP_CONFIG environment variable.
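- As a rough sketch, a minimal config.yaml might look like the following. The exact schema keys (`servers`, `url`, `default`, `auth`) are assumptions modeled on typical MCP server configs; consult the project README for the authoritative schema:

  ```yaml
  servers:
    local:
      default: true
      url: "http://localhost:18080"   # base URL of your Spark History Server
      # auth:                          # optional, if your SHS requires it
      #   username: "user"
      #   password: "pass"
  ```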
- The server exposes 18 specialized tools categorized under Application Information, Job Analysis, Stage Analysis, Executor & Resource Analysis, Configuration & Environment, and SQL & Query Analysis. Clients can compose queries across these tools to build powerful, multi-step analyses.
- If you encounter issues starting the MCP server, ensure Python 3.12+ is in use, the SHS is reachable at the configured URL, and any required dependencies are installed in your virtual environment.
- Inspector (optional) requires Node.js 22.7.5+; start it with the provided start-inspector script if you want an interactive testing UI.
Related MCP Servers
web-eval-agent
An MCP server that autonomously evaluates web applications.
mcp-neo4j
Neo4j Labs Model Context Protocol servers
Gitingest
mcp server for gitingest
fhir
FHIR MCP Server – helping you expose any FHIR Server or API as an MCP Server.
unitree-go2
The Unitree Go2 MCP Server is a server built on the Model Context Protocol (MCP) that enables users to control the Unitree Go2 robot using natural language commands interpreted by an LLM.
sympy
An MCP server for symbolic manipulation of mathematical expressions