beam
MCP server to manage Apache Beam workflows with different runners
claude mcp add --transport stdio souravch-beam-mcp-server python main.py --debug --port 8888 \
  --env GCP_REGION="us-central1" \
  --env GCP_PROJECT_ID="your-gcp-project-id" \
  --env PYTHONUNBUFFERED="1"
How to use
This MCP server provides a unified API for managing Apache Beam pipelines across multiple runners (Flink, Spark, Dataflow, and Direct) using the MCP standard. It exposes endpoints to list, create, monitor, and manage jobs and resources, as well as tools and contexts to enable AI-assisted pipeline orchestration. Core endpoints include /tools for registering and configuring processing tools, /resources for datasets and other inputs, and /contexts for defining execution environments and runner configurations. You can interact with the server to submit WordCount-style pipelines, switch between runners, and observe pipeline status and metrics through the MCP-compliant API.
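For instance, discovering what the server currently exposes could look like the following sketch (it assumes a local server on port 8888; the response shapes are illustrative, not guaranteed by the API):

import requests

BASE = "http://localhost:8888"

# List the registered tools, resources, and execution contexts. The three
# endpoints come from the description above; what each returns depends on
# the server version, so treat the printed JSON as exploratory output.
for endpoint in ("/tools", "/resources", "/contexts"):
    print(endpoint, requests.get(BASE + endpoint).json())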
To start using it, run the server (for example: python main.py --debug --port 8888) and then issue REST requests to the documented endpoints. The server provides a Python client example to help you integrate programmatic control into your applications. Once running, you can add tools (like a sentiment analyzer), register datasets, create execution contexts for Dataflow or Flink, and submit jobs via the /jobs endpoint. The API is designed to be AI-friendly, so you can orchestrate pipelines and leverage LLM-driven decision making to select runners, configure options, and monitor progress.
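As a minimal sketch of that flow (the request-body field names below are assumptions; the repository's Python client example documents the real schema):

import requests

BASE = "http://localhost:8888"  # assumes the server was started with --port 8888

# Register a processing tool; "sentiment-analyzer" mirrors the example above,
# but the payload fields are illustrative guesses.
tool = requests.post(f"{BASE}/tools", json={
    "name": "sentiment-analyzer",
    "description": "Scores text sentiment inside a Beam pipeline",
}).json()

# Submit a WordCount-style job; runner names follow the list above.
job = requests.post(f"{BASE}/jobs", json={
    "name": "wordcount-demo",
    "runner": "direct",  # or flink, spark, dataflow
    "pipeline_options": {"temp_location": "/tmp/beam"},
}).json()

# Poll the job's status; the "id" field is assumed from common REST
# conventions rather than confirmed against this server's schema.
print(requests.get(f"{BASE}/jobs/{job['id']}").json())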
How to install
Prerequisites:
- Python 3.9 or higher
- pip (packaged with Python)
- Internet access to install dependencies
Step-by-step installation:
- Clone the repository:
  git clone https://github.com/yourusername/beam-mcp-server.git
  cd beam-mcp-server
- Create and activate a virtual environment (recommended):
  python -m venv beam-mcp-venv
  source beam-mcp-venv/bin/activate   (macOS/Linux)
  beam-mcp-venv\Scripts\activate      (Windows)
- Install dependencies:
  pip install -r requirements.txt
- (Optional) If you plan to run with Docker, build images per the Docker instructions in the README.
- Run the server:
  python main.py --debug --port 8888
- Verify startup by hitting the API root or /health endpoint, e.g.:
  curl http://localhost:8888/health
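The same check works from Python if you are wiring the server into scripts (a sketch; it assumes the default localhost binding and that /health returns JSON):

import requests

# Fail fast if the server did not come up on the chosen port.
resp = requests.get("http://localhost:8888/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # exact payload depends on the server's health handler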
Additional notes
Notes and tips:
- The server supports multiple runners via endpoints; use /contexts to configure runner-specific parameters and /jobs to submit pipelines (see the sketch after these notes).
- For Docker deployments, ensure environment variables like GCP_PROJECT_ID and GCP_REGION are set if your pipelines interact with Google Cloud resources.
- If you encounter port binding issues, check for existing processes using port 8888 and update the port accordingly in the command.
- Enable logging and monitoring by using the /metrics endpoint for Prometheus and reading the JSON-formatted logs emitted by the server.
- When using the Docker image, mount a config directory (config/) to /app/config and set environment variables as needed to point to your resources and credentials.
- If you expand to Kubernetes, refer to the included Kubernetes Deployment Guide for manifests and Helm charts.
- Ensure your Python environment includes compatible versions of dependencies listed in requirements.txt to avoid import errors.
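Tying the context and monitoring notes together, a hedged sketch of configuring a Dataflow context and scraping metrics (the payload field names are assumptions; consult the server's API documentation for the real schema):

import os
import requests

BASE = "http://localhost:8888"

# Create a Dataflow execution context. Runner-specific parameters go in
# /contexts per the note above; the field names here are illustrative and
# read the same GCP_PROJECT_ID/GCP_REGION environment variables mentioned
# in the Docker note.
ctx = requests.post(f"{BASE}/contexts", json={
    "name": "dataflow-prod",
    "runner": "dataflow",
    "parameters": {
        "project": os.environ["GCP_PROJECT_ID"],
        "region": os.environ.get("GCP_REGION", "us-central1"),
    },
}).json()
print("created context:", ctx)

# /metrics serves Prometheus exposition-format text, not JSON.
print(requests.get(f"{BASE}/metrics").text)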
Related MCP Servers
mcp-vegalite
MCP server from isaacwasserman/mcp-vegalite-server
github-chat
A Model Context Protocol (MCP) for analyzing and querying GitHub repositories using the GitHub Chat API.
nautex
MCP server for guiding Coding Agents through an end-to-end requirements-to-implementation-plan pipeline
pagerduty
PagerDuty's official local MCP (Model Context Protocol) server which provides tools to interact with your PagerDuty account directly from your MCP-enabled client.
futu-stock
MCP server for Futu NiuNiu stock
mcp-boilerplate
Boilerplate using one of the 'better' ways to build MCP Servers. Written using FastMCP