beam
MCP server to manage Apache Beam workflows with different runners
claude mcp add --transport stdio souravch-beam-mcp-server python main.py --debug --port 8888 \
  --env GCP_REGION="us-central1" \
  --env GCP_PROJECT_ID="your-gcp-project-id" \
  --env PYTHONUNBUFFERED="1"
How to use
This MCP server provides a unified API for managing Apache Beam pipelines across multiple runners (Flink, Spark, Dataflow, and Direct) using the MCP standard. It exposes endpoints to list, create, monitor, and manage jobs and resources, as well as tools and contexts to enable AI-assisted pipeline orchestration. Core endpoints include /tools for registering and configuring processing tools, /resources for datasets and other inputs, and /contexts for defining execution environments and runner configurations. You can interact with the server to submit WordCount-style pipelines, switch between runners, and observe pipeline status and metrics through the MCP-compliant API.
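For instance, discovering what the server currently exposes could look like the following sketch (it assumes a local server on port 8888; the response shapes are illustrative, not guaranteed by the API):

import requests

BASE = "http://localhost:8888"

# List the registered tools, resources, and execution contexts. The three
# endpoints come from the description above; what each returns depends on
# the server version, so treat the printed JSON as exploratory output.
for endpoint in ("/tools", "/resources", "/contexts"):
    print(endpoint, requests.get(BASE + endpoint).json())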
To start using it, run the server (for example: python main.py --debug --port 8888) and then issue REST requests to the documented endpoints. The server provides a Python client example to help you integrate programmatic control into your applications. Once running, you can add tools (like a sentiment analyzer), register datasets, create execution contexts for Dataflow or Flink, and submit jobs via the /jobs endpoint. The API is designed to be AI-friendly, so you can orchestrate pipelines and leverage LLM-driven decision making to select runners, configure options, and monitor progress.
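As a minimal sketch of that flow (the request-body field names below are assumptions; the repository's Python client example documents the real schema):

import requests

BASE = "http://localhost:8888"  # assumes the server was started with --port 8888

# Register a processing tool; "sentiment-analyzer" mirrors the example above,
# but the payload fields are illustrative guesses.
tool = requests.post(f"{BASE}/tools", json={
    "name": "sentiment-analyzer",
    "description": "Scores text sentiment inside a Beam pipeline",
}).json()

# Submit a WordCount-style job; runner names follow the list above.
job = requests.post(f"{BASE}/jobs", json={
    "name": "wordcount-demo",
    "runner": "direct",  # or flink, spark, dataflow
    "pipeline_options": {"temp_location": "/tmp/beam"},
}).json()

# Poll the job's status; the "id" field is assumed from common REST
# conventions rather than confirmed against this server's schema.
print(requests.get(f"{BASE}/jobs/{job['id']}").json())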
How to install
Prerequisites:
- Python 3.9 or higher
- pip (packaged with Python)
- Internet access to install dependencies
Step-by-step installation:
- Clone the repository:
  git clone https://github.com/yourusername/beam-mcp-server.git
  cd beam-mcp-server
- Create and activate a virtual environment (recommended):
  python -m venv beam-mcp-venv
  source beam-mcp-venv/bin/activate   (macOS/Linux)
  beam-mcp-venv\Scripts\activate      (Windows)
- Install dependencies:
  pip install -r requirements.txt
- (Optional) If you plan to run with Docker, build images per the Docker instructions in the README.
- Run the server:
  python main.py --debug --port 8888
- Verify startup by hitting the API root or /health endpoint, e.g.:
  curl http://localhost:8888/health
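The same check works from Python if you are wiring the server into scripts (a sketch; it assumes the default localhost binding and that /health returns JSON):

import requests

# Fail fast if the server did not come up on the chosen port.
resp = requests.get("http://localhost:8888/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # exact payload depends on the server's health handler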
Additional notes
Notes and tips:
- The server supports multiple runners via endpoints; use /contexts to configure runner-specific parameters and /jobs to submit pipelines (see the sketch after these notes).
- For Docker deployments, ensure environment variables like GCP_PROJECT_ID and GCP_REGION are set if your pipelines interact with Google Cloud resources.
- If you encounter port binding issues, check for existing processes using port 8888 and update the port accordingly in the command.
- Enable logging and monitoring by using the /metrics endpoint for Prometheus and reading the JSON-formatted logs emitted by the server.
- When using the Docker image, mount a config directory (config/) to /app/config and set environment variables as needed to point to your resources and credentials.
- If you expand to Kubernetes, refer to the included Kubernetes Deployment Guide for manifests and Helm charts.
- Ensure your Python environment includes compatible versions of dependencies listed in requirements.txt to avoid import errors.
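Tying the context and monitoring notes together, a hedged sketch of configuring a Dataflow context and scraping metrics (the payload field names are assumptions; consult the server's API documentation for the real schema):

import os
import requests

BASE = "http://localhost:8888"

# Create a Dataflow execution context. Runner-specific parameters go in
# /contexts per the note above; the field names here are illustrative and
# read the same GCP_PROJECT_ID/GCP_REGION environment variables mentioned
# in the Docker note.
ctx = requests.post(f"{BASE}/contexts", json={
    "name": "dataflow-prod",
    "runner": "dataflow",
    "parameters": {
        "project": os.environ["GCP_PROJECT_ID"],
        "region": os.environ.get("GCP_REGION", "us-central1"),
    },
}).json()
print("created context:", ctx)

# /metrics serves Prometheus exposition-format text, not JSON.
print(requests.get(f"{BASE}/metrics").text)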
Related MCP Servers
mcp-vegalite
MCP server from isaacwasserman/mcp-vegalite-server
github-chat
A Model Context Protocol (MCP) for analyzing and querying GitHub repositories using the GitHub Chat API.
nautex
MCP server for guiding Coding Agents through an end-to-end requirements-to-implementation-plan pipeline
pagerduty
PagerDuty's official local MCP (Model Context Protocol) server which provides tools to interact with your PagerDuty account directly from your MCP-enabled client.
futu-stock
MCP server for Futu NiuNiu stock
mcp-boilerplate
Boilerplate using one of the 'better' ways to build MCP Servers. Written using FastMCP