data-forge

Data-Forge is a Model Context Protocol (MCP) server that transforms any LLM into a powerful Data Science Assistant. It provides a suite of high-performance tools for data loading, cleaning, validation, profiling, feature engineering, and visualization.

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio \
  --env LOG_LEVEL="WARNING" \
  --env PYTHONUNBUFFERED="1" \
  --env NUMBA_DISABLE_CUDA="1" \
  --env NUMBA_ENABLE_CUDASIM="1" \
  angrysky56-data-forge-mcp \
  -- uv --directory /absolute/path/to/data-forge-mcp run -m src.server

How to use

Data-Forge is an MCP server that equips LLMs with a data science toolkit. Its capabilities fall into the following groups:

Core data management: loading datasets, inspecting schemas, and managing multiple loaded datasets
Quality gates: validation with Pandera and cleaning with PyJanitor
Profiling and insights: detailed data reports via YData Profiling
Visualization: chart generation using Seaborn/Matplotlib
Feature engineering: tsfresh feature extraction
Acquisition: loading datasets from Hugging Face and scraping tables from web pages
Visualization aids: geospatial maps with GeoPandas and an interactive D-Tale explorer
Discovery probes: Topological Data Analysis for identifying semantic holes
SQL: headless SQL control through DuckDB

To use it, configure your MCP client to point to the Data-Forge MCP server, then invoke tools such as load_data, get_dataset_profile, extract_tables, generate_chart, and run_sql_query. Each tool targets a specific data operation, so the LLM can orchestrate end-to-end data workflows across multiple datasets in a session.
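
As a rough illustration of the workflow described above, the sketch below builds the JSON-RPC tools/call messages an MCP client would send for a load, profile, and query sequence. The tool names are taken from the description above; the argument names (file_path, dataset_id, query) and the df_123 identifier are assumptions for illustration and may not match the server's actual schemas, which a client can discover with tools/list.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a plain JSON-RPC 2.0 dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def example_workflow(csv_path: str, dataset_id: str = "df_123") -> list[dict]:
    """Return the requests for a load -> profile -> SQL sequence.

    The argument keys below are hypothetical; check the server's
    `tools/list` response for the real parameter names.
    """
    steps = [
        ("load_data", {"file_path": csv_path}),
        ("get_dataset_profile", {"dataset_id": dataset_id}),
        ("run_sql_query", {"query": f"SELECT COUNT(*) FROM {dataset_id}"}),
    ]
    return [
        tool_call(request_id, name, arguments)
        for request_id, (name, arguments) in enumerate(steps, start=1)
    ]


if __name__ == "__main__":
    # Print one JSON message per line, as a stdio transport would carry them.
    for request in example_workflow("/absolute/path/to/data.csv"):
        print(json.dumps(request))
```

In practice the LLM's MCP client issues these calls itself; the sketch only makes the shape of a multi-step session concrete.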

How to install

Prerequisites:

Python 3.13+
uv (recommended) or pip

Installation steps:

Clone the repository:

git clone https://github.com/angrysky56/data-forge-mcp.git
cd data-forge-mcp

Install the dependencies using uv:

uv sync

(Optional) Configure environment for private Hugging Face datasets:

# create a .env file in the project root or set env vars directly
HF_TOKEN=your_hugging_face_token

Start or deploy as needed (see mcp_config) and ensure the path in the config matches your deployment.

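Note: The README specifies launching the server through uv in the MCP config, using the same invocation as the installation command above (uv --directory /absolute/path/to/data-forge-mcp run -m src.server); adapt the absolute path to your environment.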
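
For clients that read a JSON configuration file rather than using the claude CLI, the following sketch generates an equivalent entry. The uv arguments and environment variables mirror the installation command above; the mcpServers layout is the one commonly used by MCP clients and is an assumption here, as is the helper name build_mcp_config. The file name and location depend on your client.

```python
import json
from pathlib import PurePosixPath

SERVER_NAME = "angrysky56-data-forge-mcp"


def build_mcp_config(project_dir: str) -> dict:
    """Return an `mcpServers` entry that launches Data-Forge through uv.

    Assumes a POSIX-style path, like the placeholder used in this document.
    """
    if not PurePosixPath(project_dir).is_absolute():
        raise ValueError(f"project_dir must be an absolute path: {project_dir!r}")
    return {
        "mcpServers": {
            SERVER_NAME: {
                "command": "uv",
                "args": ["--directory", project_dir, "run", "-m", "src.server"],
                "env": {
                    "LOG_LEVEL": "WARNING",
                    "PYTHONUNBUFFERED": "1",
                    "NUMBA_DISABLE_CUDA": "1",
                    "NUMBA_ENABLE_CUDASIM": "1",
                },
            }
        }
    }


if __name__ == "__main__":
    # Replace the placeholder with the directory where you cloned the repository.
    print(json.dumps(build_mcp_config("/absolute/path/to/data-forge-mcp"), indent=2))
```

Rejecting relative paths up front avoids a common failure mode in which the client starts the server from a different working directory and cannot find the project.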

Additional notes

Tips and considerations:

Ensure Python 3.13+ is installed and accessible in PATH.
If accessing private Hugging Face datasets, provide HF_TOKEN via a .env file or environment variables.
The mcp_config uses an absolute directory for the server; adjust /absolute/path/to/data-forge-mcp to your deployment path.
If you upgrade dependencies, verify compatibility with Pandas/Polars, Pandera, PyJanitor, YData Profiling, tsfresh, and DuckDB.
On machines without a GPU, set NUMBA_DISABLE_CUDA=1 and NUMBA_ENABLE_CUDASIM=1 so that Numba does not attempt to use CUDA hardware.
When working with multiple datasets in one session, use consistent naming for dataset_id references (e.g., df_123) so each tool call targets the intended dataset.
If extract_tables fails with network or parsing errors, check that the URL is correct and that the site is reachable from the machine running the server.
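
Several of the tips above can be checked before the server is started. The sketch below is a hypothetical preflight helper, not part of Data-Forge: it verifies the Python version, looks for uv on PATH, and reports whether HF_TOKEN is set. It only inspects the process environment, so a token stored in a .env file is not seen unless it has been exported.

```python
import os
import shutil
import sys


def preflight(version=sys.version_info[:2], env=os.environ, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    The parameters exist so the checks can be exercised without touching
    the real interpreter, environment, or PATH.
    """
    problems = []
    if tuple(version) < (3, 13):
        found = ".".join(str(part) for part in version)
        problems.append(f"Python 3.13+ is required, found {found}")
    if which("uv") is None:
        problems.append("uv was not found on PATH (pip can be used instead)")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (only needed for private Hugging Face datasets)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(f"warning: {issue}")
    if not issues:
        print("All preflight checks passed.")
```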