
data-forge

Data-Forge is a Model Context Protocol (MCP) server that transforms any LLM into a powerful Data Science Assistant. It provides a suite of high-performance tools for data loading, cleaning, validation, profiling, feature engineering, and visualization.

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio angrysky56-data-forge-mcp \
  --env LOG_LEVEL="WARNING" \
  --env PYTHONUNBUFFERED="1" \
  --env NUMBA_DISABLE_CUDA="1" \
  --env NUMBA_ENABLE_CUDASIM="1" \
  -- uv --directory /absolute/path/to/data-forge-mcp run -m src.server
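Alternatively, you can register the server by editing your MCP client's configuration file directly. The entry below is a sketch that mirrors the command above, using the standard MCP server-config shape; the server name and the absolute path are the same placeholders and must be adapted to your environment:

```json
{
  "mcpServers": {
    "angrysky56-data-forge-mcp": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/data-forge-mcp", "run", "-m", "src.server"],
      "env": {
        "LOG_LEVEL": "WARNING",
        "PYTHONUNBUFFERED": "1",
        "NUMBA_DISABLE_CUDA": "1",
        "NUMBA_ENABLE_CUDASIM": "1"
      }
    }
  }
}
```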

How to use

Data-Forge is an MCP server that equips LLMs with a comprehensive data science toolkit:

  • Core data management: loading datasets, inspecting schemas, and managing multiple loaded datasets in a session.
  • Quality gates: validation with Pandera and cleaning with PyJanitor.
  • Profiling and insights: detailed data reports via YData Profiling.
  • Visualization: chart generation with Seaborn/Matplotlib, geospatial maps with GeoPandas, and an interactive D-Tale explorer.
  • Feature engineering: tsfresh feature extraction.
  • Acquisition: loading datasets from Hugging Face and scraping tables from web pages.
  • Discovery probes: Topological Data Analysis for identifying semantic holes.
  • SQL: powerful headless SQL control through DuckDB.

To use it, configure your MCP client to point to the Data-Forge server, then invoke tools such as load_data, get_dataset_profile, extract_tables, generate_chart, and run_sql_query. Each tool targets a specific data operation, so the LLM can orchestrate end-to-end data workflows across multiple datasets in a session.
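To make the session workflow concrete, here is a toy sketch, in plain Python with only the standard library, of the load-then-profile pattern described above. The function names mirror the tool names, but this is purely an illustration of the workflow; the actual server is backed by Pandas/Polars and YData Profiling, not this code:

```python
import csv
import io
import statistics

# In-memory session registry: dataset_id -> list of row dicts.
_session = {}
_counter = 0

def load_data(csv_text):
    """Parse CSV text, register it in the session, return its dataset_id."""
    global _counter
    _counter += 1
    dataset_id = f"df_{_counter}"
    _session[dataset_id] = list(csv.DictReader(io.StringIO(csv_text)))
    return dataset_id

def get_dataset_profile(dataset_id):
    """Return a minimal per-column profile: mean for numeric columns,
    distinct-value count otherwise."""
    rows = _session[dataset_id]
    profile = {"n_rows": len(rows), "columns": {}}
    for col in rows[0]:
        values = [r[col] for r in rows]
        try:
            nums = [float(v) for v in values]
            profile["columns"][col] = {"mean": statistics.mean(nums)}
        except ValueError:
            profile["columns"][col] = {"unique": len(set(values))}
    return profile

dataset_id = load_data("name,age\nada,36\ngrace,40\n")
print(dataset_id)                       # df_1
print(get_dataset_profile(dataset_id))  # {'n_rows': 2, 'columns': {'name': {'unique': 2}, 'age': {'mean': 38.0}}}
```

The point of the dataset_id indirection is that several datasets can live in one session at once, so later tool calls (cleaning, charting, SQL) can name exactly which dataset they operate on.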

How to install

Prerequisites:

  • Python 3.13+
  • uv (recommended) or pip

Installation steps:

  1. Clone the repository:
     git clone https://github.com/angrysky56/data-forge-mcp.git
     cd data-forge-mcp
  2. Install the dependencies with uv:
     uv sync
  3. (Optional) Configure the environment for private Hugging Face datasets:
     # create a .env file in the project root or set env vars directly
     HF_TOKEN=your_hugging_face_token
  4. Start or deploy as needed (see mcp_config) and ensure the path in the config matches your deployment.

Note: The README specifies running the server via uv, using the invocation shown in the Installation section above; adapt the absolute path to your environment accordingly.

Additional notes

Tips and considerations:

  • Ensure Python 3.13+ is installed and accessible in PATH.
  • If accessing private Hugging Face datasets, provide HF_TOKEN via a .env file or environment variables.
  • The mcp_config uses an absolute directory for the server; adjust /absolute/path/to/data-forge-mcp to your deployment path.
  • If you upgrade dependencies, verify compatibility with Pandas/Polars, Pandera, PyJanitor, YData Profiling, tsfresh, and DuckDB.
  • In environments without a GPU, set NUMBA_DISABLE_CUDA=1 and NUMBA_ENABLE_CUDASIM=1 to avoid CUDA initialization issues; note that the CUDA simulator exists for compatibility, not performance.
  • When running multiple datasets, consider session management and naming conventions for dataset_id references (e.g., df_123).
  • If you encounter network or parsing issues in extract_tables, ensure website access is allowed and that the URL is correct and accessible.
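As a rough illustration of what extract_tables has to do once a page is fetched, the stdlib html.parser can pull cell text out of simple, flat <table> markup. The real tool is far more robust (it presumably uses a full HTML parsing stack); this sketch, with its hypothetical TableExtractor class, only handles tables without nesting:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect rows of cell text from simple, flat <table> markup."""

    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows, each a list of cell strings
        self._row = None     # row currently being built, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")  # open a new empty cell

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()  # accumulate text into current cell

html = ("<table><tr><th>city</th><th>pop</th></tr>"
        "<tr><td>Oslo</td><td>709k</td></tr></table>")
parser = TableExtractor()
parser.feed(html)
print(parser.rows)  # [['city', 'pop'], ['Oslo', '709k']]
```

If extraction fails on a real page, the likely culprits are the same as for this sketch: the table is built by JavaScript after load, hidden behind access controls, or nested in markup the parser does not anticipate.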
