data-forge
Data-Forge is a Model Context Protocol (MCP) server that transforms any LLM into a powerful Data Science Assistant. It provides a suite of high-performance tools for data loading, cleaning, validation, profiling, feature engineering, and visualization.
claude mcp add --transport stdio angrysky56-data-forge-mcp uv --directory /absolute/path/to/data-forge-mcp run -m src.server \
  --env LOG_LEVEL="WARNING" \
  --env PYTHONUNBUFFERED="1" \
  --env NUMBA_DISABLE_CUDA="1" \
  --env NUMBA_ENABLE_CUDASIM="1"
How to use
Data-Forge equips LLMs with a comprehensive data science toolkit:
- Core data management: loading datasets, inspecting schemas, and managing multiple loaded datasets
- Quality gates: validation with Pandera and cleaning with PyJanitor
- Profiling and insights: detailed data reports via YData Profiling
- Visualization: chart generation using Seaborn/Matplotlib
- Feature engineering: tsfresh feature extraction
- Acquisition: loading datasets from HuggingFace and scraping tables from web pages
- Visualization aids: geospatial maps with GeoPandas and an interactive D-Tale explorer
- Discovery probes: Topological Data Analysis for identifying semantic holes
- Headless SQL control through DuckDB
To use it, configure your MCP client to point to the Data-Forge MCP server, then invoke tools such as load_data, get_dataset_profile, extract_tables, generate_chart, and run_sql_query. Each tool targets a specific data operation, so the LLM can orchestrate end-to-end data workflows across multiple datasets in a session.
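To illustrate what invoking a tool involves: an MCP client sends a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request for run_sql_query. The argument name (query) and the SQL text are assumptions made for illustration, not taken from the Data-Forge tool schema; the real parameter names should be read from the tool listing your client reports.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the real parameter names come from the server's tool schema.
request = build_tool_call(
    1,
    "run_sql_query",
    {"query": "SELECT COUNT(*) FROM df_123"},
)
print(request)
```

In practice the MCP client builds and sends this message for you; the sketch only shows the shape of a single tool invocation.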
How to install
Prerequisites:
- Python 3.13+
- uv (recommended) or pip
Installation steps:
- Clone the repository:
git clone https://github.com/angrysky56/data-forge-mcp.git
cd data-forge-mcp
- Install the dependencies using uv:
uv sync
- (Optional) Configure environment for private Hugging Face datasets:
# create a .env file in the project root or set env vars directly
HF_TOKEN=your_hugging_face_token
- Start or deploy as needed (see mcp_config) and ensure the path in the config matches your deployment.
Note: The README launches the server through uv in the MCP config (uv --directory /absolute/path/to/data-forge-mcp run -m src.server); adapt the absolute path to your environment.
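The uv invocation and environment variables from the install command can also be written as an MCP client config entry. The sketch below generates one in the mcpServers JSON layout used by clients such as Claude Desktop. The server key data-forge is an arbitrary label chosen here, and the directory is the placeholder path, which must be replaced with your own checkout location.

```python
import json

# Placeholder from the install command; replace with your real checkout path.
SERVER_DIR = "/absolute/path/to/data-forge-mcp"


def build_mcp_config(server_dir: str) -> dict:
    """Return an mcpServers entry that launches Data-Forge through uv."""
    return {
        "mcpServers": {
            # "data-forge" is an arbitrary label chosen for this sketch.
            "data-forge": {
                "command": "uv",
                "args": ["--directory", server_dir, "run", "-m", "src.server"],
                "env": {
                    "LOG_LEVEL": "WARNING",
                    "PYTHONUNBUFFERED": "1",
                    "NUMBA_DISABLE_CUDA": "1",
                    "NUMBA_ENABLE_CUDASIM": "1",
                },
            }
        }
    }


config = build_mcp_config(SERVER_DIR)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's configuration file; where that file lives depends on the client.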
Additional notes
Tips and considerations:
- Ensure Python 3.13+ is installed and accessible in PATH.
- If accessing private Hugging Face datasets, provide HF_TOKEN via a .env file or environment variables.
- The mcp_config uses an absolute directory for the server; adjust /absolute/path/to/data-forge-mcp to your deployment path.
- If you upgrade dependencies, verify compatibility with Pandas/Polars, Pandera, PyJanitor, YData Profiling, tsfresh, and DuckDB.
- On machines without a GPU, set NUMBA_DISABLE_CUDA=1 and NUMBA_ENABLE_CUDASIM=1 so that Numba does not attempt to use CUDA hardware.
- When working with multiple datasets, consider session management and naming conventions for dataset_id references (e.g., df_123).
- If you encounter network or parsing issues in extract_tables, ensure website access is allowed and that the URL is correct and accessible.
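Several of the tips above can be checked before starting the server. The following is a minimal preflight sketch, not part of Data-Forge itself: the preflight function is a hypothetical helper that checks the Python version, looks for uv on PATH, and reports whether HF_TOKEN is set in the environment. It does not read .env files.

```python
import os
import shutil
import sys


def preflight(version_info=sys.version_info, environ=os.environ, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Data-Forge lists Python 3.13+ as a prerequisite.
    if tuple(version_info[:2]) < (3, 13):
        problems.append("Python 3.13+ is required")
    # uv is recommended; pip is listed as an alternative.
    if which("uv") is None:
        problems.append("uv was not found on PATH (pip can be used instead)")
    # HF_TOKEN only matters for private Hugging Face datasets.
    if not environ.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (only needed for private Hugging Face datasets)")
    return problems


for problem in preflight():
    print("warning:", problem)
```

The parameters default to the real interpreter and environment, so they can be swapped out for fixed values when testing the checks.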