mcp-data

DuckDB-based MCP server for datasets

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio boettiger-lab-mcp-data-server python server.py \
  --env PORT="8000" \
  --env THREADS="100" \
  --env AWS_PROFILE="optional-aws-profile-if-needed" \
  --env S3_ENDPOINT="optional-custom-s3-endpoint"

How to use

This MCP server exposes a SQL interface to large geospatial datasets stored in S3, powered by DuckDB. It runs in stateless HTTP mode: each query executes in its own DuckDB instance, which provides isolation and security while still enabling fast analytical queries against H3-indexed environmental and biodiversity data.

Clients interact with the server over MCP, sending structured requests that contain a SQL query and receiving structured results in response. Typical usage is to connect through an MCP client (e.g., an LLM toolkit or IDE integration) pointed at the hosted endpoint or a local instance, then invoke query(sql) to run analytical SQL against datasets such as GLWD, WDPA, iNaturalist, and more.

The server also supports dataset discovery through catalog endpoints and dataset prompts that guide query construction, including partition pruning at H3 resolutions h0-h8 and optimized read paths for S3.
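To make the request shape concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends when it calls a tool. The tool name `query` and the argument name `sql` are inferred from the query(sql) description above and may not match the server's actual schema. The helper `build_query_request` and the table name are hypothetical. In practice an MCP client library builds this message and performs the initialization handshake for you.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_query_request(sql: str) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' message for a tool named 'query'.

    Assumption: the server registers the tool as 'query' with a single
    string argument named 'sql'.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "query", "arguments": {"sql": sql}},
    }


if __name__ == "__main__":
    # 'example_table' is a placeholder, not a real dataset name.
    request = build_query_request("SELECT COUNT(*) FROM example_table")
    print(json.dumps(request, indent=2))
```

The message is only constructed here, not sent; sending it requires an initialized MCP session against the /mcp endpoint.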

How to install

Prerequisites:

Python 3.9+ (recommended)
pip (comes with Python)
Access to S3 buckets hosting the datasets (public or with credentials configured)

Clone the repository or download the project files.
Create a Python virtual environment (optional but recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
Install dependencies:
  pip install -r requirements.txt
Run the MCP server locally:
  python server.py
Connect to the local MCP endpoint (HTTP, not HTTPS by default). Example client config:
  {
    "servers": {
      "duckdb-geo": {
        "url": "http://localhost:8000/mcp"
      }
    }
  }
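If you run the server on a different port, the client config has to change with it. The following sketch generates the config shown in the last step from a host and port. The function name `client_config` and its parameters are illustrative, and the exact config schema depends on the MCP client you use.

```python
import json


def client_config(name: str = "duckdb-geo", host: str = "localhost",
                  port: int = 8000, path: str = "/mcp") -> dict:
    """Return a client config dict pointing at a local MCP endpoint.

    Uses plain HTTP, matching the default local setup described above.
    """
    return {"servers": {name: {"url": f"http://{host}:{port}{path}"}}}


if __name__ == "__main__":
    # Example: the server was started with PORT=9000.
    print(json.dumps(client_config(port=9000), indent=2))
```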

Notes:

If you plan to deploy to a container or Kubernetes, use the provided deployment manifests and adjust environment variables as needed.
The hosted endpoint can be used by setting the URL to the public MCP endpoint (as shown in the examples).

Additional notes

Tips and notes:

Environment variables: THREADS controls DuckDB parallelism for S3 reads; adjust based on your CPU and network.
PORT defaults to 8000; ensure firewall/security groups allow access as appropriate.
The server is stateless and reads only from public S3 buckets by default; enable credentials if required for private datasets.
For local development, note that a local instance may run queries more slowly than the hosted endpoint because of network and compute differences.
Query optimization tips and dataset catalogs are described in the repository; ensure you review query-setup.md and query-optimization.md for best practices.
If running in Kubernetes, the included manifests deploy two replicas with 16 GB memory per pod and use the uv toolchain for dependency installation.
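As an illustration of the environment variables above, the sketch below reads PORT and THREADS the way a server of this kind might. It is not taken from server.py. The helper `load_settings` is hypothetical, and leaving THREADS unset (so DuckDB would pick its own default) is an assumption; only the PORT default of 8000 comes from the notes above.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read PORT and THREADS from the environment and validate them.

    PORT falls back to 8000; THREADS stays None when unset (assumption).
    """
    env = os.environ if env is None else env
    port = int(env.get("PORT", "8000"))
    threads_raw = env.get("THREADS")
    threads = int(threads_raw) if threads_raw else None

    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    if threads is not None and threads < 1:
        raise ValueError(f"THREADS must be positive: {threads}")
    return {"port": port, "threads": threads}


if __name__ == "__main__":
    print(load_settings({"PORT": "8000", "THREADS": "100"}))
```

A value read this way could then be passed to DuckDB through its threads setting, for example with a SET threads statement when each per-query instance is created.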

Related MCP Servers

mcp-vegalite

MCP server from isaacwasserman/mcp-vegalite-server

github-chat

A Model Context Protocol (MCP) server for analyzing and querying GitHub repositories using the GitHub Chat API.

nautex

MCP server for guiding Coding Agents via end-to-end requirements to implementation plan pipeline

pagerduty

PagerDuty's official local MCP (Model Context Protocol) server which provides tools to interact with your PagerDuty account directly from your MCP-enabled client.

futu-stock

MCP server for Futu Niuniu stock

mcp-boilerplate

Boilerplate using one of the 'better' ways to build MCP Servers. Written using FastMCP