data-commons-search
🔠Search server to access data from various open access data publishers
claude mcp add --transport stdio eosc-data-commons-data-commons-search uvx data-commons-search \ --env OPENSEARCH_URL="OPENSEARCH_URL"
How to use
This MCP server provides a natural language search interface over the EOSC Data Commons OpenSearch-backed datasets. It exposes two HTTP endpoints: /mcp for MCP-driven retrieval and /chat for interactive tool usage with an LLM provider. To use the MCP endpoint, deploy the server and point clients to http://<host>:<port>/mcp; you will need access to a pre-indexed OpenSearch instance. The server supports two main capabilities through MCP: searching datasets and retrieving metadata for the files within a dataset. It does not currently enable searching tools or citations via the MCP endpoint, but it can be extended to do so. For an interactive chat experience with an LLM, use the /chat endpoint by POSTing JSON payloads containing messages and a chosen model. When running locally via STDIO transport, you can connect using the vs-code Copilot-like workflow or via a local socket, as shown in the README samples.
How to install
Prerequisites: Python 3.11+ (or a compatible Python runtime), Git, and network access to install dependencies. Optional: Docker if you prefer containerized deployment.
-
Clone the repository
- git clone https://github.com/EOSC-Data-Commons/data-commons-search.git
- cd data-commons-search
-
Set up a Python environment
- python -m venv venv
- source venv/bin/activate # On Windows use venv\Scripts\activate
-
Install dependencies
- pip install -U pip
- pip install -r requirements.txt # If a requirements file exists in the project
- If using the development workflow shown in the README, ensure you have uv/uvx installed via your preferred method (e.g., pipx install uv, pipx install uvx)
-
Configure environment variables (example required for MCP use)
- OPENSEARCH_URL=http://localhost:9200 # URL to your OpenSearch instance
- Add any API keys required by your LLM provider (e.g., EINFRACZ_API_KEY, MISTRAL_API_KEY, OPENROUTER_API_KEY) if you plan to use /chat
-
Run the development server (STDIO or HTTP as described in the README)
- For STDIO transport via uv/uvx: uvx data-commons-search
- For HTTP transport (development): uv run uvicorn src.data_commons_search.main:app --reload --port 8000
-
Verify the server is running
- Open http://localhost:8000/mcp to test the MCP endpoint
- Optional: test the /chat endpoint with a sample payload
Notes:
- The exact commands may vary based on your environment and the installed tooling (uv, uvx, uvicorn). The README examples show using uv and uvicorn for development and uvx for STDIO transport.
Additional notes
Tips and considerations:
- Ensure OPENSEARCH_URL is reachable from your deployment; the MCP server relies on this for retrieving dataset information.
- If you expose the /chat endpoint, you will typically need an API key for your chosen LLM provider. Store keys securely (e.g., in keys.env) and load them into the container or runtime environment as needed.
- When integrating with VSCode Copilot-style tooling, you can use the STDIO transport configuration shown in the README to connect via the data-commons-search server name and command (uvx data-commons-search).
- If you plan to deploy with Docker, follow the provided docker-compose example in the README and ensure your environment variables (OPENSEARCH_URL, LLM keys, and optional SEARCH_API_KEY) are provided to the container.
- The current MCP capabilities include searching datasets and retrieving dataset file metadata. Tool and citation search features are listed as not yet implemented in this server version.
- If you encounter port conflicts, adjust the SERVER_PORT or run with a different host/port as appropriate.
Related MCP Servers
gpt-researcher
An autonomous agent that conducts deep research on any data using any LLM providers.
ddgs
A metasearch library that aggregates results from diverse web search services
academic-search
Academic Paper Search MCP Server for Claude Desktop integration. Allows Claude to access data from Semantic Scholar and Crossref.
nautex
MCP server for guiding Coding Agents via end-to-end requirements to implementation plan pipeline
mcp-yfinance
Real-time stock API with Python, MCP server example, yfinance stock analysis dashboard
cloudwatch-logs
MCP server from serkanh/cloudwatch-logs-mcp