# kbase-query
Query the KBase/BERDL Datalake MCP Server via REST API.
## Setup

```bash
export KBASE_TOKEN="your_token_here"
export KBASE_MCP_URL="https://hub.berdl.kbase.us/apis/mcp"  # optional; this is the default
```
### Getting your token

- Log in to the KBase JupyterHub
- In any notebook, run `BERDLSettings().KBASE_AUTH_TOKEN`
- Copy the token value

Note: Tokens expire after ~1 week. If you get auth errors, refresh your token.
## Scripts

All scripts require the `KBASE_TOKEN` env var and `jq` installed.

| Script | Usage |
|---|---|
| `kbase_health.sh` | Check API health |
| `kbase_list_databases.sh` | List all databases |
| `kbase_list_tables.sh <db>` | List tables in a database |
| `kbase_table_schema.sh <db> <table>` | Get table columns |
| `kbase_db_structure.sh [with_schema]` | Full DB structure |
| `kbase_table_count.sh <db> <table>` | Row count |
| `kbase_table_sample.sh <db> <table> [limit]` | Sample rows (max 100) |
| `kbase_query.sh <sql> [limit]` | Execute SQL (max 1000 rows) |
| `kbase_select.sh <db> <table> [limit]` | Structured select |
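Since every script shares the same two prerequisites, it can help to verify them once up front. This is a minimal sketch of a hypothetical `preflight` helper (not shipped with the skill) that fails fast when `KBASE_TOKEN` or `jq` is missing:

```bash
#!/usr/bin/env sh
# Hypothetical preflight helper: check the prerequisites the kbase_*.sh
# scripts assume before making any API calls.
preflight() {
  # Fail if the auth token env var is unset or empty
  [ -n "${KBASE_TOKEN:-}" ] || { echo "error: KBASE_TOKEN is not set" >&2; return 1; }
  # Fail if jq (used to parse all JSON responses) is not on PATH
  command -v jq >/dev/null 2>&1 || { echo "error: jq is not installed" >&2; return 1; }
  echo "preflight ok"
}
```

Running `preflight && kbase_health.sh` surfaces a clear local error before any network round trip.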
## Example Workflow

```bash
# List databases
kbase_list_databases.sh
# → {"databases": ["kbase_ke_pangenome", "nmdc_core", ...]}

# List tables in the pangenome database
kbase_list_tables.sh kbase_ke_pangenome
# → {"tables": ["genome", "gene", "gene_cluster", ...]}

# Get columns for a table (returns names only, not types)
kbase_table_schema.sh kbase_ke_pangenome genome
# → {"columns": ["genome_id", "gtdb_species_clade_id", ...]}

# Sample rows
kbase_table_sample.sh kbase_ke_pangenome genome 5

# SQL query
kbase_query.sh "SELECT * FROM kbase_ke_pangenome.genome LIMIT 10"
```
## Useful jq Patterns

```bash
# Extract just database names
kbase_list_databases.sh | jq -r '.databases[]'

# Get columns as a comma-separated list
kbase_table_schema.sh kbase_ke_pangenome genome | jq -r '.columns | join(", ")'

# Loop through all tables to get schemas
for t in $(kbase_list_tables.sh kbase_ke_pangenome | jq -r '.tables[]'); do
  echo "=== $t ==="
  kbase_table_schema.sh kbase_ke_pangenome "$t" | jq -r '.columns | join(", ")'
done
```
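The same `join` pattern can feed a schema back into a query. This sketch builds a `SELECT` column list from a schema payload; the inline JSON stands in for `kbase_table_schema.sh` output (same `{"columns": [...]}` shape), so in practice you would pipe the script's output instead:

```bash
# Sample of the {"columns": [...]} shape returned by kbase_table_schema.sh;
# in real use, replace this with: kbase_table_schema.sh <db> <table>
schema='{"columns": ["genome_id", "gtdb_species_clade_id"]}'

# Join the column names into a SELECT list
cols=$(printf '%s' "$schema" | jq -r '.columns | join(", ")')

# Emit a query that selects exactly those columns
echo "SELECT $cols FROM kbase_ke_pangenome.genome LIMIT 5"
```

This avoids `SELECT *` on wide tables while still covering every column the schema reports.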
## Available Databases

Key databases include:

- `kbase_ke_pangenome` - Pangenomic data with GTDB taxonomy
- `nmdc_core` - NMDC microbiome data
- `kbase_genomes` - KBase genome collection
- `kbase_uniprot_*` - UniProt reference data
## API Reference

See `references/api_reference.md` for complete endpoint documentation.
## Source

https://github.com/cmungall/lakehouse-skills/blob/main/kbase-query/SKILL.md

## Overview
KBase Query lets you explore the KBase/BERDL Datalake through the MCP REST API. It supports listing databases and tables, retrieving schemas, sampling data, and running SQL queries against the data lake. It authenticates with a KBASE_TOKEN and uses handy kbase_*.sh scripts to perform common tasks.
## How This Skill Works
Set the KBASE_TOKEN and optional KBASE_MCP_URL environment variables, then use the provided kbase_*.sh scripts to interact with the MCP REST endpoints. The scripts perform health checks, list databases and tables, fetch schemas, sample data, and execute SQL queries, returning results in JSON which can be parsed with jq.
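Roughly, each wrapper resolves the base URL and sends an authenticated request. The sketch below builds the request URL the way the scripts presumably do; the `/health` path and the exact `Authorization` header scheme are assumptions here, not documented behavior (see `references/api_reference.md` for the real endpoints):

```bash
# Hypothetical helper: join an endpoint path onto the MCP base URL,
# falling back to the documented default when KBASE_MCP_URL is unset.
mcp_url() {
  printf '%s%s' "${KBASE_MCP_URL:-https://hub.berdl.kbase.us/apis/mcp}" "$1"
}

# A wrapper would then make an authenticated call, e.g. (assumed header scheme):
#   curl -sf -H "Authorization: $KBASE_TOKEN" "$(mcp_url /health)" | jq .
echo "$(mcp_url /health)"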
## When to Use It
- Exploring what databases and tables exist in the KBase BERDL datalake.
- Listing all tables within a specific database.
- Retrieving the column names (schema) for a table.
- Sampling a few rows to quickly inspect data quality and shape.
- Running SQL queries against a database/table to extract insights.
## Quick Start

- Step 1: `export KBASE_TOKEN="your_token_here"`; optionally `export KBASE_MCP_URL="https://hub.berdl.kbase.us/apis/mcp"` (the default)
- Step 2: Run a quick health check: `kbase_health.sh`
- Step 3: Try a basic operation, e.g. list databases with `kbase_list_databases.sh` or run a small query: `kbase_query.sh "SELECT * FROM kbase_ke_pangenome.genome LIMIT 5"`
## Best Practices

- Keep `KBASE_TOKEN` secure and refresh it before expiry; tokens expire after ~1 week.
- Override the MCP URL with `KBASE_MCP_URL` when connecting to non-default endpoints.
- Check API health with `kbase_health.sh` before running heavy queries.
- Inspect table schemas with `kbase_table_schema.sh` before joins or heavy analytics.
- Respect result limits: sample data is capped at 100 rows; SQL queries have a max of 1000 rows.
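When a result set exceeds the 1000-row cap, one workaround is to page through it. This is a hypothetical helper (it assumes the backing SQL engine supports `LIMIT`/`OFFSET`, which the skill docs do not state) that emits the SQL for page N so each `kbase_query.sh` call stays under the cap:

```bash
# Hypothetical pagination helper: generate the SQL for page N of a full
# table scan, with each page sized at or below the 1000-row query cap.
page_sql() {
  table=$1
  page=$2
  size=${3:-1000}                      # default page size = the cap
  printf 'SELECT * FROM %s LIMIT %d OFFSET %d' "$table" "$size" $(( page * size ))
}

# Page 2 with 500-row pages skips the first 1000 rows
page_sql kbase_ke_pangenome.genome 2 500
```

A loop would then call `kbase_query.sh "$(page_sql <table> $n)"` and stop once a page comes back with fewer rows than the page size.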
## Example Use Cases

- List databases: `kbase_list_databases.sh` → `{"databases": ["kbase_ke_pangenome", "nmdc_core", ...]}`
- List tables in a database: `kbase_list_tables.sh kbase_ke_pangenome` → `{"tables": ["genome", "gene", "gene_cluster", ...]}`
- Get columns for a table: `kbase_table_schema.sh kbase_ke_pangenome genome` → `{"columns": ["genome_id", "gtdb_species_clade_id", ...]}`
- Sample rows: `kbase_table_sample.sh kbase_ke_pangenome genome 5` → `{"rows": [...]}`
- SQL query: `kbase_query.sh "SELECT * FROM kbase_ke_pangenome.genome LIMIT 10"` → `{"results": [...]}`