Get the FREE Ultimate OpenClaw Setup Guide →

kbase-query

Scanned
npx machina-cli add skill cmungall/lakehouse-skills/kbase-query --openclaw
Files (1)
SKILL.md
2.8 KB

KBase Query

Query the KBase/BERDL Datalake MCP Server via REST API.

Setup

export KBASE_TOKEN="your_token_here"
export KBASE_MCP_URL="https://hub.berdl.kbase.us/apis/mcp"  # optional default

Getting your token

  1. Login to the KBase JupyterHub
  2. In any notebook, run:
    BERDLSettings().KBASE_AUTH_TOKEN
    
  3. Copy the token value

Note: Tokens expire after ~1 week. If you get auth errors, refresh your token.

Scripts

All scripts require KBASE_TOKEN env var and jq installed.

ScriptUsage
kbase_health.shCheck API health
kbase_list_databases.shList all databases
kbase_list_tables.sh <db>List tables in database
kbase_table_schema.sh <db> <table>Get table columns
kbase_db_structure.sh [with_schema]Full DB structure
kbase_table_count.sh <db> <table>Row count
kbase_table_sample.sh <db> <table> [limit]Sample rows (max 100)
kbase_query.sh <sql> [limit]Execute SQL (max 1000)
kbase_select.sh <db> <table> [limit]Structured select

Example Workflow

# List databases
kbase_list_databases.sh
# → {"databases": ["kbase_ke_pangenome", "nmdc_core", ...]}

# List tables in pangenome database
kbase_list_tables.sh kbase_ke_pangenome
# → {"tables": ["genome", "gene", "gene_cluster", ...]}

# Get columns for a table (returns names only, not types)
kbase_table_schema.sh kbase_ke_pangenome genome
# → {"columns": ["genome_id", "gtdb_species_clade_id", ...]}

# Sample rows
kbase_table_sample.sh kbase_ke_pangenome genome 5

# SQL query
kbase_query.sh "SELECT * FROM kbase_ke_pangenome.genome LIMIT 10"

Useful jq Patterns

# Extract just database names
kbase_list_databases.sh | jq -r '.databases[]'

# Get columns as comma-separated list
kbase_table_schema.sh kbase_ke_pangenome genome | jq -r '.columns | join(", ")'

# Loop through all tables to get schemas
for t in $(kbase_list_tables.sh kbase_ke_pangenome | jq -r '.tables[]'); do
  echo "=== $t ==="
  kbase_table_schema.sh kbase_ke_pangenome "$t" | jq -r '.columns | join(", ")'
done

Available Databases

Key databases include:

  • kbase_ke_pangenome - Pangenomic data with GTDB taxonomy
  • nmdc_core - NMDC microbiome data
  • kbase_genomes - KBase genome collection
  • kbase_uniprot_* - UniProt reference data

API Reference

See references/api_reference.md for complete endpoint documentation.

Source

git clone https://github.com/cmungall/lakehouse-skills/blob/main/kbase-query/SKILL.mdView on GitHub

Overview

KBase Query lets you explore the KBase/BERDL Datalake through the MCP REST API. It supports listing databases and tables, retrieving schemas, sampling data, and running SQL queries against the data lake. It authenticates with a KBASE_TOKEN and uses handy kbase_*.sh scripts to perform common tasks.

How This Skill Works

Set the KBASE_TOKEN and optional KBASE_MCP_URL environment variables, then use the provided kbase_*.sh scripts to interact with the MCP REST endpoints. The scripts perform health checks, list databases and tables, fetch schemas, sample data, and execute SQL queries, returning results in JSON which can be parsed with jq.

When to Use It

  • Exploring what databases and tables exist in the KBase BERDL datalake.
  • Listing all tables within a specific database.
  • Retrieving the column names (schema) for a table.
  • Sampling a few rows to quickly inspect data quality and shape.
  • Running SQL queries against a database/table to extract insights.

Quick Start

  1. Step 1: export KBASE_TOKEN="your_token_here"; export KBASE_MCP_URL="https://hub.berdl.kbase.us/apis/mcp" # optional default
  2. Step 2: Run a quick health check: kbase_health.sh
  3. Step 3: Try a basic operation, e.g., list databases or run a small SQL: kbase_list_databases.sh or kbase_query.sh "SELECT * FROM kbase_ke_pangenome.genome LIMIT 5"

Best Practices

  • Keep KBASE_TOKEN secure and refresh it before expiry; tokens expire after ~1 week.
  • Override the MCP URL with KBASE_MCP_URL when connecting to non-default endpoints.
  • Check API health with kbase_health.sh before running heavy queries.
  • Inspect table schemas with kbase_table_schema.sh before joins or heavy analytics.
  • Respect result limits: sample data is capped at 100 rows; SQL queries have a max of 1000 rows.

Example Use Cases

  • List databases: kbase_list_databases.sh Output: {"databases": ["kbase_ke_pangenome", "nmdc_core", ...]}
  • List tables in a database: kbase_list_tables.sh kbase_ke_pangenome Output: {"tables": ["genome", "gene", "gene_cluster", ...]}
  • Get columns for a table: kbase_table_schema.sh kbase_ke_pangenome genome Output: {"columns": ["genome_id", "gtdb_species_clade_id", ...]}
  • Sample rows: kbase_table_sample.sh kbase_ke_pangenome genome 5 Output: {"rows": [...]}
  • SQL query: kbase_query.sh "SELECT * FROM kbase_ke_pangenome.genome LIMIT 10" Output: {"results": [...]}

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers