Get the FREE Ultimate OpenClaw Setup Guide →

jgi-lakehouse

Use Caution
npx machina-cli add skill cmungall/lakehouse-skills/jgi-lakehouse --openclaw
Files (1)
SKILL.md
3.7 KB

JGI Lakehouse Query

Query the JGI Dremio Lakehouse via CLI.

Setup

pip install linkml-store[dremio]
export DREMIO_USER="your_username"
export DREMIO_PASSWORD="your_password"
export CF_AUTHORIZATION="your_token"

Getting the CF_AUTHORIZATION token

The CF_AUTHORIZATION token is a Cloudflare Access cookie required for authentication:

  • Open https://lakehouse.jgi.lbl.gov/ in your browser
  • Open Developer Tools (F12) > Application > Cookies
  • Copy the value of the CF_Authorization cookie

Note that unless you have computer control you will have to ask your human to do this

Connection

# Base connection (no schema filter)
DB="dremio-rest://lakehouse.jgi.lbl.gov"

# GOLD database
GOLD="dremio-rest://lakehouse.jgi.lbl.gov?schema=gold-db-2 postgresql.gold"

# IMG Core database
IMG="dremio-rest://lakehouse.jgi.lbl.gov?schema=img-db-2 postgresql.img_core_v400"

SQL Queries

Direct SQL via --sql flag:

# Query GOLD studies
linkml-store -d "$DB" query --sql 'SELECT * FROM "gold-db-2 postgresql".gold.study LIMIT 10'

# Count by ecosystem
linkml-store -d "$DB" query --sql 'SELECT ecosystem, COUNT(*) as cnt FROM "gold-db-2 postgresql".gold.study GROUP BY ecosystem ORDER BY cnt DESC'

# Join tables
linkml-store -d "$DB" query --sql 'SELECT s.study_name, p.project_name FROM "gold-db-2 postgresql".gold.study s JOIN "gold-db-2 postgresql".gold.project p ON s.study_id = p.study_id LIMIT 10'

# Output formats
linkml-store -d "$DB" query --sql '...' -O json
linkml-store -d "$DB" query --sql '...' -O csv
linkml-store -d "$DB" query --sql '...' -O yaml

Collection Queries

When schema is set, use collection-based queries:

# List tables
linkml-store -d "$GOLD" list-collections

# Query collection
linkml-store -d "$GOLD" -c study query --limit 10

# With filter
linkml-store -d "$GOLD" -c study query -w "ecosystem=Host-associated" --limit 20

# Describe table
linkml-store -d "$GOLD" -c study describe

Schema Export

# Export LinkML schema with FK relationships
linkml-store -d "$GOLD" schema -O yaml -o gold.linkml.yaml

Key Databases

Schema PathTablesDescription
"gold-db-2 postgresql".gold42GOLD - genome project metadata
"img-db-2 postgresql".img_core_v400244IMG Core - genes, taxa, annotations
"img-db-2 postgresql".img_ext84IMG Extended data
"img-db-2 postgresql".img_sat_v450141IMG Satellite/experimental

See references/databases.md for complete listing.

Common SQL Patterns

# List schemas
linkml-store -d "$DB" query --sql 'SELECT TABLE_SCHEMA, COUNT(*) as tables FROM INFORMATION_SCHEMA."TABLES" GROUP BY TABLE_SCHEMA ORDER BY TABLE_SCHEMA'

# List tables in schema
linkml-store -d "$DB" query --sql 'SELECT TABLE_NAME FROM INFORMATION_SCHEMA."TABLES" WHERE TABLE_SCHEMA = '\''gold-db-2 postgresql.gold'\'' ORDER BY TABLE_NAME'

# Table columns
linkml-store -d "$DB" query --sql 'SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA."COLUMNS" WHERE TABLE_SCHEMA = '\''gold-db-2 postgresql.gold'\'' AND TABLE_NAME = '\''study'\'' ORDER BY ORDINAL_POSITION'

# Row count
linkml-store -d "$DB" query --sql 'SELECT COUNT(*) FROM "gold-db-2 postgresql".gold.study'

Pre-generated Schemas

LinkML schemas with FK relationships: /Users/cjm/repos/linkml-store/jgi-lakehouse-analysis/

Source

git clone https://github.com/cmungall/lakehouse-skills/blob/main/jgi-lakehouse/SKILL.mdView on GitHub

Overview

Use the JGI Lakehouse skill to connect to the Dremio-based GOLD and IMG genomics databases, run SQL queries, browse collections, and export schema metadata. It helps researchers explore JGI genomic data, retrieve study metadata, and join GOLD and IMG datasets from the command line.

How This Skill Works

Install the Python client and set credentials (DREMIO_USER, DREMIO_PASSWORD, CF_AUTHORIZATION). Define DB connections for GOLD and IMG, then run SQL or collection queries with linkml-store (for ad hoc SQL or collection-based workflows).

When to Use It

  • Explore JGI databases and data catalogs by listing collections and schemas.
  • Query GOLD study tables to retrieve metadata, counts, or study details.
  • Perform cross-database queries or ecosystem analysis across GOLD and IMG.
  • Describe tables or export a LinkML schema with FK relationships.
  • Export results in JSON, CSV, or YAML for downstream analysis.

Quick Start

  1. Step 1: Install the client and set credentials: pip install linkml-store[dremio] and export DREMIO_USER, DREMIO_PASSWORD, CF_AUTHORIZATION.
  2. Step 2: Define connections: set DB, GOLD, and IMG variables as shown in the SKILL (base, GOLD, IMG connection strings).
  3. Step 3: Run a sample query: linkml-store -d "$DB" query --sql 'SELECT * FROM "gold-db-2 postgresql".gold.study LIMIT 10'

Best Practices

  • Start with list-collections to discover available schemas and tables.
  • Use fully-qualified tables like 'gold-db-2 postgresql'.gold.study to avoid ambiguity.
  • Choose an output format (-O json/csv/yaml) based on downstream needs.
  • Prefer collection queries (-c) after selecting a schema to optimize queries.
  • Securely manage CF_AUTHORIZATION and credentials; rotate tokens regularly.

Example Use Cases

  • Query GOLD studies: linkml-store -d "$DB" query --sql 'SELECT * FROM "gold-db-2 postgresql".gold.study LIMIT 10'
  • Count by ecosystem: linkml-store -d "$DB" query --sql 'SELECT ecosystem, COUNT(*) as cnt FROM "gold-db-2 postgresql".gold.study GROUP BY ecosystem ORDER BY cnt DESC'
  • Join tables: linkml-store -d "$DB" query --sql 'SELECT s.study_name, p.project_name FROM "gold-db-2 postgresql".gold.study s JOIN "gold-db-2 postgresql".gold.project p ON s.study_id = p.study_id LIMIT 10'
  • Output formats: linkml-store -d "$DB" query --sql 'SELECT * FROM ...' -O json
  • List GOLD collections: linkml-store -d "$GOLD" list-collections

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers