
bio-foundation-housekeeping

npx machina-cli add skill fmschulz/omics-skills/bio-foundation-housekeeping --openclaw

Bio Foundation Housekeeping

Initialize a bioinformatics project scaffold with reproducible environments, schemas, and data cataloging. Use it when starting a new project or setting up a repository.

Instructions

  1. Create the standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/).
  2. Initialize a Pixi workspace and lockfile, and define tasks.
  3. Define LinkML schemas and generate Pydantic models.
  4. Create a DuckDB catalog and register Parquet tables.

Quick Reference

  • Run workflow: Follow the steps in this skill and capture outputs.
  • Validate inputs: Confirm that required inputs and reference data exist.
  • Review outputs: Inspect reports and QC gates before proceeding.
  • Tool docs: See docs/README.md.
  • References: See ../bio-skills-references.md.

Input Requirements

Prerequisites:

  • Required tools are available in the active environment (Pixi, conda, or system). See docs/README.md for the expected tools.
  • The target project root is writable.

Inputs:

  • project root (path)
  • metadata schema requirements
  • workflow engine preference (optional)

Output

  • pixi.toml
  • pixi.lock
  • schemas/
  • data/catalog.duckdb
  • results/bio-foundation-housekeeping/report.md
  • results/bio-foundation-housekeeping/logs/

Quality Gates

  • The project root exists and is writable.
  • Schema generation succeeds and the generated models are importable.
  • Generated schemas contain the expected fields.
  • pixi.lock is created and consistent with pixi.toml.
  • The DuckDB catalog is readable.
  • On failure: retry with alternative parameters; if the step still fails, record the failure in the report and exit with a non-zero status.

Examples

Example 1: Expected input layout

project root (path)
metadata schema requirements
workflow engine preference (optional)

Troubleshooting

Issue: Missing inputs or reference databases.
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates.
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

git clone https://github.com/fmschulz/omics-skills
Skill file: https://github.com/fmschulz/omics-skills/blob/main/skills/bio-foundation-housekeeping/SKILL.md

Overview

Sets up a standard bioinformatics project scaffold with reproducible environments, schemas, and a data catalog. It creates a standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/), initializes a Pixi workspace and lockfile, defines LinkML schemas and generates Pydantic models, and builds a DuckDB catalog for parquet data.

How This Skill Works

The workflow creates the standard directory layout, initializes a Pixi workspace and lockfile with defined tasks, defines LinkML schemas and generates Pydantic models, then creates a DuckDB catalog and registers parquet tables for data access.

When to Use It

  • Starting a new bioinformatics project from scratch.
  • Re-scaffolding an existing repo to enforce a standard workflow layout.
  • Preparing reproducible environments for collaboration across teams.
  • Establishing a data catalog for parquet data and schema-driven access.
  • Onboarding new team members with a ready-to-run project baseline.

Quick Start

  1. Create the standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/) and prepare inputs.
  2. Initialize a Pixi workspace and lockfile, and define the project tasks.
  3. Define LinkML schemas, generate Pydantic models, and create a DuckDB catalog that registers Parquet tables.

Best Practices

  • Use the standard directory layout as the baseline structure for all new projects.
  • Lock environments with pixi.toml and pixi.lock to ensure reproducibility.
  • Define LinkML schemas early and generate corresponding Pydantic models for data validation.
  • Create and validate the DuckDB catalog, registering parquet tables for efficient querying.
  • Keep docs, QC gates, and outputs updated; run checks before proceeding to downstream steps.

Example Use Cases

  • Example 1: Launch a new RNA-seq QC project and generate the full scaffold (data/, results/, schemas/, workflows/, src/, notebooks/).
  • Example 2: Re-scaffold an existing repository to enforce a standard Pixi workspace and reproducible environments.
  • Example 3: Onboard a new collaborator by providing a ready-to-run project layout with LinkML schemas and a DuckDB catalog.
  • Example 4: Initialize a data catalog for parquet datasets and ensure schemas are importable.
  • Example 5: Validate QC gates by inspecting reports and re-running steps with updated parameters.
