bio-foundation-housekeeping
npx machina-cli add skill fmschulz/omics-skills/bio-foundation-housekeeping --openclaw
Bio Foundation Housekeeping
Initialize a bioinformatics project scaffold with reproducible environments, schemas, and data cataloging. Use for new projects or repo setup.
Instructions
- Create standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/).
- Initialize Pixi workspace and lockfile; define tasks.
- Define LinkML schemas and generate Pydantic models.
- Create DuckDB catalog and register parquet tables.
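The first instruction can be sketched with the standard library alone. This is a minimal illustration, not the skill's own implementation: the directory names come from the list above, while `STANDARD_DIRS` and `create_layout` are hypothetical names chosen for the example.

```python
from pathlib import Path

# Directory names taken from the standard layout listed above.
STANDARD_DIRS = ("data", "results", "schemas", "workflows", "src", "notebooks")


def create_layout(project_root):
    """Create the standard directories under project_root.

    Returns the names of the directories that were newly created, so a
    second call on the same root returns an empty list.
    """
    root = Path(project_root)
    created = []
    for name in STANDARD_DIRS:
        target = root / name
        if not target.exists():
            target.mkdir(parents=True)
            created.append(name)
    return created
```

Returning only the newly created names keeps the function safe to re-run on an existing repository, which matters when re-scaffolding rather than starting fresh.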
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Target project root is writable.
Inputs:
- project root (path)
- metadata schema requirements
- workflow engine preference (optional)
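A pre-flight check on the project root might look like the sketch below. It assumes that "writable" means the current user can create entries inside the directory; `check_project_root` is a hypothetical helper, not part of the skill.

```python
import os
from pathlib import Path


def check_project_root(project_root):
    """Return a list of problems with project_root; an empty list means it is usable."""
    root = Path(project_root)
    if not root.exists():
        return [f"{root} does not exist"]
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Creating files in a directory needs both write and search (execute) permission.
    if not os.access(root, os.W_OK | os.X_OK):
        problems.append(f"{root} is not writable")
    return problems
```

Returning messages instead of raising lets the caller collect every problem and write them to the report in one pass.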
Output
- pixi.toml
- pixi.lock
- schemas/
- data/catalog.duckdb
- results/bio-foundation-housekeeping/report.md
- results/bio-foundation-housekeeping/logs/
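One way to confirm that a run produced everything listed above is to compare the project root against that list. The paths below are copied from the Output section; `EXPECTED_OUTPUTS` and `missing_outputs` are hypothetical names used only for this sketch.

```python
from pathlib import Path

# Relative paths copied from the Output list above.
EXPECTED_OUTPUTS = (
    "pixi.toml",
    "pixi.lock",
    "schemas",
    "data/catalog.duckdb",
    "results/bio-foundation-housekeeping/report.md",
    "results/bio-foundation-housekeeping/logs",
)


def missing_outputs(project_root):
    """Return the expected outputs that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).exists()]
```

The check only tests for existence; it says nothing about whether the lockfile or catalog contents are valid.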
Quality Gates
- Schema generation succeeds and models are importable.
- pixi.lock is created and consistent with pixi.toml.
- DuckDB catalog is readable.
- Verify project root exists and is writable.
- Validate generated schemas against expected fields.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
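The "record in report and exit non-zero" behaviour could be wired up roughly as follows. This sketch assumes each gate is a zero-argument callable that returns a truthy value on success; the retry-with-alternative-parameters step is left out, and `run_gates` and the report wording are illustrative choices rather than the skill's actual format.

```python
from pathlib import Path


def run_gates(gates, report_path):
    """Run each (name, check) gate, write the results to report_path, return an exit code.

    `gates` is a list of (name, callable) pairs; a gate that raises is
    treated as failed and the exception message is recorded.
    """
    lines = ["Quality gates", ""]
    failed = 0
    for name, check in gates:
        try:
            ok = bool(check())
            detail = ""
        except Exception as exc:  # a crashing gate counts as a failure
            ok = False
            detail = f" ({exc})"
        if not ok:
            failed += 1
        lines.append(f"- {'PASS' if ok else 'FAIL'}: {name}{detail}")
    report = Path(report_path)
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return 1 if failed else 0
```

A caller would typically end with `sys.exit(run_gates(gates, "results/bio-foundation-housekeeping/report.md"))` so that a failed gate propagates to the shell or workflow engine.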
Examples
Example 1: Expected input layout
project root (path)
metadata schema requirements
workflow engine preference (optional)
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-foundation-housekeeping/SKILL.md
Overview
Sets up a standard bioinformatics project scaffold with reproducible environments, schemas, and a data catalog. It creates a standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/), initializes a Pixi workspace and lockfile, defines LinkML schemas and generates Pydantic models, and builds a DuckDB catalog for parquet data.
How This Skill Works
The workflow creates the standard directory layout, initializes a Pixi workspace and lockfile with defined tasks, defines LinkML schemas and generates Pydantic models, then creates a DuckDB catalog and registers parquet tables for data access.
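The catalog step can be approximated by exposing each parquet file as a DuckDB view. The sketch below assumes views are an acceptable form of "registering" a table and that table names are derived from file names; neither is confirmed by the skill. `view_statements` and `register_tables` are hypothetical helpers, and the second needs the `duckdb` Python package.

```python
import re
from pathlib import Path


def view_statements(data_dir):
    """Build one CREATE VIEW statement per parquet file found under data_dir."""
    statements = []
    for path in sorted(Path(data_dir).rglob("*.parquet")):
        table = re.sub(r"\W", "_", path.stem)        # file stem -> safe identifier
        source = path.as_posix().replace("'", "''")  # escape quotes for the SQL literal
        statements.append(
            f"CREATE OR REPLACE VIEW \"{table}\" AS SELECT * FROM read_parquet('{source}')"
        )
    return statements


def register_tables(catalog_path, statements):
    """Apply the statements to a DuckDB database file (requires the duckdb package)."""
    import duckdb  # imported lazily so view_statements works without it

    con = duckdb.connect(str(catalog_path))
    try:
        for sql in statements:
            con.execute(sql)
    finally:
        con.close()
```

Because a view stores the file path as written, relative paths resolve against the working directory at query time; passing an absolute `data_dir` avoids surprises when the catalog is opened from elsewhere.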
When to Use It
- Starting a new bioinformatics project from scratch.
- Re-scaffolding an existing repo to enforce a standard workflow layout.
- Preparing reproducible environments for collaboration across teams.
- Establishing a data catalog for parquet data and schema-driven access.
- Onboarding new team members with a ready-to-run project baseline.
Quick Start
- Step 1: Create the standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/) and prepare inputs.
- Step 2: Initialize a Pixi workspace and lockfile and define the project tasks.
- Step 3: Define LinkML schemas, generate Pydantic models, and create a DuckDB catalog that registers the parquet tables.
Best Practices
- Use the standard directory layout as the baseline structure for all new projects.
- Lock environments with pixi.toml and pixi.lock to ensure reproducibility.
- Define LinkML schemas early and generate corresponding Pydantic models for data validation.
- Create and validate the DuckDB catalog, registering parquet tables for efficient querying.
- Keep docs, QC gates, and outputs updated; run checks before proceeding to downstream steps.
Example Use Cases
- Example 1: Launch a new RNA-seq QC project and generate the full scaffold (data/, results/, schemas/, workflows/, src/, notebooks/).
- Example 2: Re-scaffold an existing repository to enforce a standard Pixi workspace and reproducible environments.
- Example 3: Onboard a new collaborator by providing a ready-to-run project layout with LinkML schemas and a DuckDB catalog.
- Example 4: Initialize a data catalog for parquet datasets and ensure schemas are importable.
- Example 5: Validate QC gates by inspecting reports and re-running steps with updated parameters.
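For the importability check mentioned in Example 4, one possible approach is to load the generated models file directly and confirm that it defines the classes you expect. The sketch assumes the models were written to a single Python file; `load_models`, `missing_classes`, and the default module name are hypothetical.

```python
import importlib.util


def load_models(module_path, module_name="generated_models"):
    """Import a generated Python module from its file path and return the module object."""
    spec = importlib.util.spec_from_file_location(module_name, str(module_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # raises if the generated code is broken
    return module


def missing_classes(module, expected):
    """Return the expected class names that the module does not define as classes."""
    return [name for name in expected if not isinstance(getattr(module, name, None), type)]
```

Loading by file path avoids touching `sys.path`, so the check does not depend on how the project's `src/` directory is packaged. Importing real generated models also requires their own dependencies, such as Pydantic, to be installed in the active environment.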