What happens if inputs are missing?

Before running, verify that the required inputs (project root path, metadata schema requirements) are present. The workflow engine preference is optional and only needs confirming if the project uses one.

What outputs are produced?

The skill produces pixi.toml, pixi.lock, schemas/, data/catalog.duckdb, results/bio-foundation-housekeeping/report.md, and results/bio-foundation-housekeeping/logs/.

How do I extend or troubleshoot the scaffold?

If issues arise (missing inputs, failed QC), verify paths and permissions first, then review the reports to decide which parameters to change, and re-run the affected step.

bio-foundation-housekeeping

npx machina-cli add skill fmschulz/omics-skills/bio-foundation-housekeeping --openclaw

Bio Foundation Housekeeping

Initialize a bioinformatics project scaffold with reproducible environments, schemas, and data cataloging. Use for new projects or repo setup.

Instructions

Create standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/).
Initialize Pixi workspace and lockfile; define tasks.
Define LinkML schemas and generate Pydantic models.
Create DuckDB catalog and register parquet tables.
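
The first step above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the skill's own implementation: the function name `create_layout` and the constant `LAYOUT` are hypothetical, and only the directory names come from the list above.

```python
from pathlib import Path

# Directory names taken from the standard layout listed above.
LAYOUT = ("data", "results", "schemas", "workflows", "src", "notebooks")


def create_layout(project_root: Path) -> list[Path]:
    """Create the standard directories under project_root and return them."""
    created = []
    for name in LAYOUT:
        directory = project_root / name
        # exist_ok makes the call safe to repeat on an existing scaffold.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Because `mkdir` is called with `exist_ok=True`, running the function on an already scaffolded repository leaves existing directories untouched.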

Quick Reference

Task	Action
Run workflow	Follow the steps in this skill and capture outputs.
Validate inputs	Confirm required inputs and reference data exist.
Review outputs	Inspect reports and QC gates before proceeding.
Tool docs	See `docs/README.md`.
References	See ../bio-skills-references.md

Input Requirements

Prerequisites:

Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
Target project root is writable.

Inputs:

project root (path)
metadata schema requirements
workflow engine preference (optional)
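
One way to make these inputs explicit is a small record type plus a pre-flight check. The sketch below is an assumption about how that could look: `ScaffoldInputs`, its field names, and `input_problems` are hypothetical, since the skill describes its inputs only in prose.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ScaffoldInputs:
    # Field names are hypothetical; the skill lists these inputs only in prose.
    project_root: Path
    schema_requirements: str
    workflow_engine: Optional[str] = None  # optional, per the input list


def input_problems(inputs: ScaffoldInputs) -> list[str]:
    """Return human-readable problems; an empty list means the inputs look usable."""
    problems = []
    if not inputs.project_root.is_dir():
        problems.append(f"project root does not exist: {inputs.project_root}")
    elif not os.access(inputs.project_root, os.W_OK):
        problems.append(f"project root is not writable: {inputs.project_root}")
    if not inputs.schema_requirements.strip():
        problems.append("metadata schema requirements are empty")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every missing input in one pass.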

Output

pixi.toml
pixi.lock
schemas/
data/catalog.duckdb
results/bio-foundation-housekeeping/report.md
results/bio-foundation-housekeeping/logs/
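
A quick way to confirm that a run produced everything listed above is to check each path relative to the project root. This is a minimal sketch: the paths are copied from the list, while the names `EXPECTED_OUTPUTS` and `missing_outputs` are hypothetical.

```python
from pathlib import Path

# Relative paths copied from the output list above.
EXPECTED_OUTPUTS = (
    "pixi.toml",
    "pixi.lock",
    "schemas",
    "data/catalog.duckdb",
    "results/bio-foundation-housekeeping/report.md",
    "results/bio-foundation-housekeeping/logs",
)


def missing_outputs(project_root: Path) -> list[str]:
    """Return the expected outputs that do not exist under project_root."""
    return [rel for rel in EXPECTED_OUTPUTS if not (project_root / rel).exists()]
```

An empty return value means every expected file and directory is present. It does not say anything about their contents, which the quality gates cover.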

Quality Gates

Schema generation succeeds and models are importable.
pixi.lock is created and consistent with pixi.toml.
DuckDB catalog is readable.
Verify project root exists and is writable.
Validate generated schemas against expected fields.
On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
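
The failure policy in these gates (retry with alternative parameters, then record the failure and exit non-zero) could be implemented roughly as follows. This is a sketch under stated assumptions: `run_with_fallbacks`, the shape of the parameter dictionaries, and the `FAILED` line format are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_with_fallbacks(
    step: Callable[[dict], None],
    parameter_sets: Iterable[dict],
    report_path: Path,
) -> int:
    """Try step with each parameter set in order; return a process exit code."""
    errors = []
    for params in parameter_sets:
        try:
            step(params)
            return 0  # first success wins
        except Exception as exc:  # broad on purpose: any failure triggers the next attempt
            errors.append(f"{params!r}: {exc}")
    # Every attempt failed: record the failures, then signal a non-zero exit.
    report_path.parent.mkdir(parents=True, exist_ok=True)
    with report_path.open("a", encoding="utf-8") as report:
        for line in errors:
            report.write(f"FAILED {line}\n")
    return 1
```

A caller would typically pass the result to `sys.exit`, so that a failed gate stops any downstream step that depends on it.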

Examples

Example 1: Expected input layout

project root (path)
metadata schema requirements
workflow engine preference (optional)

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

https://github.com/fmschulz/omics-skills/blob/main/skills/bio-foundation-housekeeping/SKILL.md

Overview

Sets up a bioinformatics project scaffold with reproducible environments, schemas, and a data catalog. It creates the standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/), initializes a Pixi workspace and lockfile, defines LinkML schemas, generates Pydantic models from them, and builds a DuckDB catalog for Parquet data.

How This Skill Works

The workflow creates the standard directory layout, initializes a Pixi workspace and lockfile with defined tasks, defines LinkML schemas and generates Pydantic models, then creates a DuckDB catalog and registers parquet tables for data access.
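
For the final step, registering Parquet tables in DuckDB usually amounts to creating one view per file over DuckDB's `read_parquet` function. The sketch below only builds the SQL statements, so it runs without the `duckdb` package. The function name and the rule of naming each view after the file stem are assumptions, not something the skill specifies.

```python
from pathlib import Path


def parquet_view_statements(data_dir: Path) -> list[str]:
    """Build one CREATE VIEW statement per *.parquet file found under data_dir."""
    statements = []
    for parquet_file in sorted(data_dir.rglob("*.parquet")):
        # Hypothetical naming rule: the view is named after the file stem.
        view_name = parquet_file.stem.replace('"', '""')
        # Double any single quote so the path stays a valid SQL string literal.
        path_literal = parquet_file.as_posix().replace("'", "''")
        statements.append(
            f'CREATE OR REPLACE VIEW "{view_name}" AS '
            f"SELECT * FROM read_parquet('{path_literal}');"
        )
    return statements
```

With the `duckdb` Python package installed, each statement could then be passed to `execute` on a connection opened with `duckdb.connect("data/catalog.duckdb")`. Using views keeps the catalog file small because the data stays in the Parquet files.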

When to Use It

Starting a new bioinformatics project from scratch.
Re-scaffolding an existing repo to enforce a standard workflow layout.
Preparing reproducible environments for collaboration across teams.
Establishing a data catalog for parquet data and schema-driven access.
Onboarding new team members with a ready-to-run project baseline.

Quick Start

Step 1: Create the standard directory layout (data/, results/, schemas/, workflows/, src/, notebooks/) and prepare inputs.
Step 2: Initialize a Pixi workspace and lockfile and define the project tasks.
Step 3: Define LinkML schemas, generate Pydantic models, and create a DuckDB catalog registering parquet tables.

Best Practices

Use the standard directory layout as the baseline structure for all new projects.
Lock environments with pixi.toml and pixi.lock to ensure reproducibility.
Define LinkML schemas early and generate corresponding Pydantic models for data validation.
Create and validate the DuckDB catalog, registering parquet tables for efficient querying.
Keep docs, QC gates, and outputs updated; run checks before proceeding to downstream steps.

Example Use Cases

Example 1: Launch a new RNA-seq QC project and generate the full scaffold (data/, results/, schemas/, workflows/, src/, notebooks/).
Example 2: Re-scaffold an existing repository to enforce a standard Pixi workspace and reproducible environments.
Example 3: Onboard a new collaborator by providing a ready-to-run project layout with LinkML schemas and a DuckDB catalog.
Example 4: Initialize a data catalog for parquet datasets and ensure schemas are importable.
Example 5: Validate QC gates by inspecting reports and re-running steps with updated parameters.

Frequently Asked Questions
