Frequently Asked Questions

What inputs are required to run this skill?

Results files in results/*.parquet or results/*.tsv and a metadata.tsv, plus tools in the active environment (Pixi/conda/system).

Where are the outputs stored?

Under results/bio-stats-ml-reporting/, which contains models/, metrics.tsv, report.md, and logs/, as listed in the Output section.

What happens if a quality gate fails?

Retry with alternative parameters; if still failing, record in the report and exit non-zero per the Quality Gates guidance.

bio-stats-ml-reporting

npx machina-cli add skill fmschulz/omics-skills/bio-stats-ml-reporting --openclaw

Bio Stats ML Reporting

Aggregate results, train ML models, and produce reports with validated references.

Instructions

Join outputs in DuckDB and build feature tables.
Train baseline models and evaluate with cross-validation.
Generate reports and validate references.
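
The first instruction can be sketched as a single DuckDB query. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the join key `sample_id`, the output path passed to `write_feature_table`, and both helper names are hypothetical. `read_parquet`, `read_csv`, and `COPY ... TO` are standard DuckDB SQL.

```python
def build_feature_query(results_glob: str, metadata_path: str, key: str = "sample_id") -> str:
    """Return DuckDB SQL that joins all result tables with the metadata table.

    `key` is a hypothetical join column; use whichever identifier your
    results and metadata actually share.
    """
    return (
        "SELECT * "
        f"FROM read_parquet('{results_glob}', union_by_name = true) AS r "
        # "\t" below becomes a literal tab character inside the SQL string
        f"JOIN read_csv('{metadata_path}', delim = '\t', header = true) AS m "
        f"USING ({key})"
    )


def write_feature_table(results_glob: str, metadata_path: str, out_path: str) -> None:
    """Run the join and store the feature table as Parquet."""
    import duckdb  # third-party; imported lazily so the query builder works without it

    query = build_feature_query(results_glob, metadata_path)
    con = duckdb.connect()  # in-memory database
    con.execute(f"COPY ({query}) TO '{out_path}' (FORMAT PARQUET)")
    con.close()
```

If the results are TSV files rather than Parquet, the same pattern should work with `read_csv` in place of `read_parquet`.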

Quick Reference

Task	Action
Run workflow	Follow the steps in this skill and capture outputs.
Validate inputs	Confirm required inputs and reference data exist.
Review outputs	Inspect reports and QC gates before proceeding.
Tool docs	See `docs/README.md`.
References	See ../bio-skills-references.md

Input Requirements

Prerequisites:

Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
Results tables and metadata are available.

Inputs:

results/*.parquet or results/*.tsv
metadata.tsv
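
A pre-flight check for the inputs listed above might look like the following. It is a rough sketch using only the standard library. It assumes `metadata.tsv` sits next to the `results/` directory, and the function names are illustrative rather than part of the skill.

```python
import csv
from pathlib import Path


def find_inputs(root: Path) -> dict:
    """Collect result tables and the metadata file under `root`."""
    results = sorted(root.glob("results/*.parquet")) + sorted(root.glob("results/*.tsv"))
    metadata = root / "metadata.tsv"  # assumed location: beside results/
    problems = []
    if not results:
        problems.append("no results/*.parquet or results/*.tsv found")
    if not metadata.is_file():
        problems.append("metadata.tsv not found")
    return {"results": results, "metadata": metadata, "problems": problems}


def tsv_header(path: Path) -> list:
    """Return the column names from the first line of a TSV file."""
    with path.open(newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def inconsistent_tsv_headers(paths: list) -> list:
    """Return the TSV paths whose header differs from the first file's header."""
    if not paths:
        return []
    expected = tsv_header(paths[0])
    return [p for p in paths[1:] if tsv_header(p) != expected]
```

An empty `problems` list and an empty result from `inconsistent_tsv_headers` would indicate that the inputs are present and that the TSV tables share one schema.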

Output

results/bio-stats-ml-reporting/models/
results/bio-stats-ml-reporting/metrics.tsv
results/bio-stats-ml-reporting/report.md
results/bio-stats-ml-reporting/logs/
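
The layout above could be created with a few lines of standard-library Python. This is only a sketch: the `metrics.tsv` columns (`model`, `metric`, `value`) are an assumption, since the skill does not specify a schema.

```python
import csv
from pathlib import Path

OUT_DIR = Path("results/bio-stats-ml-reporting")


def prepare_output_dirs(base: Path = OUT_DIR) -> None:
    """Create models/ and logs/ (and the base directory) if they are missing."""
    for sub in ("models", "logs"):
        (base / sub).mkdir(parents=True, exist_ok=True)


def write_metrics(rows: list, base: Path = OUT_DIR) -> Path:
    """Write metric rows to metrics.tsv; the column names are hypothetical."""
    path = base / "metrics.tsv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["model", "metric", "value"], delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(rows)
    return path
```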

Quality Gates

Model performance sanity checks pass.
Reference validation passes.
On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
Verify input tables are readable and schema-consistent.
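
The failure policy (retry with alternative parameters, then record the failure and exit non-zero) can be sketched as below. The `train` and `gate` callables, the parameter sets, and the report format are placeholders. The skill does not define what a "sanity check" threshold is, so that decision is left to the caller.

```python
import sys


def run_with_gates(train, gate, parameter_sets):
    """Try each parameter set in order until the gate passes.

    Returns (ok, attempts), where attempts logs every try.
    """
    attempts = []
    for params in parameter_sets:
        metrics = train(**params)
        passed = gate(metrics)
        attempts.append({"params": params, "metrics": metrics, "passed": passed})
        if passed:
            return True, attempts
    return False, attempts


def finish(ok, attempts, report_path):
    """Append every attempt to the report, then exit non-zero if no attempt passed."""
    with open(report_path, "a") as report:
        report.write("\nQuality gates\n")
        for attempt in attempts:
            status = "PASS" if attempt["passed"] else "FAIL"
            report.write(f"{status}\t{attempt['params']}\t{attempt['metrics']}\n")
    if not ok:
        sys.exit(1)
```

Keeping the retry loop separate from the exit step means failed attempts are written to the report before the process terminates.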

Examples

Example 1: Expected input layout

results/*.parquet or results/*.tsv
metadata.tsv

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

git clone https://github.com/fmschulz/omics-skills

SKILL.md: https://github.com/fmschulz/omics-skills/blob/main/skills/bio-stats-ml-reporting/SKILL.md

Overview

Bio Stats ML Reporting aggregates results, trains machine learning models, and produces reports with validated references. It uses DuckDB to join outputs and build feature tables, then trains baseline models with cross-validation before generating a reference-validated report.

How This Skill Works

Inputs are loaded from results/*.parquet or results/*.tsv and metadata.tsv. The workflow joins outputs in DuckDB to build feature tables, then trains baseline models and evaluates them via cross-validation. Finally, it generates a report (report.md and metrics.tsv) and validates references before finalizing artifacts.
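
To make the cross-validation step concrete, here is a dependency-free sketch of k-fold evaluation for the simplest possible baseline, which always predicts the majority class of the training fold. It is a stand-in for illustration, not the models the skill trains. A real run would more likely use an ML library and shuffle or stratify the samples, whereas the folds here are contiguous to keep the example deterministic.

```python
from collections import Counter


def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def majority_baseline_cv(labels: list, k: int = 5) -> list:
    """Per-fold accuracy of predicting the training fold's most common label."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        correct = sum(1 for i in test_idx if labels[i] == majority)
        scores.append(correct / len(test_idx))
    return scores
```

Any trained model should beat these per-fold scores, which makes them a natural reference point for a performance sanity check.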

When to Use It

When you have omics results and need a reproducible ML baseline.
When you must join outputs with metadata to form feature tables for modeling.
When you require model evaluation with cross-validation and sanity checks.
When you want a standardized process to generate a report with validated references.
When you need QC gates and troubleshooting guidance to ensure inputs and references are correct.

Quick Start

Step 1: Ensure inputs exist (results/*.parquet or results/*.tsv, plus metadata.tsv) and that the tools listed in docs/README.md are available.
Step 2: Run the workflow steps: join outputs in DuckDB, build feature tables, and train baseline models with cross-validation.
Step 3: Generate report.md and metrics.tsv, validate references, and review the QC gates against the outputs in results/bio-stats-ml-reporting/.

Best Practices

Verify inputs exist: results/*.parquet or results/*.tsv and metadata.tsv.
Validate input schemas and metadata alignment before joining.
Leverage DuckDB for efficient, reproducible feature table creation.
Use cross-validation to assess baseline models and capture metrics.
Validate references and document QC outcomes in the final report.

Example Use Cases

Omics study with transcriptomics results parsed to parquet, feature tables built, baseline models trained, and a report generated with references.
Proteomics dataset with metadata alignment; metrics.tsv captured and report.md created.
Metabolomics workflow performing cross-validated model evaluation and QC gates.
Pilot study using results/*.tsv inputs for rapid iteration, producing logs and a final report.
End-to-end pipeline where references are cross-checked against ../bio-skills-references.md.
