bio-stats-ml-reporting

npx machina-cli add skill fmschulz/omics-skills/bio-stats-ml-reporting --openclaw

Bio Stats ML Reporting

Aggregate results, train ML models, and produce reports with validated references.

Instructions

  1. Join outputs in DuckDB and build feature tables.
  2. Train baseline models and evaluate with cross-validation.
  3. Generate reports and validate references.
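
As a rough illustration of step 1, the sketch below builds the DuckDB SQL that joins result tables with sample metadata. The join key `sample_id`, the table name `features`, and both helper names are assumptions made for illustration; the skill does not specify them. `read_parquet` and `read_csv` are DuckDB table functions, and running the query needs the third-party `duckdb` package.

```python
def build_feature_sql(results_glob="results/*.parquet",
                      metadata_path="metadata.tsv",
                      key="sample_id"):
    """Return DuckDB SQL joining result tables with sample metadata.

    The join key and the table name `features` are hypothetical.
    """
    return (
        "CREATE OR REPLACE TABLE features AS\n"
        "SELECT *\n"
        # read_parquet accepts a glob and reads all matching files.
        f"FROM read_parquet('{results_glob}') AS r\n"
        # read_csv auto-detects the delimiter, so a TSV works here.
        f"JOIN read_csv('{metadata_path}') AS m\n"
        f"USING ({key})"
    )


def build_features(db_path=":memory:", **kwargs):
    """Run the join in DuckDB and return the open connection."""
    import duckdb  # imported lazily so the SQL builder works without it

    con = duckdb.connect(db_path)
    con.execute(build_feature_sql(**kwargs))
    return con
```

Keeping the SQL in a separate function makes the join easy to log or review before anything is executed.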

Quick Reference

| Task | Action |
| --- | --- |
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md |

Input Requirements

Prerequisites:

  • Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
  • Results tables and metadata are available.

Inputs:

  • results/*.parquet or results/*.tsv
  • metadata.tsv
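
A minimal sketch of checking these inputs before the workflow starts, using only the standard library. The function name `find_inputs` is hypothetical, and preferring Parquet over TSV when both exist is an assumption rather than documented behavior of the skill.

```python
from pathlib import Path


def find_inputs(root="."):
    """Return (result_files, metadata_path) or raise FileNotFoundError."""
    root = Path(root)
    # Assumption: prefer Parquet tables and fall back to TSV.
    results = sorted(root.glob("results/*.parquet")) or sorted(
        root.glob("results/*.tsv")
    )
    metadata = root / "metadata.tsv"

    missing = []
    if not results:
        missing.append("results/*.parquet or results/*.tsv")
    if not metadata.is_file():
        missing.append("metadata.tsv")
    if missing:
        raise FileNotFoundError("Missing required inputs: " + ", ".join(missing))
    return results, metadata
```

Reporting every missing input in a single error avoids fixing one path only to fail on the next.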

Output

  • results/bio-stats-ml-reporting/models/
  • results/bio-stats-ml-reporting/metrics.tsv
  • results/bio-stats-ml-reporting/report.md
  • results/bio-stats-ml-reporting/logs/
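
The layout above could be created and populated along these lines. This is a sketch only: the column names `model`, `metric`, and `value` are hypothetical, since the skill does not define the schema of metrics.tsv.

```python
import csv
from pathlib import Path

OUT_DIR = Path("results/bio-stats-ml-reporting")


def prepare_output_dirs(out_dir=OUT_DIR):
    """Create the models/ and logs/ subdirectories if they are missing."""
    out_dir = Path(out_dir)
    for sub in ("models", "logs"):
        (out_dir / sub).mkdir(parents=True, exist_ok=True)
    return out_dir


def write_metrics(rows, out_dir=OUT_DIR):
    """Write rows (dicts with hypothetical keys) to metrics.tsv."""
    path = prepare_output_dirs(out_dir) / "metrics.tsv"
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["model", "metric", "value"], delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(rows)
    return path
```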

Quality Gates

  • Model performance sanity checks pass.
  • Reference validation passes.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
  • Verify input tables are readable and schema-consistent.
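
The failure policy above (retry with alternative parameters, then record and exit non-zero) might look roughly like this. `train` is a hypothetical callable that returns a score, and `min_score` is an illustrative threshold; the skill does not define either.

```python
def run_with_gates(train, param_sets, min_score=0.5):
    """Try each parameter set until the sanity check passes.

    Returns (params, score, failures); params is None if every set failed.
    """
    failures = []
    for params in param_sets:
        score = train(**params)
        if score >= min_score:
            return params, score, failures
        failures.append((params, score))
    return None, None, failures


def exit_code(result, report_lines):
    """Record failed attempts in the report and return the exit status."""
    params, _score, failures = result
    for failed_params, failed_score in failures:
        report_lines.append(
            f"QC gate failed: params={failed_params} score={failed_score:.3f}"
        )
    # Non-zero status when no parameter set passed the gate.
    return 0 if params is not None else 1
```

A caller would typically pass the returned status to `sys.exit` after the report lines have been written out.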

Examples

Example 1: Expected input layout

results/*.parquet or results/*.tsv
metadata.tsv

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

SKILL.md on GitHub: https://github.com/fmschulz/omics-skills/blob/main/skills/bio-stats-ml-reporting/SKILL.md

Overview

Bio Stats ML Reporting aggregates results, trains machine learning models, and produces reports with validated references. It uses DuckDB to join outputs and build feature tables, then trains baseline models with cross-validation before generating a reference-validated report.

How This Skill Works

Inputs are loaded from results/*.parquet or results/*.tsv and metadata.tsv. The workflow joins these outputs in DuckDB to build feature tables, then trains baseline models and evaluates them with cross-validation. Finally, it writes report.md and metrics.tsv and validates references before finalizing artifacts.
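
To make the cross-validation step concrete, here is a dependency-free sketch of k-fold evaluation with a majority-class baseline. It stands in for whatever baseline models and library the skill actually uses, which the text does not name. The contiguous folds assume the rows were shuffled beforehand.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def majority_baseline_cv(labels, k=5):
    """Mean accuracy of always predicting the most common training label."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

A majority-class score like this gives a floor that any trained model should beat, which is one way to frame a performance sanity check.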

When to Use It

  • When you have omics results and need a reproducible ML baseline.
  • When you must join outputs with metadata to form feature tables for modeling.
  • When you require model evaluation with cross-validation and sanity checks.
  • When you want a standardized process to generate a report with validated references.
  • When you need QC gates and troubleshooting guidance to ensure inputs and references are correct.

Quick Start

  1. Ensure inputs exist (results/*.parquet or results/*.tsv, plus metadata.tsv) and that the tools listed in docs/README.md are available.
  2. Run the workflow steps: join outputs in DuckDB, build feature tables, and train baseline models with cross-validation.
  3. Generate report.md and metrics.tsv, validate references, and review the QC gates in the outputs folder.

Best Practices

  • Verify inputs exist: results/*.parquet or results/*.tsv and metadata.tsv.
  • Validate input schemas and metadata alignment before joining.
  • Leverage DuckDB for efficient, reproducible feature table creation.
  • Use cross-validation to assess baseline models and capture metrics.
  • Validate references and document QC outcomes in the final report.
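
For the metadata-alignment check mentioned above, a minimal standard-library sketch for TSV inputs follows. The shared key column `sample_id` and both function names are assumptions for illustration.

```python
import csv


def read_column(path, column):
    """Return all values of one column from a tab-separated file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{path} has no column {column!r}")
        return [row[column] for row in reader]


def check_alignment(results_tsv, metadata_tsv, key="sample_id"):
    """Report IDs present in one table but absent from the other."""
    result_ids = set(read_column(results_tsv, key))
    metadata_ids = set(read_column(metadata_tsv, key))
    return {
        "missing_from_metadata": sorted(result_ids - metadata_ids),
        "missing_from_results": sorted(metadata_ids - result_ids),
    }
```

Two empty lists mean every sample joins cleanly; anything else is worth resolving before building feature tables, since an inner join would silently drop those rows.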

Example Use Cases

  • Omics study with transcriptomics results parsed to parquet, feature tables built, baseline models trained, and a report generated with references.
  • Proteomics dataset with metadata alignment; metrics.tsv captured and report.md created.
  • Metabolomics workflow performing cross-validated model evaluation and QC gates.
  • Pilot study using results.tsv for rapid iteration, producing logs and a final report.
  • End-to-end pipeline where references are cross-checked against ../bio-skills-references.md.
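
The skill does not say how references are identified or validated. Purely as an illustration of the cross-check in the last use case, the sketch below assumes references are cited by DOI and flags any DOI found in the report text that is absent from the references file. The pattern and function names are hypothetical.

```python
import re

# Loose DOI matcher (assumption): "10.", a 4-9 digit prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"'<>()\[\]]+")


def extract_dois(text):
    """Return the set of lower-cased DOIs found in text."""
    # Strip trailing sentence punctuation that the pattern may swallow.
    return {match.rstrip(".,;").lower() for match in DOI_PATTERN.findall(text)}


def unlisted_references(report_text, references_text):
    """Return DOIs cited in the report but missing from the references."""
    return sorted(extract_dois(report_text) - extract_dois(references_text))
```

An empty result would count as a pass under this assumed definition; a real check might also confirm that each DOI resolves.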
