bio-stats-ml-reporting
npx machina-cli add skill fmschulz/omics-skills/bio-stats-ml-reporting --openclaw
Bio Stats ML Reporting
Aggregate results, train ML models, and produce reports with validated references.
Instructions
- Join outputs in DuckDB and build feature tables.
- Train baseline models and evaluate with cross-validation.
- Generate reports and validate references.
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md. |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Results tables and metadata are available.
Inputs:
- results/*.parquet or results/*.tsv
- metadata.tsv
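A pre-flight check for these inputs could look like the sketch below. It uses only the Python standard library, so it inspects TSV headers and merely confirms that Parquet files exist. The join column name `sample_id` and the function names are assumptions made for illustration; the skill itself does not specify them.

```python
import csv
from pathlib import Path


def find_result_tables(results_dir):
    """Return sorted Parquet and TSV files directly under results_dir."""
    root = Path(results_dir)
    return sorted(root.glob("*.parquet")) + sorted(root.glob("*.tsv"))


def tsv_header(path):
    """Read only the header row of a tab-separated file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def check_inputs(results_dir, metadata_path, key="sample_id"):
    """Collect human-readable problems; an empty list means the check passed.

    `key` is a hypothetical join column, not something the skill defines.
    """
    problems = []
    tables = find_result_tables(results_dir)
    if not tables:
        problems.append(f"no *.parquet or *.tsv files in {results_dir}")
    if not Path(metadata_path).is_file():
        problems.append(f"missing metadata file: {metadata_path}")
    elif key not in tsv_header(metadata_path):
        problems.append(f"{metadata_path} has no '{key}' column")
    for table in tables:
        # Parquet schemas need a Parquet reader, so only TSV headers are checked here.
        if table.suffix == ".tsv" and key not in tsv_header(table):
            problems.append(f"{table} has no '{key}' column")
    return problems
```

Calling `check_inputs("results", "metadata.tsv")` returns an empty list when every TSV table and the metadata file share the assumed key column.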
Output
- results/bio-stats-ml-reporting/models/
- results/bio-stats-ml-reporting/metrics.tsv
- results/bio-stats-ml-reporting/report.md
- results/bio-stats-ml-reporting/logs/
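The output layout above can be created and populated with a few lines of standard-library Python. This is a minimal sketch: the `model`, `metric`, and `value` columns are an assumed schema for metrics.tsv, since the skill does not define one.

```python
import csv
from pathlib import Path

OUT_DIR = Path("results/bio-stats-ml-reporting")


def prepare_output_dirs(out_dir=OUT_DIR):
    """Create the models/ and logs/ subdirectories if they are missing."""
    out_dir = Path(out_dir)
    for sub in ("models", "logs"):
        (out_dir / sub).mkdir(parents=True, exist_ok=True)
    return out_dir


def write_metrics(rows, out_dir=OUT_DIR):
    """Write one row per (model, metric) pair to metrics.tsv.

    The three column names are illustrative assumptions.
    """
    path = prepare_output_dirs(out_dir) / "metrics.tsv"
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["model", "metric", "value"],
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Each row is a dict such as `{"model": "baseline", "metric": "accuracy", "value": 0.8}`; keeping one metric per row lets new metrics be added without changing the header.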
Quality Gates
- Model performance sanity checks pass.
- Reference validation passes.
- Verify input tables are readable and schema-consistent.
- On failure: retry with alternative parameters; if it still fails, record the failure in the report and exit non-zero.
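The retry-then-fail behavior of these gates might be wired up as in the sketch below. The `train` callable, the gate names, and the shape of the metrics dict are all hypothetical; only the control flow (try alternative parameters, then report and exit non-zero) comes from the list above.

```python
def run_with_gates(train, gates, param_sets):
    """Try each parameter set in order; return the first result passing all gates.

    train: callable taking a parameter dict and returning a metrics dict.
    gates: mapping of gate name -> callable(metrics) -> bool.
    Returns (metrics, failures); failures lists every failed attempt.
    """
    failures = []
    for params in param_sets:
        metrics = train(params)
        failed = [name for name, check in gates.items() if not check(metrics)]
        if not failed:
            return metrics, failures
        failures.append({"params": params, "failed_gates": failed})
    # No parameter set passed; the caller records failures and exits non-zero.
    return None, failures


def exit_code(metrics):
    """Zero when some attempt passed every gate, non-zero otherwise."""
    return 0 if metrics is not None else 1
```

A caller would write the `failures` list into report.md and then pass `exit_code(metrics)` to `sys.exit`, so a pipeline runner can detect the failed gate.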
Examples
Example 1: Expected input layout
results/*.parquet or results/*.tsv
metadata.tsv
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-stats-ml-reporting/SKILL.md
Overview
Bio Stats ML Reporting aggregates results, trains machine learning models, and produces reports with validated references. It uses DuckDB to join outputs and build feature tables, then trains baseline models with cross-validation before generating a reference-validated report.
How This Skill Works
Inputs are loaded from results/*.parquet or results/*.tsv and metadata.tsv. The workflow joins outputs in DuckDB to build feature tables, then trains baseline models and evaluates them via cross-validation. Finally, it writes report.md and metrics.tsv and validates references before finalizing artifacts.
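The join-and-pivot step can be illustrated with plain SQL. The sketch below runs the query through Python's built-in `sqlite3` module as a stand-in so it needs no extra packages; it is not DuckDB, although the query is ordinary enough that it should carry over. The long-format results layout (`sample_id`, `feature`, `value`) and the `label` metadata column are assumptions, not part of the skill.

```python
import sqlite3


def build_feature_table(results_rows, metadata_rows, features):
    """Join long-format results to metadata and pivot to one row per sample.

    results_rows: iterable of (sample_id, feature, value)   [assumed layout]
    metadata_rows: iterable of (sample_id, label)           [assumed layout]
    features: feature names to turn into columns; these are interpolated
              into the SQL, so they must be trusted identifiers.
    """
    if not features:
        raise ValueError("at least one feature name is required")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE results (sample_id TEXT, feature TEXT, value REAL)")
    con.execute("CREATE TABLE metadata (sample_id TEXT, label TEXT)")
    con.executemany("INSERT INTO results VALUES (?, ?, ?)", results_rows)
    con.executemany("INSERT INTO metadata VALUES (?, ?)", metadata_rows)
    # One conditional aggregate per feature turns long rows into wide columns.
    pivots = ", ".join(
        f"MAX(CASE WHEN r.feature = '{name}' THEN r.value END) AS \"{name}\""
        for name in features
    )
    query = f"""
        SELECT m.sample_id, m.label, {pivots}
        FROM metadata AS m
        JOIN results AS r ON r.sample_id = m.sample_id
        GROUP BY m.sample_id, m.label
        ORDER BY m.sample_id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

The inner join drops samples that appear in the metadata but have no results, and a feature missing for a sample comes back as `None`, which makes gaps visible before modeling.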
When to Use It
- When you have omics results and need a reproducible ML baseline.
- When you must join outputs with metadata to form feature tables for modeling.
- When you require model evaluation with cross-validation and sanity checks.
- When you want a standardized process to generate a report with validated references.
- When you need QC gates and troubleshooting guidance to ensure inputs and references are correct.
Quick Start
- Step 1: Ensure inputs exist (results/*.parquet or results/*.tsv, plus metadata.tsv) and that the tools listed in docs/README.md are available.
- Step 2: Run the workflow steps: join in DuckDB, build feature tables, train baseline models with cross-validation.
- Step 3: Generate report.md and metrics.tsv, validate references, and review QC gates in results/bio-stats-ml-reporting/.
Best Practices
- Verify inputs exist: results/*.parquet or results/*.tsv and metadata.tsv.
- Validate input schemas and metadata alignment before joining.
- Leverage DuckDB for efficient, reproducible feature table creation.
- Use cross-validation to assess baseline models and capture metrics.
- Validate references and document QC outcomes in the final report.
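To make the cross-validation practice concrete, here is a dependency-free sketch of k-fold evaluation. The majority-class predictor is a deliberately trivial stand-in chosen for illustration; the skill does not say which baseline models it trains, and a real run would likely use a library implementation.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_majority(labels, k=5, seed=0):
    """Per-fold accuracy of a majority-class baseline.

    The "model" is the most common label in the training folds;
    it ignores features entirely, so it sets a floor for real models.
    """
    if not 2 <= k <= len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_labels = [y for i, y in enumerate(labels) if i not in held]
        majority = Counter(train_labels).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in held_out)
        scores.append(correct / len(held_out))
    return scores
```

Any model whose cross-validated accuracy does not beat these per-fold scores is a candidate for a failed performance sanity check.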
Example Use Cases
- Omics study with transcriptomics results parsed to parquet, feature tables built, baseline models trained, and a report generated with references.
- Proteomics dataset with metadata alignment; metrics.tsv captured and report.md created.
- Metabolomics workflow performing cross-validated model evaluation and QC gates.
- Pilot study using TSV results (results/*.tsv) for rapid iteration, producing logs and a final report.
- End-to-end pipeline where references are cross-checked against ../bio-skills-references.md.
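The skill does not describe how references are validated, so the following is only one possible reading: every citation key used in the report must appear in a known reference list. The Pandoc-style `[@key]` citation syntax and both function names are assumptions made for this sketch.

```python
import re

# Hypothetical citation syntax: Pandoc-style keys such as [@smith2020].
CITATION = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def cited_keys(report_text):
    """Citation keys used in the report, in first-appearance order without repeats."""
    seen = []
    for key in CITATION.findall(report_text):
        if key not in seen:
            seen.append(key)
    return seen


def missing_references(report_text, known_keys):
    """Keys cited in the report that are absent from the known reference keys."""
    known = set(known_keys)
    return [key for key in cited_keys(report_text) if key not in known]
```

An empty result from `missing_references` would count as a passing reference check under this reading; any returned key points at a citation that needs fixing before the report is finalized.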