bio-binning-qc
npx machina-cli add skill fmschulz/omics-skills/bio-binning-qc --openclaw
Bio Binning QC
Perform metagenomic binning, refinement, and QC with completeness/contamination checks.
Instructions
- Compute depth/coverage per sample.
- Run multiple binners (MetaBAT2, SemiBin2, QuickBin).
- Classify bins by domain (bacteria/archaea vs eukaryotes).
- Run domain-specific QC:
  - CheckM2 for bacterial and archaeal bins
  - EukCC for eukaryotic bins
- GUNC for contamination detection (all domains).
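The domain-specific QC step above amounts to a routing decision per bin. The following is a minimal Python sketch of that routing, not the skill's actual implementation: the function name `plan_qc`, the mapping `QC_TOOLS_BY_DOMAIN`, and the domain labels are hypothetical, and the sketch only plans which tool handles which bin without invoking any tool.

```python
# Hypothetical mapping from a bin's domain label to the QC tools named above.
# The labels ("bacteria", "archaea", "eukaryota") are assumptions for illustration.
QC_TOOLS_BY_DOMAIN = {
    "bacteria": ["CheckM2", "GUNC"],
    "archaea": ["CheckM2", "GUNC"],
    "eukaryota": ["EukCC", "GUNC"],
}


def plan_qc(bin_domains):
    """Group bin names by the QC tool that should process them."""
    plan = {}
    for bin_name, domain in sorted(bin_domains.items()):
        tools = QC_TOOLS_BY_DOMAIN.get(domain.lower())
        if tools is None:
            raise ValueError(f"unknown domain {domain!r} for bin {bin_name}")
        for tool in tools:
            plan.setdefault(tool, []).append(bin_name)
    return plan


# Made-up example classification of three bins.
example = {"bin.1": "Bacteria", "bin.2": "Eukaryota", "bin.3": "Archaea"}
plan = plan_qc(example)
```

Raising on an unknown label, rather than silently skipping the bin, keeps unclassified bins from bypassing QC.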
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Reference DB root: set BIO_DB_ROOT (default /media/shared-expansion/db/ on WSU).
- Coverage/depth tables or reads available to compute coverage.
Inputs:
- contigs.fasta
- coverage.tsv (per-sample depth table)
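The prerequisites and inputs above can be checked before any binner runs. A minimal Python sketch, assuming the two input files are passed as paths and that BIO_DB_ROOT falls back to the default stated above; the helper name `check_inputs` and its message strings are hypothetical.

```python
import os
from pathlib import Path

# Default reference DB root as stated in the prerequisites.
DEFAULT_DB_ROOT = "/media/shared-expansion/db/"


def check_inputs(contigs, coverage, env=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    env = os.environ if env is None else env
    problems = []
    for path in (Path(contigs), Path(coverage)):
        if not path.is_file():
            problems.append(f"missing input: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"empty input: {path}")
    db_root = Path(env.get("BIO_DB_ROOT", DEFAULT_DB_ROOT))
    if not db_root.is_dir():
        problems.append(f"reference DB root not found: {db_root}")
    return problems
```

A caller might run `check_inputs("contigs.fasta", "coverage.tsv")` and stop the workflow if the returned list is non-empty.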
Output
- results/bio-binning-qc/bins/
- results/bio-binning-qc/bin_metrics.tsv
- results/bio-binning-qc/bin_qc_report.html
- results/bio-binning-qc/logs/
Quality Gates
- Completeness and contamination meet project thresholds.
- Chimera and contamination flags are below thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
- Verify contigs.fasta and coverage.tsv are non-empty.
- Verify reference DBs for QC tools exist under the reference root.
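The completeness and contamination gate can be sketched as a filter over the metrics table. This is an illustration only: the column names `bin`, `completeness`, and `contamination` are assumptions about bin_metrics.tsv, the helper `failing_bins` is hypothetical, the rows are made-up values, and the thresholds are placeholders rather than project settings.

```python
import csv
import io


def failing_bins(rows, min_completeness, max_contamination):
    """Return names of bins that miss either threshold (both in percent)."""
    failed = []
    for row in rows:
        completeness = float(row["completeness"])
        contamination = float(row["contamination"])
        if completeness < min_completeness or contamination > max_contamination:
            failed.append(row["bin"])
    return failed


# Made-up metrics in an assumed tab-separated layout.
example_tsv = (
    "bin\tcompleteness\tcontamination\n"
    "bin.1\t95.2\t1.1\n"
    "bin.2\t48.0\t0.5\n"
    "bin.3\t91.0\t12.4\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))

# Placeholder thresholds; substitute the project's own values.
failed = failing_bins(rows, min_completeness=50.0, max_contamination=10.0)

# Mirrors the "exit non-zero on failure" rule; a script would pass this to sys.exit().
exit_code = 1 if failed else 0
```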
Examples
Example 1: Expected input layout
contigs.fasta
coverage.tsv (per-sample depth table)
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-binning-qc/SKILL.md
Overview
Bio Binning QC performs metagenomic binning, refinement, and quality control with completeness and contamination checks. It computes depth per sample, runs multiple binners, and classifies bins by domain before applying tailored QC. The resulting reports help select high-quality bins for downstream analysis.
How This Skill Works
First, compute depth/coverage per sample and prepare inputs. Then run multiple binning tools (MetaBAT2, SemiBin2, QuickBin) to generate bins. Finally, classify bins by domain (bacteria/archaea vs eukaryotes) and run domain-specific QC with CheckM2, EukCC, and GUNC to produce QC reports and visuals.
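As a rough illustration of the first step, mean depth for a contig in one sample can be approximated as aligned bases divided by contig length. The Python sketch below assumes the aligned-base counts are already available; the function names and the nested-dictionary layout are hypothetical, and dedicated depth tools may apply additional filtering, so real tables can differ.

```python
def mean_depth(aligned_bases, contig_length):
    """Approximate mean depth: bases aligned to the contig divided by its length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return aligned_bases / contig_length


def depth_table(contig_lengths, aligned_bases_by_sample):
    """Build {contig: {sample: mean depth}}; contigs with no reads get 0.0."""
    table = {}
    for contig, length in contig_lengths.items():
        table[contig] = {
            sample: mean_depth(bases.get(contig, 0), length)
            for sample, bases in aligned_bases_by_sample.items()
        }
    return table


# Made-up contig lengths (bp) and aligned-base counts for two samples.
lengths = {"c1": 1000, "c2": 500}
aligned = {"S1": {"c1": 20000, "c2": 2500}, "S2": {"c1": 5000}}
table = depth_table(lengths, aligned)
```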
When to Use It
- You have a mixed-domain metagenome and need robust, cross-tool bins.
- You want to screen bins for domain-specific completeness and contamination.
- You need to compare binning results across different algorithms.
- You must verify inputs (contigs.fasta and coverage.tsv) and ensure reference databases exist.
- You need reproducible QC reporting (bin_qc_report.html and bin_metrics.tsv).
Quick Start
- Step 1: Compute depth/coverage per sample.
- Step 2: Run MetaBAT2, SemiBin2, and QuickBin to generate bins.
- Step 3: Classify bins by domain and run domain-specific QC (CheckM2, EukCC, GUNC); review outputs.
Best Practices
- Validate inputs (contigs.fasta and coverage.tsv) are non-empty.
- Ensure BIO_DB_ROOT is set and required tools are accessible.
- Run multiple binners and compare bin assignments.
- Use domain-aware QC (CheckM2 for bacteria/archaea, EukCC for eukaryotes, GUNC for contamination) before filtering.
- Inspect the QC report and logs, then retry with parameter tweaks if gates fail.
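One way to compare bin assignments across binners is to score the overlap between contig sets. The Python sketch below uses the Jaccard index for this; it is a swapped-in illustration rather than the skill's own comparison method, and the bin names, contig names, and helper functions are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two contig collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(bins_a, bins_b):
    """For each bin in bins_a, find the most similar bin in bins_b."""
    matches = {}
    for name_a, contigs_a in bins_a.items():
        best_name, best_score = None, 0.0
        for name_b, contigs_b in bins_b.items():
            score = jaccard(contigs_a, contigs_b)
            if score > best_score:
                best_name, best_score = name_b, score
        matches[name_a] = (best_name, best_score)
    return matches


# Made-up contig assignments from two binners.
binner_one = {"a.1": {"c1", "c2", "c3"}, "a.2": {"c4", "c5"}}
binner_two = {"b.1": {"c1", "c2"}, "b.2": {"c4", "c5", "c6"}, "b.3": {"c7"}}
matches = best_matches(binner_one, binner_two)
```

Bins whose best match scores low in every other binner are candidates for closer inspection in the QC report.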
Example Use Cases
- Soil microbiome binning with bacterial/archaeal bins verified by CheckM2 and GUNC.
- Marine sample with notable eukaryotic content analyzed using EukCC.
- Per-sample depth-informed binning across multiple samples to improve bin recovery.
- QC-driven filters applied to produce high-quality bins for downstream annotation.
- Re-running with alternative parameters after QC gates indicate sub-threshold completeness.