
bio-binning-qc

npx machina-cli add skill fmschulz/omics-skills/bio-binning-qc --openclaw

Bio Binning QC

Perform metagenomic binning, refinement, and QC with completeness/contamination checks.

Instructions

  1. Compute depth/coverage per sample.
  2. Run multiple binners (MetaBAT2, SemiBin2, QuickBin).
  3. Classify bins by domain (bacteria/archaea vs eukaryotes).
  4. Run domain-specific QC:
       • CheckM2 for bacterial and archaeal bins
       • EukCC for eukaryotic bins
       • GUNC for contamination detection (all domains)
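The routing in step 4 can be sketched as a small lookup table. This is only an illustration: the domain labels and the names `QC_TOOLS`, `qc_tools_for`, and `plan_qc` are hypothetical and not part of the skill, and the sketch only plans which tool sees which bin; it does not invoke any tool.

```python
# Hypothetical routing table: domain label -> QC tools named in the instructions.
# The labels ("bacteria", "archaea", "eukaryota") are assumptions for illustration.
QC_TOOLS = {
    "bacteria": ["CheckM2", "GUNC"],
    "archaea": ["CheckM2", "GUNC"],
    "eukaryota": ["EukCC", "GUNC"],
}


def qc_tools_for(domain: str) -> list[str]:
    """Return the QC tools to run for a bin classified into `domain`."""
    key = domain.strip().lower()
    if key not in QC_TOOLS:
        raise ValueError(f"unknown domain: {domain!r}")
    return list(QC_TOOLS[key])


def plan_qc(bin_domains: dict[str, str]) -> dict[str, list[str]]:
    """Group bin names by the QC tool that should process them."""
    plan: dict[str, list[str]] = {}
    for bin_name, domain in sorted(bin_domains.items()):
        for tool in qc_tools_for(domain):
            plan.setdefault(tool, []).append(bin_name)
    return plan
```

Grouping by tool rather than by bin makes it straightforward to hand each tool one batch of bins.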

Quick Reference

Task | Action
Run workflow | Follow the steps in this skill and capture outputs.
Validate inputs | Confirm required inputs and reference data exist.
Review outputs | Inspect reports and QC gates before proceeding.
Tool docs | See docs/README.md.
References | See ../bio-skills-references.md

Input Requirements

Prerequisites:

  • Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
  • Reference DB root: set BIO_DB_ROOT (default /media/shared-expansion/db/ on WSU).
  • Coverage/depth tables or reads available to compute coverage.

Inputs:

  • contigs.fasta
  • coverage.tsv (per-sample depth table)
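A minimal sketch of a pre-flight check for these prerequisites, assuming the two input files sit at caller-supplied paths. The function name `validate_inputs` is hypothetical; the default DB root is the one stated above.

```python
import os
from pathlib import Path

# Default reference root as stated in the prerequisites.
DEFAULT_DB_ROOT = "/media/shared-expansion/db/"


def validate_inputs(contigs, coverage, db_root=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    for label, path in (("contigs", Path(contigs)), ("coverage", Path(coverage))):
        if not path.is_file():
            problems.append(f"{label}: missing file {path}")
        elif path.stat().st_size == 0:
            problems.append(f"{label}: empty file {path}")
    # An explicit argument wins, then BIO_DB_ROOT, then the documented default.
    root = Path(db_root or os.environ.get("BIO_DB_ROOT", DEFAULT_DB_ROOT))
    if not root.is_dir():
        problems.append(f"reference DB root not found: {root}")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report everything that is wrong in one pass.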

Output

  • results/bio-binning-qc/bins/
  • results/bio-binning-qc/bin_metrics.tsv
  • results/bio-binning-qc/bin_qc_report.html
  • results/bio-binning-qc/logs/

Quality Gates

  • Completeness and contamination meet project thresholds.
  • Chimera and contamination flags are below thresholds.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
  • Verify contigs.fasta and coverage.tsv are non-empty.
  • Verify reference DBs for QC tools exist under the reference root.
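The completeness/contamination gate and the retry-then-fail rule could look roughly like the sketch below. The threshold values and the column names (`bin`, `completeness`, `contamination`) are assumptions for illustration; the skill defers to project thresholds and does not document the layout of bin_metrics.tsv.

```python
import csv
import io

# Placeholder thresholds (assumed); substitute the project's own values.
MIN_COMPLETENESS = 50.0
MAX_CONTAMINATION = 10.0


def failing_bins(metrics_tsv, min_completeness=MIN_COMPLETENESS,
                 max_contamination=MAX_CONTAMINATION):
    """Return names of bins that miss either gate.

    `metrics_tsv` is the text of a tab-separated table with the hypothetical
    columns bin, completeness, contamination (both in percent).
    """
    reader = csv.DictReader(io.StringIO(metrics_tsv), delimiter="\t")
    failed = []
    for row in reader:
        too_incomplete = float(row["completeness"]) < min_completeness
        too_contaminated = float(row["contamination"]) > max_contamination
        if too_incomplete or too_contaminated:
            failed.append(row["bin"])
    return failed


def gate_exit_code(attempts):
    """Return 0 for the first attempt in which every bin passes, else 1.

    `attempts` yields one metrics table per run, e.g. the initial run followed
    by a re-run with alternative parameters.
    """
    for metrics_tsv in attempts:
        if not failing_bins(metrics_tsv):
            return 0
    return 1
```

A driver script would pass the result to `sys.exit` after recording the failing bins in the report.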

Examples

Example 1: Expected input layout

contigs.fasta
coverage.tsv (per-sample depth table)
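As a quick sanity check on the depth table, one might summarize mean depth per sample. The layout assumed here (a header row, contig name in the first column, one depth column per sample) is a guess for illustration; the skill does not specify the columns of coverage.tsv.

```python
import csv
import io


def mean_depth_per_sample(coverage_tsv: str) -> dict[str, float]:
    """Unweighted mean depth per sample across contigs.

    Assumed layout: tab-separated, header row, first column is the contig
    name, remaining columns are per-sample depths. Contig length is ignored.
    """
    reader = csv.reader(io.StringIO(coverage_tsv), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = [0.0] * len(samples)
    n_contigs = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        for i, value in enumerate(row[1:]):
            totals[i] += float(value)
        n_contigs += 1
    if n_contigs == 0:
        raise ValueError("coverage table has no contig rows")
    return {sample: total / n_contigs for sample, total in zip(samples, totals)}
```

A sample whose mean depth is near zero is worth checking before binning, since it contributes little differential-coverage signal.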

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

git clone https://github.com/fmschulz/omics-skills.git
SKILL.md: https://github.com/fmschulz/omics-skills/blob/main/skills/bio-binning-qc/SKILL.md

Overview

Bio Binning QC performs metagenomic binning, refinement, and quality control with completeness and contamination checks. It computes depth per sample, runs multiple binners, and classifies bins by domain before applying tailored QC. The resulting reports help select high-quality bins for downstream analysis.

How This Skill Works

First, compute depth/coverage per sample and prepare inputs. Then run multiple binning tools (MetaBAT2, SemiBin2, QuickBin) to generate bins. Finally, classify bins by domain (bacteria/archaea vs eukaryotes) and run domain-specific QC with CheckM2, EukCC, and GUNC to produce QC reports and visuals.

When to Use It

  • You have a mixed-domain metagenome and need robust, cross-tool bins.
  • You want to screen bins for domain-specific completeness and contamination.
  • You need to compare binning results across different algorithms.
  • You must verify inputs (contigs.fasta and coverage.tsv) and ensure reference databases exist.
  • You need reproducible QC reporting (bin_qc_report.html and bin_metrics.tsv).

Quick Start

  1. Compute depth/coverage per sample.
  2. Run MetaBAT2, SemiBin2, and QuickBin to generate bins.
  3. Classify bins by domain and run domain-specific QC (CheckM2, EukCC, GUNC); review outputs.

Best Practices

  • Validate inputs (contigs.fasta and coverage.tsv) are non-empty.
  • Ensure BIO_DB_ROOT is set and required tools are accessible.
  • Run multiple binners and compare bin assignments.
  • Use domain-aware QC (CheckM2 for bacteria/archaea, EukCC for eukaryotes, GUNC for contamination) before filtering.
  • Inspect the QC report and logs, then retry with parameter tweaks if gates fail.
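One simple way to compare bin assignments between two binners is Jaccard overlap of their contig sets. The sketch below assumes each binner's output has already been reduced to a contig-to-bin mapping; the function names are hypothetical, and the overlap counts contigs rather than base pairs.

```python
def bins_from_assignments(contig_to_bin):
    """Invert a contig -> bin mapping into bin -> set of contigs."""
    bins = {}
    for contig, bin_name in contig_to_bin.items():
        bins.setdefault(bin_name, set()).add(contig)
    return bins


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(assign_a, assign_b):
    """For each bin from binner A, find the most similar bin from binner B.

    Returns {bin_a: (bin_b, jaccard_score)}. A score of 0.0 means no bin
    from B shares any contig with that bin.
    """
    bins_a = bins_from_assignments(assign_a)
    bins_b = bins_from_assignments(assign_b)
    result = {}
    for name_a, contigs_a in bins_a.items():
        scored = [(jaccard(contigs_a, contigs_b), name_b)
                  for name_b, contigs_b in bins_b.items()]
        score, name_b = max(scored, default=(0.0, None))
        result[name_a] = (name_b, score)
    return result
```

Bins that match poorly across binners are natural candidates for closer inspection in the QC report.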

Example Use Cases

  • Soil microbiome binning with bacterial/archaeal bins verified by CheckM2 and GUNC.
  • Marine sample with notable eukaryotic content analyzed using EukCC.
  • Per-sample depth-informed binning across multiple samples to improve bin recovery.
  • QC-driven filters applied to produce high-quality bins for downstream annotation.
  • Re-running with alternative parameters after QC gates indicate sub-threshold completeness.
