bio-assembly-qc
npx machina-cli add skill fmschulz/omics-skills/bio-assembly-qc --openclaw
Bio Assembly QC
Assemble genomes/metagenomes and produce assembly QC artifacts.
Instructions
- Select assembler based on read type and genome size.
- Run assembly with resource-aware settings.
- Run QUAST/MetaQUAST and summarize metrics.
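The three instructions above can be sketched as small command builders. This is a minimal illustration, not the skill's own implementation: the helper names `build_assembler_command` and `build_quast_command`, the default thread and memory values, and the choice of unpaired input for SPAdes and raw Nanopore input for Flye are all assumptions. Adjust the read-type flags to match the actual data.

```python
def build_assembler_command(assembler, reads, out_dir, threads=8, memory_gb=32):
    """Return an argv list for the chosen assembler (hypothetical helper)."""
    if assembler == "spades":
        # -s: unpaired reads, -o: output dir, -t: threads, -m: memory limit in GB.
        # Paired-end data would use -1/-2 instead of -s.
        return ["spades.py", "-s", str(reads), "-o", str(out_dir),
                "-t", str(threads), "-m", str(memory_gb)]
    if assembler == "flye":
        # --nano-raw is an assumption about the read type; Flye has other
        # input modes (for example --pacbio-hifi).
        return ["flye", "--nano-raw", str(reads), "--out-dir", str(out_dir),
                "--threads", str(threads)]
    raise ValueError(f"unknown assembler: {assembler!r}")


def build_quast_command(contigs, out_dir, threads=8, metagenome=False):
    """Return an argv list for QUAST, or MetaQUAST when metagenome=True."""
    tool = "metaquast.py" if metagenome else "quast.py"
    return [tool, str(contigs), "-o", str(out_dir), "-t", str(threads)]
```

The returned lists can be passed to `subprocess.run(cmd, check=True)`, with stdout and stderr redirected into results/bio-assembly-qc/logs/.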
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md. |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Sufficient disk and RAM for chosen assembler.
Inputs:
- reads/*.fastq.gz (raw reads).
- assembler choice (spades | flye).
Output
- results/bio-assembly-qc/contigs.fasta
- results/bio-assembly-qc/assembly_metrics.tsv
- results/bio-assembly-qc/qc_report.html
- results/bio-assembly-qc/logs/
Quality Gates
- Assembly size range and N50 distribution meet project thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
- Verify reads are present and gzip-readable.
- Check available disk space before assembly.
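The two pre-run checks in this list (reads present and gzip-readable, enough free disk) could look like the sketch below. It uses only the Python standard library. The function names and the `min_free_gb` threshold are illustrative assumptions; the skill does not specify a minimum.

```python
import gzip
import shutil
import zlib
from pathlib import Path


def is_gzip_readable(path):
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read in 1 MiB chunks so large read files do not fill memory.
            while handle.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers, EOFError covers truncated files.
        return False


def preflight(reads_dir, work_dir, min_free_gb):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    reads = sorted(Path(reads_dir).glob("*.fastq.gz"))
    if not reads:
        problems.append(f"no *.fastq.gz files in {reads_dir}")
    for path in reads:
        if not is_gzip_readable(path):
            problems.append(f"not gzip-readable: {path}")
    free_gb = shutil.disk_usage(work_dir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {work_dir}")
    return problems
```

Decompressing every file in full is slow for large datasets but catches truncated uploads, which a header-only check would miss.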
Examples
Example 1: Expected input layout
reads/*.fastq.gz (raw reads).
assembler choice (spades | flye).
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
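The adjust-and-re-run loop, together with the skill's rule of exiting non-zero when retries are exhausted, might be organized as follows. This is a sketch under assumptions: `run_with_retries`, `run_step`, and `passes_gate` are hypothetical names, and the caller decides what a parameter set contains.

```python
def run_with_retries(parameter_sets, run_step, passes_gate):
    """Try each parameter set in order.

    Returns (exit_code, attempts): exit_code is 0 as soon as one attempt
    passes the gate, otherwise 1 after all parameter sets are used up.
    """
    attempts = []
    for params in parameter_sets:
        metrics = run_step(params)      # e.g. assemble, then compute metrics
        ok = passes_gate(metrics)
        # Keep every attempt so failures can be recorded in the report.
        attempts.append({"params": params, "metrics": metrics, "passed": ok})
        if ok:
            return 0, attempts
    return 1, attempts
```

The caller can write `attempts` into the report and pass the returned code to `sys.exit`.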
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-assembly-qc/SKILL.md
Overview
This skill guides assembling genomes or metagenomes and producing QC artifacts. It emphasizes selecting an assembler based on read type and genome size, running with resource-aware settings, and generating QC metrics via QUAST/MetaQUAST.
How This Skill Works
Verify inputs and choose an appropriate assembler: SPAdes for short reads or Flye for long reads/metagenomes. Run the assembler with resource-aware settings, then execute QUAST or MetaQUAST to generate a summary of metrics and a QC report (qc_report.html) alongside contigs and logs.
When to Use It
- Starting a genome or metagenome assembly and needing QC artifacts
- Choosing between SPAdes (short reads) and Flye (long reads/metagenomes)
- Verifying inputs and ensuring enough disk space and RAM before assembly
- Generating QUAST/MetaQUAST reports to gate quality
- Re-running with adjusted parameters after QC gates fail
Quick Start
- Step 1: Validate inputs (reads/*.fastq.gz) and available tools (Pixi/conda/system).
- Step 2: Choose SPAdes for short reads or Flye for long reads/metagenomes; ensure resources.
- Step 3: Run the assembler, then run QUAST/MetaQUAST and collect qc_report.html and metrics.
Best Practices
- Validate inputs exist and are gzip-readable
- Select assembler based on read type and genome size (SPAdes for short reads, Flye for long reads/metagenomes)
- Ensure sufficient disk and RAM; monitor resource usage during assembly
- Run QUAST/MetaQUAST and review metrics (size, N50, contig counts) before proceeding
- Check QC gates and retry with parameter tweaks if needed
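For a quick look at the metrics named above (size, N50, contig count) without opening the QUAST report, they can be computed directly from contigs.fasta. The sketch below is independent of QUAST and its function names and threshold parameters are assumptions. Numbers may differ from QUAST's, which by default ignores contigs shorter than 500 bp.

```python
import gzip


def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file (plain or .gz)."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    lengths = []
    current = None  # None until the first header line is seen
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(lengths):
    """Collect the three headline metrics into one dict."""
    return {"contigs": len(lengths), "total_length": sum(lengths), "n50": n50(lengths)}


def passes_gates(metrics, min_total, max_total, min_n50):
    """Check assembly size range and N50 against project thresholds."""
    in_range = min_total <= metrics["total_length"] <= max_total
    return in_range and metrics["n50"] >= min_n50
```

The thresholds passed to `passes_gates` are project-specific; the skill leaves them to the project rather than fixing values.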
Example Use Cases
- Assembling a bacterial genome from short reads with SPAdes and generating qc_report.html
- Metagenome assembly from mixed reads using Flye and validating with MetaQUAST
- Comparing assembly metrics across parameter sets and recording results in reports
- Validating that reads are present and gzipped before launch
- Troubleshooting QC failures by adjusting assembler parameters and re-running