bio-reads-qc-mapping
npx machina-cli add skill fmschulz/omics-skills/bio-reads-qc-mapping --openclaw
Bio Reads QC Mapping
Ingest, QC, and map reads with reproducible outputs. Use for raw read processing and coverage stats.
Instructions
- Parse sample sheet and validate inputs.
- For short reads: Run QC/trimming (bbduk).
- For long reads: Trim adapters (Porechop) and filter by quality/length (Filtlong).
- Map reads (bbmap or minimap2) and generate coverage tables.
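The trimming branch of the steps above can be sketched as a small command planner. This is a minimal illustration, not the skill's actual implementation: the function name `plan_trimming`, the output file names, and all parameter values (`trimq=10`, `minlen=50`, `--min_length 1000`, `--min_mean_q 7`) are assumptions chosen for the example. Check the flags against the tool versions in your environment before use.

```python
from pathlib import Path


def plan_trimming(read_type, reads, out_dir):
    """Return a list of (argv, stdout_path) steps for one sample.

    stdout_path is None unless the tool writes its result to stdout.
    All thresholds below are illustrative assumptions, not project defaults.
    """
    out = Path(out_dir)
    if read_type == "short":
        # bbduk: adapter trimming (ktrim) followed by quality trimming (qtrim).
        cmd = ["bbduk.sh", f"in={reads[0]}", f"out={out / 'trimmed_R1.fastq.gz'}"]
        if len(reads) == 2:
            cmd += [f"in2={reads[1]}", f"out2={out / 'trimmed_R2.fastq.gz'}"]
        cmd += ["ref=adapters", "ktrim=r", "k=23", "mink=11", "hdist=1",
                "qtrim=rl", "trimq=10", "minlen=50"]
        return [(cmd, None)]
    if read_type == "long":
        # Porechop removes adapters; Filtlong then filters by length and quality.
        trimmed = out / "porechop.fastq.gz"
        porechop = ["porechop", "-i", reads[0], "-o", str(trimmed)]
        filtlong = ["filtlong", "--min_length", "1000", "--min_mean_q", "7",
                    str(trimmed)]
        # Filtlong writes FASTQ to stdout, so the caller must redirect it.
        return [(porechop, None), (filtlong, out / "filtered.fastq")]
    raise ValueError(f"unknown read type: {read_type!r}")
```

Each step could then be executed with `subprocess.run(argv, check=True, stdout=handle)`, opening `stdout_path` for writing when it is not `None`. Returning argument lists rather than shell strings keeps the plan easy to log and avoids quoting problems.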
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Sample sheet and reads are available.
Inputs:
- sample_sheet.tsv
- reads/*.fastq.gz
- reference.fasta (optional)
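Input validation for these files might look like the sketch below. The source does not define the sample sheet schema, so the column names (`sample_id`, `read_type`, `fastq_1`, `fastq_2`) and the allowed `read_type` values are hypothetical and should be replaced with the real schema.

```python
import csv
from pathlib import Path

# Hypothetical schema: the skill does not document its column names.
REQUIRED_COLUMNS = {"sample_id", "read_type", "fastq_1"}
READ_TYPES = {"short", "long"}


def load_sample_sheet(sheet_path, base_dir="."):
    """Parse a tab-separated sample sheet and check that read files exist."""
    errors = []
    samples = []
    with open(sheet_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"sample sheet is missing columns: {sorted(missing)}")
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            if row["read_type"] not in READ_TYPES:
                errors.append(f"line {line_no}: bad read_type {row['read_type']!r}")
            for key in ("fastq_1", "fastq_2"):
                value = row.get(key)
                if value and not (Path(base_dir) / value).is_file():
                    errors.append(f"line {line_no}: {key} not found: {value}")
            samples.append(row)
    if errors:
        # Report every problem at once instead of stopping at the first.
        raise ValueError("\n".join(errors))
    return samples
```

Collecting all errors before raising lets a user fix the whole sheet in one pass rather than re-running after each missing file.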
Output
- results/bio-reads-qc-mapping/trimmed_reads/
- results/bio-reads-qc-mapping/qc_reports/
- results/bio-reads-qc-mapping/mapping_stats.tsv
- results/bio-reads-qc-mapping/coverage.tsv
- results/bio-reads-qc-mapping/logs/
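One way to create this layout up front, so every step writes to a known location, is sketched here. The helper name `prepare_output_layout` and the `base_dir` parameter are assumptions for illustration; only the paths themselves come from the list above.

```python
from pathlib import Path

RESULTS_ROOT = Path("results") / "bio-reads-qc-mapping"


def prepare_output_layout(base_dir="."):
    """Create the output directories and return all expected output paths."""
    root = Path(base_dir) / RESULTS_ROOT
    layout = {
        "trimmed_reads": root / "trimmed_reads",
        "qc_reports": root / "qc_reports",
        "logs": root / "logs",
        "mapping_stats": root / "mapping_stats.tsv",
        "coverage": root / "coverage.tsv",
    }
    # Only the directory entries are created; the .tsv files are written later.
    for key in ("trimmed_reads", "qc_reports", "logs"):
        layout[key].mkdir(parents=True, exist_ok=True)
    return layout
```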
Quality Gates
- Sample sheet schema and FASTQ integrity are validated.
- Post-QC read count sanity checks pass.
- Mapping rate meets project thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
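The read-count and mapping-rate gates, together with the retry rule, could be expressed roughly as follows. The thresholds (`min_retained=0.5`, `min_mapping_rate=0.8`) are placeholders, since the source only refers to "project thresholds", and `run_step` is a hypothetical callable that runs one attempt and returns a list of gate failures.

```python
def evaluate_gates(reads_in, reads_out, mapped_reads,
                   min_retained=0.5, min_mapping_rate=0.8):
    """Return a list of human-readable gate failures (empty means pass)."""
    failures = []
    if reads_in <= 0 or reads_out <= 0:
        failures.append("no reads before or after QC")
    elif reads_out > reads_in:
        failures.append("more reads after QC than before")
    elif reads_out / reads_in < min_retained:
        failures.append(f"only {reads_out / reads_in:.1%} of reads retained")
    if reads_out > 0 and mapped_reads / reads_out < min_mapping_rate:
        failures.append(f"mapping rate {mapped_reads / reads_out:.1%} below threshold")
    return failures


def run_with_retry(run_step, parameter_sets):
    """Try each parameter set in order; return (exit_code, last_failures)."""
    failures = []
    for params in parameter_sets:
        failures = run_step(params)
        if not failures:
            return 0, []
    # Every attempt failed: the caller records the failures and exits non-zero.
    return 1, failures
```

A driver script would write the returned failures into the QC report and then call `sys.exit(exit_code)`, which matches the "record in report and exit non-zero" rule.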
Examples
Example 1: Expected input layout
sample_sheet.tsv
reads/*.fastq.gz
reference.fasta (optional)
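Before processing, each file matching `reads/*.fastq.gz` can be checked for basic integrity. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines); the function name `count_fastq_records` is illustrative and not part of the skill.

```python
import gzip
import zlib


def count_fastq_records(path):
    """Return the number of reads, raising ValueError if the file is malformed."""
    n_lines = 0
    seq_len = 0
    try:
        with gzip.open(path, "rt") as handle:
            for n_lines, line in enumerate(handle, start=1):
                text = line.rstrip("\r\n")
                position = n_lines % 4  # 1=header, 2=sequence, 3=plus, 0=quality
                if position == 1 and not text.startswith("@"):
                    raise ValueError(f"{path}: line {n_lines} is not a '@' header")
                if position == 2:
                    seq_len = len(text)
                if position == 3 and not text.startswith("+"):
                    raise ValueError(f"{path}: line {n_lines} is not a '+' separator")
                if position == 0 and len(text) != seq_len:
                    raise ValueError(f"{path}: quality length mismatch at line {n_lines}")
    except (OSError, EOFError, zlib.error) as exc:
        # Covers corrupt or truncated gzip streams and unreadable files.
        raise ValueError(f"{path}: not a readable gzip file ({exc})") from exc
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: truncated record ({n_lines} lines)")
    return n_lines // 4
```

The returned count can also serve as the pre-QC read count when checking read-count sanity after trimming.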
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
Source
git clone https://github.com/fmschulz/omics-skills/blob/main/skills/bio-reads-qc-mapping/SKILL.md
Overview
This skill provides end-to-end processing of sequencing reads, from input validation to trimming, adapter removal, and read filtering for both short and long reads. It then maps reads with bbmap or minimap2 and outputs coverage statistics and QC reports in a reproducible directory structure.
How This Skill Works
The workflow starts by parsing and validating the sample_sheet.tsv and input reads. If the data are short reads, it runs QC/trimming with bbduk; if long reads, it trims adapters with Porechop and filters by quality/length with Filtlong. Finally, it maps reads using bbmap or minimap2 and generates mapping statistics and coverage tables along with QC reports.
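The mapping step described above could be planned as in this sketch. Pairing bbmap with short reads and minimap2 with long reads is an assumption drawn from the example use cases, not a stated rule. The function name, output file names, and the `map-ont` preset default are likewise illustrative.

```python
def mapping_command(read_type, reads, reference, out_prefix, preset="map-ont"):
    """Return the mapper argv for one sample, or None when no reference is given."""
    if reference is None:
        # Assumption: with the optional reference absent, mapping is skipped.
        return None
    if read_type == "short":
        # nodisk builds the index in memory; covstats writes per-contig coverage.
        cmd = ["bbmap.sh", f"ref={reference}", "nodisk", f"in={reads[0]}"]
        if len(reads) == 2:
            cmd.append(f"in2={reads[1]}")
        cmd += [f"out={out_prefix}.sam", f"covstats={out_prefix}.covstats.tsv"]
        return cmd
    if read_type == "long":
        # -a requests SAM output; -x selects the platform preset.
        return ["minimap2", "-a", "-x", preset, "-o", f"{out_prefix}.sam",
                reference, reads[0]]
    raise ValueError(f"unknown read type: {read_type!r}")
```

bbmap can emit coverage statistics directly through `covstats=`. minimap2 only produces alignments, so a coverage table for long reads needs a separate step that the source does not name.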
When to Use It
- You have raw Illumina-style short reads and need QC, trimming, and coverage stats.
- You require a reproducible pipeline that relies on a sample sheet to process multiple samples.
- You are working with long reads (e.g., Nanopore) and need adapter trimming plus quality/length filtering.
- You need mapping statistics to assess project thresholds and decision points.
- You want outputs organized under a consistent path with accompanying logs and reports.
Quick Start
- Step 1: Parse sample_sheet.tsv and validate inputs.
- Step 2: If short reads exist, run QC/trimming (bbduk); if long reads exist, trim adapters (Porechop) and filter by quality/length (Filtlong).
- Step 3: Map reads (bbmap or minimap2) and generate coverage tables; review outputs under results/bio-reads-qc-mapping and inspect qc_reports.
Best Practices
- Validate the sample_sheet.tsv structure and ensure reads and reference paths exist before running.
- For short reads, prefer bbduk for QC/trimming; for long reads, use Porechop for adapters and Filtlong for quality/length filtering.
- reference.fasta is optional; if you supply one, confirm the path is correct and that the file is compatible with the chosen mapper.
- Inspect qc_reports early to decide if parameter adjustments are needed before mapping.
- Store all outputs under results/bio-reads-qc-mapping and capture logs to enable reproducibility and audit trails.
Example Use Cases
- Example 1: Expected input layout with sample_sheet.tsv, reads/*.fastq.gz, and optional reference.fasta; produces trimmed_reads, qc_reports, mapping_stats.tsv, coverage.tsv, and logs.
- Example 2: Illumina paired-end data; QC with bbduk, trimming applied, then mapping with bbmap; outputs include mapping_stats.tsv and coverage.tsv.
- Example 3: Nanopore long reads; adapters trimmed with Porechop, quality/length filtered with Filtlong, followed by minimap2 mapping and coverage generation.
- Example 4: Project requiring mapping rate thresholds; gates are evaluated, and retries with alternative parameters are captured in the QC report if needed.
- Example 5: Using a custom reference.fasta; mapping stats reflect alignment quality against the specified reference.