What inputs are required?

Inputs include sample_sheet.tsv, reads/*.fastq.gz, and an optional reference.fasta.

Where are outputs stored?

Outputs reside in results/bio-reads-qc-mapping, with subfolders for trimmed_reads, qc_reports, and files like mapping_stats.tsv and coverage.tsv, plus logs.

Which tools are used in this skill?

Short reads use bbduk for QC/trimming; long reads use Porechop for adapter trimming and Filtlong for quality/length filtering; mapping uses bbmap or minimap2.

bio-reads-qc-mapping

npx machina-cli add skill fmschulz/omics-skills/bio-reads-qc-mapping --openclaw

Bio Reads QC Mapping

Ingest, QC, and map reads with reproducible outputs. Use for raw read processing and coverage stats.

Instructions

Parse sample sheet and validate inputs.
For short reads: Run QC/trimming (bbduk).
For long reads: Trim adapters (Porechop) and filter by quality/length (Filtlong).
Map reads (bbmap or minimap2) and generate coverage tables.
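
The trimming branch of these steps can be sketched as a small command builder. This is a minimal illustration, not the skill's own implementation: the function name `trim_commands`, the `"short"`/`"long"` labels, the `adapters.fa` path, and every parameter value are assumptions chosen for the example. The commands are only constructed here, not executed.

```python
from __future__ import annotations

from pathlib import Path


def trim_commands(read_type: str, reads: Path, out_dir: Path) -> list[list[str]]:
    """Return the trimming commands for one FASTQ file, without running them."""
    trimmed = out_dir / reads.name
    if read_type == "short":
        # bbduk.sh takes key=value arguments; adapters.fa is a placeholder path
        # and the k-mer/quality settings are illustrative values only.
        return [[
            "bbduk.sh", f"in={reads}", f"out={trimmed}",
            "ref=adapters.fa", "ktrim=r", "k=23", "mink=11", "hdist=1",
            "qtrim=rl", "trimq=10",
        ]]
    if read_type == "long":
        chopped = out_dir / f"chopped_{reads.name}"
        return [
            ["porechop", "-i", str(reads), "-o", str(chopped)],
            # Filtlong writes to stdout, so the caller has to redirect it to a file.
            ["filtlong", "--min_length", "1000", "--keep_percent", "90", str(chopped)],
        ]
    raise ValueError(f"unknown read type: {read_type!r}")
```

Keeping command construction separate from execution makes the chosen parameters easy to log, which helps with the reproducibility goal stated above.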

Quick Reference

Task	Action
Run workflow	Follow the steps in this skill and capture outputs.
Validate inputs	Confirm required inputs and reference data exist.
Review outputs	Inspect reports and QC gates before proceeding.
Tool docs	See docs/README.md.
References	See ../bio-skills-references.md

Input Requirements

Prerequisites:

Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
Sample sheet and reads are available.

Inputs:

sample_sheet.tsv
reads/*.fastq.gz
reference.fasta (optional)
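
A sample sheet check might look like the sketch below. The source does not define the sheet's schema, so the column names `sample_id`, `read_type`, and `reads` are hypothetical, as is the choice to resolve read paths relative to the sheet's directory.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Hypothetical schema: the skill does not document the real column names.
REQUIRED_COLUMNS = {"sample_id", "read_type", "reads"}


def load_sample_sheet(path: Path) -> list[dict[str, str]]:
    """Parse a tab-separated sample sheet and check its columns and read paths."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"sample sheet lacks columns: {sorted(missing)}")
        rows = list(reader)
    for row in rows:
        # Read paths are resolved relative to the sample sheet's directory.
        if not (path.parent / row["reads"]).is_file():
            raise FileNotFoundError(f"{row['sample_id']}: {row['reads']} not found")
    return rows
```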

Output

results/bio-reads-qc-mapping/trimmed_reads/
results/bio-reads-qc-mapping/qc_reports/
results/bio-reads-qc-mapping/mapping_stats.tsv
results/bio-reads-qc-mapping/coverage.tsv
results/bio-reads-qc-mapping/logs/
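
The layout above could be prepared up front with a helper like the one below. The function name `prepare_output_tree` and the dictionary keys are assumptions for illustration; only the directory and file names come from the list above.

```python
from __future__ import annotations

from pathlib import Path

RESULTS_ROOT = Path("results/bio-reads-qc-mapping")
SUBDIRS = ("trimmed_reads", "qc_reports", "logs")


def prepare_output_tree(base: Path = Path(".")) -> dict[str, Path]:
    """Create the results directories and return the expected output paths."""
    root = base / RESULTS_ROOT
    paths = {name: root / name for name in SUBDIRS}
    for directory in paths.values():
        directory.mkdir(parents=True, exist_ok=True)
    # The two tables are written by later steps; only their paths are returned here.
    paths["mapping_stats"] = root / "mapping_stats.tsv"
    paths["coverage"] = root / "coverage.tsv"
    return paths
```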

Quality Gates

Validate sample sheet schema and FASTQ integrity.
Post-QC read count sanity checks pass.
Mapping rate meets project thresholds.
On failure: retry with alternative parameters; if the gate still fails, record the failure in the report and exit with a non-zero status.
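
One way to express the read-count and mapping-rate gates, together with the retry-then-fail behaviour, is sketched below. The thresholds `min_retained` and `min_mapped` are placeholder values, since the source leaves them to the project, and both function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def gates_pass(reads_in: int, reads_out: int, mapped: int,
               min_retained: float = 0.5, min_mapped: float = 0.8) -> bool:
    """Check read retention after QC and the mapping rate against thresholds."""
    # Sanity check: QC cannot add reads, and something must survive it.
    if reads_in <= 0 or not 0 < reads_out <= reads_in:
        return False
    retained = reads_out / reads_in
    mapping_rate = mapped / reads_out
    return retained >= min_retained and mapping_rate >= min_mapped


def run_with_retry(attempts: list[Callable[[], tuple[int, int, int]]]) -> int:
    """Try each parameter set in order; return 0 on the first pass, 1 otherwise."""
    for attempt in attempts:
        if gates_pass(*attempt()):
            return 0
    return 1
```

Each entry in `attempts` stands for one run with a given parameter set that returns `(reads_in, reads_out, mapped)`. The return value can be passed to `sys.exit` so that a persistent failure yields a non-zero exit status.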

Examples

Example 1: Expected input layout

sample_sheet.tsv
reads/*.fastq.gz
reference.fasta (optional)

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

https://github.com/fmschulz/omics-skills/blob/main/skills/bio-reads-qc-mapping/SKILL.md

Overview

This skill provides end-to-end processing of sequencing reads, from input validation to trimming, adapter removal, and read filtering for both short and long reads. It then maps reads with bbmap or minimap2 and outputs coverage statistics and QC reports in a reproducible directory structure.

How This Skill Works

The workflow starts by parsing and validating the sample_sheet.tsv and input reads. If the data are short reads, it runs QC/trimming with bbduk; if long reads, it trims adapters with Porechop and filters by quality/length with Filtlong. Finally, it maps reads using bbmap or minimap2 and generates mapping statistics and coverage tables along with QC reports.

When to Use It

You have raw Illumina-style short reads and need QC, trimming, and coverage stats.
You require a reproducible pipeline that relies on a sample sheet to process multiple samples.
You are working with long reads (e.g., Nanopore) and need adapter trimming plus quality/length filtering.
You need mapping statistics to check results against project thresholds.
You want outputs organized under a consistent path with accompanying logs and reports.

Quick Start

Step 1: Parse sample_sheet.tsv and validate inputs.
Step 2: If short reads exist, run QC/trimming (bbduk); if long reads exist, trim adapters (Porechop) and filter by quality/length (Filtlong).
Step 3: Map reads (bbmap or minimap2) and generate coverage tables; review outputs under results/bio-reads-qc-mapping and inspect qc_reports.
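
The mapping step in Step 3 might be assembled as follows. This sketch pairs bbmap with short reads and minimap2 with long reads, mirroring the example use cases. Skipping mapping when no reference is supplied is an assumption, and so are the function name and the output file names. How the skill turns mapper output into coverage.tsv is not specified in the source, so that conversion is left out.

```python
from __future__ import annotations

from pathlib import Path


def mapping_command(read_type: str, reads: Path, reference: Path | None,
                    out_dir: Path) -> list[str] | None:
    """Build the mapper command for one sample, or None when no reference is given."""
    if reference is None:
        return None  # assumption: without a reference, mapping is skipped
    sample = reads.name.split(".")[0]
    sam = out_dir / f"{sample}.sam"
    if read_type == "short":
        covstats = out_dir / f"{sample}.covstats.txt"
        # covstats= asks bbmap for per-scaffold coverage; nodisk keeps the index in memory.
        return ["bbmap.sh", f"ref={reference}", f"in={reads}", f"out={sam}",
                f"covstats={covstats}", "nodisk"]
    if read_type == "long":
        # -a emits SAM, -x map-ont is the Nanopore preset, -o names the output file.
        return ["minimap2", "-a", "-x", "map-ont", "-o", str(sam),
                str(reference), str(reads)]
    raise ValueError(f"unknown read type: {read_type!r}")
```

A returned command can be executed with `subprocess.run(cmd, check=True)`, which raises if the mapper exits with an error.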

Best Practices

Validate the sample_sheet.tsv structure and ensure reads and reference paths exist before running.
For short reads, prefer bbduk for QC/trimming; for long reads, use Porechop for adapters and Filtlong for quality/length filtering.
reference.fasta is optional; if you supply one, check that its path is correct and that the chosen mapper can read it.
Inspect qc_reports early to decide if parameter adjustments are needed before mapping.
Store all outputs under results/bio-reads-qc-mapping and capture logs to enable reproducibility and audit trails.
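
As a pre-flight check before running, FASTQ integrity can be tested with a short reader like the one below. It is a minimal sketch that assumes four-line FASTQ records (no wrapped sequences); the function name `count_fastq_records` is hypothetical. The record count it returns can also feed a read-count sanity check.

```python
from __future__ import annotations

import gzip
from pathlib import Path


def count_fastq_records(path: Path) -> int:
    """Count records in a gzipped FASTQ file, raising ValueError if one is malformed."""
    count = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # clean end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{path}: malformed record {count + 1}")
            if len(seq) != len(qual):
                raise ValueError(f"{path}: length mismatch in record {count + 1}")
            count += 1
    return count
```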

Example Use Cases

Example 1: Expected input layout with sample_sheet.tsv, reads/*.fastq.gz, and optional reference.fasta; produces trimmed_reads, qc_reports, mapping_stats.tsv, coverage.tsv, and logs.
Example 2: Illumina paired-end data; QC with bbduk, trimming applied, then mapping with bbmap; outputs include mapping_stats.tsv and coverage.tsv.
Example 3: Nanopore long reads; adapters trimmed with Porechop, quality/length filtered with Filtlong, followed by minimap2 mapping and coverage generation.
Example 4: Project requiring mapping rate thresholds; gates are evaluated, and retries with alternative parameters are captured in the QC report if needed.
Example 5: Using a custom reference.fasta; mapping stats reflect alignment quality against the specified reference.

Frequently Asked Questions
