bio-gene-calling
npx machina-cli add skill fmschulz/omics-skills/bio-gene-calling --openclaw
Bio Gene Calling
Call genes and annotate basic features for prokaryotes, viruses, and eukaryotes.
Instructions
- Select gene caller by organism class.
- Run gene calling and produce GFF/FAA/FNA.
- Detect tRNAs/rRNAs if requested.
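The first two instructions can be sketched as a small dispatch table. This is a minimal illustration, not the skill's actual implementation: the tool choices (Prodigal for prokaryotes, prodigal-gv for viruses, AUGUSTUS for eukaryotes) and the names `CALLER_BY_CLASS` and `build_command` are assumptions made here; docs/README.md remains the authority on which tools are expected.

```python
from pathlib import Path

# Assumed tool choices for illustration only; check docs/README.md for the real list.
CALLER_BY_CLASS = {
    "prokaryote": "prodigal",
    "virus": "prodigal-gv",
    "eukaryote": "augustus",
}


def build_command(organism_class, contigs, outdir="results/bio-gene-calling",
                  metagenome=False, species=None):
    """Return the argument list for the gene caller matching organism_class."""
    if organism_class not in CALLER_BY_CLASS:
        raise ValueError(f"unknown organism class: {organism_class!r}")
    tool = CALLER_BY_CLASS[organism_class]
    out = Path(outdir)
    gff = str(out / "genes.gff3")

    if tool == "augustus":
        if species is None:
            raise ValueError("augustus needs a trained species model name")
        # AUGUSTUS does not write separate FAA/FNA files; those need an extraction step.
        return [tool, f"--species={species}", "--gff3=on",
                f"--outfile={gff}", str(contigs)]

    # Prodigal-style interface: -i input, -f format, -o annotations,
    # -a protein translations, -d nucleotide CDS.
    cmd = [tool, "-i", str(contigs), "-f", "gff", "-o", gff,
           "-a", str(out / "proteins.faa"), "-d", str(out / "cds.fna")]
    if metagenome:
        cmd += ["-p", "meta"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, so a non-zero exit from the tool surfaces as an exception instead of being silently ignored.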
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md. |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Input contigs or bins are available.
Inputs:
- contigs.fasta or bins/*.fasta
Output
- results/bio-gene-calling/genes.gff3
- results/bio-gene-calling/proteins.faa
- results/bio-gene-calling/cds.fna
- results/bio-gene-calling/gene_metrics.tsv
- results/bio-gene-calling/logs/
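One way to confirm the output layout listed above after a run is a short existence check. The sketch below assumes that an empty file counts as missing, which the skill itself does not state; the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

EXPECTED_FILES = ("genes.gff3", "proteins.faa", "cds.fna", "gene_metrics.tsv")


def missing_outputs(outdir="results/bio-gene-calling"):
    """Return the expected outputs that are absent or empty under outdir."""
    root = Path(outdir)
    missing = []
    for name in EXPECTED_FILES:
        path = root / name
        # Assumption: a zero-byte file is treated the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    if not (root / "logs").is_dir():
        missing.append("logs/")
    return missing
```

An empty return value means every listed output is present; anything else can be written to the report before exiting.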
Quality Gates
- Gene count sanity checks pass.
- Start/stop codon checks pass.
- Verify contigs are non-empty and use the DNA alphabet.
- Verify outputs contain expected feature types.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
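The gates above could be implemented along the following lines. This is a sketch under stated assumptions, not the skill's own checks: the start codons assume bacterial translation table 11, the allowed alphabet is restricted to A/C/G/T/N, and the gene-density bounds (genes per kb) are illustrative values that suit typical prokaryotes but not viruses or eukaryotes. Partial genes at contig edges will legitimately fail the codon check and may need to be excluded first.

```python
DNA_ALPHABET = set("ACGTN")          # assumption: no other IUPAC ambiguity codes
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: translation table 11
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(path):
    """Parse a FASTA file into {record_id: sequence}."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def is_dna(seq):
    """True if seq is non-empty and contains only A, C, G, T, or N."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def codons_ok(cds):
    """True if a complete CDS has a valid start, a valid stop, and whole codons."""
    cds = cds.upper()
    return (len(cds) >= 6 and len(cds) % 3 == 0
            and cds[:3] in START_CODONS and cds[-3:] in STOP_CODONS)


def gene_density_ok(n_genes, total_bp, low=0.5, high=1.5):
    """True if genes per kb falls inside the (hypothetical) bounds."""
    if total_bp <= 0:
        return False
    return low <= n_genes / (total_bp / 1000) <= high
```

A driver script would collect the failing record IDs from these helpers, write them to the report, and call `sys.exit(1)` if any gate still fails after the retry.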
Examples
Example 1: Expected input layout
contigs.fasta or bins/*.fasta
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
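The "adjust parameters and re-run" advice can be expressed as a fallback loop. The sketch below is deliberately generic: `run_caller` and `passes_qc` are hypothetical callables supplied by the surrounding workflow, and the parameter sets are whatever alternatives make sense for the chosen gene caller.

```python
def run_with_fallbacks(run_caller, passes_qc, parameter_sets):
    """Try each parameter set in order until one result passes QC.

    Returns (result, attempts), where result is None if every attempt failed
    and attempts records each parameter set with its QC outcome.
    """
    attempts = []
    for params in parameter_sets:
        result = run_caller(**params)
        ok = passes_qc(result)
        attempts.append((params, ok))
        if ok:
            return result, attempts
    return None, attempts
```

When `result` comes back as `None`, the `attempts` list gives the report a record of what was tried before the workflow exits non-zero.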
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-gene-calling/SKILL.md
Overview
bio-gene-calling performs gene prediction and annotates basic features for prokaryotes, viruses, and eukaryotes. It outputs standard formats (GFF3, FAA, FNA) and can detect tRNAs and rRNAs when requested, fitting into downstream annotation and QC workflows.
How This Skill Works
Choose the gene caller based on the organism class, then run gene calling to generate GFF3, FAA, and FNA files. If requested, enable tRNA/rRNA detection. Outputs are organized under results/bio-gene-calling and pass through QC gates such as gene count sanity, start/stop codon checks, and non-empty DNA inputs.
When to Use It
- Annotating contigs or bins from bacterial or archaeal genome assemblies
- Annotating viral genomes in isolation or from metagenomic data
- Producing basic gene models for draft eukaryotic genome contigs
- Predicting gene content and proteins for metagenomic bins
- Re-running with alternative parameters after QC gates fail
Quick Start
- Step 1: Select gene caller by organism class (prokaryote, virus, or eukaryote).
- Step 2: Run gene calling to produce GFF3, proteins (FAA), and coding sequences (FNA).
- Step 3: If requested, enable tRNA/rRNA detection and review the outputs and logs.
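When reviewing the outputs in Step 3, a quick tally of feature types in the GFF3 file shows whether coding and non-coding features were actually written. The sketch below relies only on the GFF3 layout (nine tab-separated columns, with the feature type in the third); which types to expect, such as CDS, tRNA, or rRNA, depends on the tools used and is an assumption here.

```python
from collections import Counter


def count_feature_types(gff_path):
    """Count the values of the GFF3 type column (column 3) in a file."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section; no more feature lines follow
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9:
                counts[cols[2]] += 1
    return counts
```

For example, `count_feature_types("results/bio-gene-calling/genes.gff3")["CDS"]` being zero is a clear sign that gene calling did not produce usable output.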
Best Practices
- Verify inputs exist and point to contigs.fasta or bins/*.fasta before running
- Select the correct gene caller based on the organism class to improve accuracy
- Ensure output paths are correctly defined and accessible (GFF3, FAA, FNA, metrics, logs)
- Inspect gene_metrics.tsv and QC gates to confirm plausible results before proceeding
- If requested, enable tRNA/rRNA detection and review non-coding RNA annotations
Example Use Cases
- Annotating a bacterial genome from contigs.fasta to produce genes.gff3, proteins.faa, and cds.fna
- Annotating a small viral genome and optionally detecting tRNAs during the run
- Annotating draft eukaryotic genome contigs with basic gene models for initial analysis
- Annotating metagenomic bins to reveal gene content and protein predictions
- Re-running with adjusted parameters after QC gates indicate issues with start/stop codons