bio-phylogenomics
npx machina-cli add skill fmschulz/omics-skills/bio-phylogenomics --openclaw
Bio Phylogenomics
Build marker gene alignments and phylogenetic trees.
Instructions
- Extract marker genes or SSU rRNA sequences.
- Align and trim sequences using project-standard workflows.
- Build ML trees with bootstraps:
  - Standard accuracy: Use IQ-TREE (comprehensive model selection, publication-quality)
  - Fast mode: Use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
  - Very large datasets: Use VeryFastTree (>100K sequences, ultra-fast)
- Post-process trees with ETE Toolkit:
  - Calculate tree statistics (branch lengths, distances, topology metrics)
  - Root, prune, or collapse nodes as needed
  - Filter by bootstrap support values
  - Add taxonomic or trait annotations
  - Generate publication-quality visualizations
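As an illustration of the "collapse nodes" and "filter by bootstrap support" steps, here is a minimal sketch using ETE 3. It is not the skill's own implementation: the function name `collapse_low_support`, the default threshold of 70, and the assumption that support values are stored as internal node labels in the Newick string are all illustrative.

```python
def collapse_low_support(newick, min_support=70.0):
    """Return an ETE tree with weakly supported internal nodes collapsed.

    Assumes ETE 3 is installed and that internal node labels in the
    Newick string are support values (hypothetical threshold: 70).
    """
    # Imported here so the module still loads where ETE is not installed.
    from ete3 import Tree

    # format=0 reads internal node labels as support values.
    tree = Tree(newick, format=0)

    # Collect the nodes first so the tree is not modified while traversing it.
    weak_nodes = [
        node
        for node in tree.traverse()
        if not node.is_leaf()
        and not node.is_root()
        and node.support < min_support
    ]
    for node in weak_nodes:
        # delete() re-attaches the node's children to its parent (a polytomy).
        node.delete()
    return tree
```

Rooting and pruning are available on the same tree object through `set_outgroup()` and `prune()`; the right outgroup and taxon list depend on the project.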
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md. |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Marker gene set or alignments available.
Inputs:
- markers.faa (marker genes) or alignments.fasta
Output
- results/bio-phylogenomics/alignments/
- results/bio-phylogenomics/trees/
- results/bio-phylogenomics/phylo_report.md
- results/bio-phylogenomics/logs/
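The output layout above can be created up front so that each step has a place to write. This is a small sketch under the assumption that the workflow runs from the project root; the helper name `prepare_output_dirs` is hypothetical.

```python
from pathlib import Path


def prepare_output_dirs(base="results/bio-phylogenomics"):
    """Create the output directories and return their paths.

    The report file itself is not created here; only its path is returned.
    """
    root = Path(base)
    paths = {name: root / name for name in ("alignments", "trees", "logs")}
    for directory in paths.values():
        # parents=True creates results/ as well; exist_ok makes re-runs safe.
        directory.mkdir(parents=True, exist_ok=True)
    paths["report"] = root / "phylo_report.md"
    return paths
```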
Quality Gates
- Alignment length and missingness meet project thresholds.
- Bootstrap support summary meets project thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
- Verify markers.faa is non-empty and aligned sequences are consistent.
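The alignment gates can be checked with a few lines of standard-library Python. The sketch below is only an assumption about how such a check might look: the function names, the default thresholds (`min_length=100`, `max_missing=0.5`), and the choice to count only `-` and `?` as missing data are placeholders for the project's own values.

```python
def read_fasta(path):
    """Parse a FASTA file into {sequence_id: sequence}."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].strip()
                # Use the first whitespace-delimited token as the ID.
                name = header.split()[0] if header else ""
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_qc(records, min_length=100, max_missing=0.5, gap_chars="-?"):
    """Check that sequences are aligned and summarize length and missingness."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    length = lengths.pop()
    total = length * len(records)
    missing = sum(seq.count(ch) for seq in records.values() for ch in gap_chars)
    missing_fraction = missing / total if total else 1.0
    return {
        "n_sequences": len(records),
        "length": length,
        "missing_fraction": missing_fraction,
        "passed": length >= min_length and missing_fraction <= max_missing,
    }
```

A caller can write the returned dictionary into phylo_report.md and exit non-zero when `passed` is false after retries.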
Examples
Example 1: Expected input layout
markers.faa (marker genes) or alignments.fasta
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
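For the missing-inputs case, a pre-flight check along these lines can surface the problem before any tool runs. It is a sketch, assuming the inputs sit in the working directory under the file names given in Input Requirements; the helper name `find_input` is hypothetical.

```python
import os
from pathlib import Path


def find_input(workdir=".", candidates=("markers.faa", "alignments.fasta")):
    """Return the first candidate input that exists, is readable, and is non-empty."""
    for name in candidates:
        path = Path(workdir) / name
        if path.is_file() and os.access(path, os.R_OK) and path.stat().st_size > 0:
            return path
    raise FileNotFoundError(
        f"none of {', '.join(candidates)} is a readable, non-empty file in {workdir}"
    )
```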
Source
git clone https://github.com/fmschulz/omics-skills/blob/main/skills/bio-phylogenomics/SKILL.md
Overview
This skill guides constructing phylogenies from marker genes or SSU rRNA. It covers extracting markers, aligning and trimming sequences, building maximum-likelihood trees with bootstrap support, and post-processing for publication-quality visualizations.
How This Skill Works
Start with marker genes or alignments, then align and trim sequences using project-standard workflows. Build ML trees with bootstraps using IQ-TREE (with comprehensive model selection) or the fast mode for large datasets, and use VeryFastTree for very large datasets. Finally, post-process trees with the ETE Toolkit to root, prune, collapse nodes, filter by bootstrap, add annotations, and generate publication-ready visuals.
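The size-based choice of tree builder described above can be written down as a small helper. This is a sketch under stated assumptions: the executable names (`iqtree2`, `VeryFastTree`), the thread count, and the exact flags are guesses about a typical installation and should be checked against the versions listed in docs/README.md. Only the 10K and 100K cut-offs come from this skill.

```python
def choose_tree_builder(n_sequences):
    """Pick a tree-building mode from the number of sequences."""
    if n_sequences > 100_000:
        return "veryfasttree"
    if n_sequences > 10_000:
        return "iqtree-fast"
    return "iqtree"


def build_command(mode, alignment, prefix, threads=8):
    """Return the command line (as a list) for the chosen mode.

    `prefix` is only used by the IQ-TREE modes.
    """
    if mode == "iqtree":
        # Model selection (-m MFP) plus 1000 ultrafast bootstrap replicates (-B).
        return ["iqtree2", "-s", alignment, "-m", "MFP", "-B", "1000",
                "-T", str(threads), "--prefix", prefix]
    if mode == "iqtree-fast":
        # Support options are left out on purpose: confirm which ones your
        # IQ-TREE version accepts together with fast mode before adding them.
        return ["iqtree2", "-s", alignment, "--fast",
                "-T", str(threads), "--prefix", prefix]
    if mode == "veryfasttree":
        # The tree is written to stdout; redirect it into the trees/ directory.
        return ["VeryFastTree", "-threads", str(threads), alignment]
    raise ValueError(f"unknown mode: {mode}")
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting and makes it easy to record in the logs directory.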
When to Use It
- When assembling a marker-gene based phylogeny across samples
- When you need publication-quality trees with bootstrap support
- When datasets are large (>10K sequences) and require fast modes
- When you want to annotate trees with taxonomic or trait information and generate visuals
- When validating inputs and QC gates before downstream analyses
Quick Start
- Step 1: Prepare inputs (markers.faa or alignments.fasta) and confirm prerequisites/tools are available
- Step 2: Run alignment/trim workflow and build an ML tree with IQ-TREE (or -fast for speed) or VeryFastTree for very large datasets, then post-process with ETE
- Step 3: Root/prune/collapse nodes as needed, add taxonomic/trait annotations, and generate publication-quality visuals
Best Practices
- Verify that inputs exist and are non-empty: markers.faa or alignments.fasta
- Use project-standard alignment and trimming workflows before tree-building
- Choose the right tool for dataset size: IQ-TREE for accuracy, IQ-TREE -fast for exploration, VeryFastTree for very large data
- Post-process with the ETE Toolkit to root, prune, collapse nodes, and annotate
- Filter trees by bootstrap support and review QC gates prior to publication
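A quick bootstrap summary for the QC gate can be pulled straight from a Newick string without extra dependencies. The sketch below assumes that support values appear as plain numeric labels right after each closing parenthesis; trees with combined labels such as `95/100` or with named internal nodes need a real parser. The function name and the threshold of 70 are illustrative.

```python
import re
from statistics import median

# A number directly after ")" is taken to be an internal node's support value.
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_summary(newick, min_support=70.0):
    """Summarize support values found on internal nodes of a Newick tree."""
    values = [float(v) for v in SUPPORT_RE.findall(newick)]
    if not values:
        return {"n_internal": 0, "median": None, "fraction_supported": 0.0}
    supported = sum(v >= min_support for v in values)
    return {
        "n_internal": len(values),
        "median": median(values),
        "fraction_supported": supported / len(values),
    }
```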
Example Use Cases
- Construct a marker-gene phylogeny for gut microbiome samples to study evolutionary relationships
- Generate a publication-quality phylogenetic tree with bootstrap support for a manuscript
- Compare topology using IQ-TREE versus VeryFastTree on a dataset exceeding 100K sequences
- Annotate a tree with taxonomy and traits to visualize clade distributions across environments
- Document QC steps, produce a phylo_report, and prepare figures for a paper