What inputs are required to run bio-phylogenomics?

Either a marker gene set (markers.faa) or precomputed alignments (alignments.fasta), plus the required tools in the active environment (Pixi, conda, or system install). Confirm that the input files exist and are properly formatted.

Which tools should I use for tree building?

Use IQ-TREE for comprehensive model selection and publication-quality trees; use IQ-TREE -fast for exploratory analyses on large datasets; for very large datasets (>100K sequences) consider VeryFastTree.

What post-processing steps are recommended?

Post-process with the ETE Toolkit to root, prune, or collapse nodes; calculate tree statistics; filter by bootstrap support; add taxonomic or trait annotations; and generate publication-quality visuals.

bio-phylogenomics

npx machina-cli add skill fmschulz/omics-skills/bio-phylogenomics --openclaw


Bio Phylogenomics

Build marker gene alignments and phylogenetic trees.

Instructions

1. Extract marker genes or SSU rRNA sequences.
2. Align and trim sequences using project-standard workflows.
3. Build ML trees with bootstraps:
   - Standard accuracy: use IQ-TREE (comprehensive model selection, publication-quality)
   - Fast mode: use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
   - Very large datasets: use VeryFastTree (>100K sequences, ultra-fast)
4. Post-process trees with the ETE Toolkit:
   - Calculate tree statistics (branch lengths, distances, topology metrics)
   - Root, prune, or collapse nodes as needed
   - Filter by bootstrap support values
   - Add taxonomic or trait annotations
   - Generate publication-quality visualizations
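The size-based tool choice above can be sketched as a small helper that returns a command line for each tier. This is a minimal sketch, not part of the skill: the function name `choose_tree_command`, the executable names `iqtree2` and `VeryFastTree`, the thread count, and the 1000 bootstrap replicates are assumptions. Check the flags against the help output of your installed versions before relying on them.

```python
# Hypothetical helper: pick a tree builder from dataset size and return its argv.
# Size tiers follow the skill (>10K sequences: IQ-TREE -fast, >100K: VeryFastTree).
FAST_THRESHOLD = 10_000
VERY_LARGE_THRESHOLD = 100_000


def choose_tree_command(
    alignment: str,
    n_sequences: int,
    prefix: str,
    bootstraps: int = 1000,
    threads: int = 8,
) -> list[str]:
    """Return the command line (as a list of arguments) for the matching tier."""
    if n_sequences > VERY_LARGE_THRESHOLD:
        # VeryFastTree keeps the FastTree interface: protein model by default
        # (add -nt for nucleotide alignments such as SSU rRNA) and the Newick
        # tree is printed to stdout, so redirect stdout to a file when running.
        return ["VeryFastTree", "-threads", str(threads), alignment]
    if n_sequences > FAST_THRESHOLD:
        # Exploratory mode: IQ-TREE fast tree search; no bootstrap in this sketch.
        return ["iqtree2", "-s", alignment, "-fast", "--prefix", prefix, "-T", str(threads)]
    # Standard accuracy: ModelFinder Plus model selection (-m MFP) plus
    # ultrafast bootstrap (-B is IQ-TREE 2 syntax; IQ-TREE 1.x used -bb).
    return [
        "iqtree2", "-s", alignment,
        "-m", "MFP",
        "-B", str(bootstraps),
        "--prefix", prefix,
        "-T", str(threads),
    ]


if __name__ == "__main__":
    cmd = choose_tree_command("alignments.fasta", 500, "results/bio-phylogenomics/trees/markers")
    print(" ".join(cmd))
```

Returning an argument list rather than a shell string lets you pass it straight to `subprocess.run` without quoting issues, and keeps the tier logic testable without the tools installed.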

Quick Reference

Run workflow: Follow the steps in this skill and capture outputs.
Validate inputs: Confirm that required inputs and reference data exist.
Review outputs: Inspect reports and QC gates before proceeding.
Tool docs: See `docs/README.md`.
References: See ../bio-skills-references.md

Input Requirements

Prerequisites:

Tools available in the active environment (Pixi, conda, or system install); see docs/README.md for the expected tools.
A marker gene set or precomputed alignments.

Inputs:

markers.faa (marker gene sequences) or alignments.fasta (aligned sequences)
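The input check can be sketched with the standard library alone: find the first non-empty input file and confirm it parses as FASTA. This is a minimal sketch under stated assumptions: `parse_fasta` and `load_input` are hypothetical names, the inputs are assumed to sit directly in one working directory, and the record ID is taken as the first word of each header.

```python
from pathlib import Path

# Input names come from the skill; the single-directory layout is an assumption.
INPUT_CANDIDATES = ("markers.faa", "alignments.fasta")


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; raise ValueError if malformed."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # store the previous record
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"line {line_no}: empty FASTA header")
            name, chunks = fields[0], []
            if name in records:
                raise ValueError(f"line {line_no}: duplicate record id {name!r}")
        elif name is None:
            raise ValueError(f"line {line_no}: sequence data before the first header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)  # store the last record
    empty = [rid for rid, seq in records.items() if not seq]
    if empty:
        raise ValueError(f"records with no sequence: {empty}")
    return records


def load_input(workdir: Path) -> tuple[Path, dict[str, str]]:
    """Return the first non-empty input file in workdir and its parsed records."""
    for candidate in INPUT_CANDIDATES:
        path = workdir / candidate
        if path.is_file() and path.stat().st_size > 0:
            records = parse_fasta(path.read_text())
            if records:
                return path, records
    raise FileNotFoundError(f"no non-empty markers.faa or alignments.fasta in {workdir}")
```

Separating parsing (`parse_fasta`, which works on text) from file discovery (`load_input`) keeps the format check usable on any string, including test fixtures.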

Output

results/bio-phylogenomics/alignments/
results/bio-phylogenomics/trees/
results/bio-phylogenomics/phylo_report.md
results/bio-phylogenomics/logs/

Quality Gates

Alignment length and missingness meet project thresholds.
Bootstrap support summary meets project thresholds.
markers.faa is non-empty and all aligned sequences have the same length.
On failure: retry with alternative parameters; if still failing, record the failure in the report and exit non-zero.
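The alignment gates can be expressed as one function over parsed records ({record_id: aligned sequence}). This is a minimal sketch: `alignment_qc` is a hypothetical name, the thresholds are placeholders for your project values, and counting `-`, `.`, `?`, and `X` as missing assumes a protein alignment (for nucleotides you would count `N` instead).

```python
# Characters treated as missing data; assumes a protein alignment.
MISSING_CHARS = frozenset("-.?Xx")


def alignment_qc(records: dict[str, str], min_length: int, max_missing: float) -> list[str]:
    """Return QC failure messages; an empty list means every gate passed."""
    if not records:
        return ["alignment contains no sequences"]
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        # Unequal lengths mean the file is not a valid alignment; stop here.
        return [f"aligned sequences differ in length: {sorted(lengths)}"]
    failures = []
    length = lengths.pop()
    if length < min_length:
        failures.append(f"alignment length {length} < {min_length}")
    # Missingness: fraction of missing characters over all cells of the alignment.
    total = length * len(records)
    missing = sum(ch in MISSING_CHARS for seq in records.values() for ch in seq)
    fraction = missing / total if total else 1.0
    if fraction > max_missing:
        failures.append(f"missingness {fraction:.2%} > {max_missing:.2%}")
    return failures


if __name__ == "__main__":
    aln = {"taxonA": "MK-L", "taxonB": "MKAL"}
    print(alignment_qc(aln, min_length=4, max_missing=0.2))
```

Returning messages instead of a single boolean makes it easy to copy the exact reason into phylo_report.md when a gate fails.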

Examples

Example 1: Expected input layout

markers.faa (marker genes) or alignments.fasta

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
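The on-failure policy from the quality gates (retry with alternative parameters, then record the failure and exit non-zero) can be sketched as a generic wrapper. This is a hypothetical helper, not the skill's code: `run_with_retries` is an assumed name, each step is assumed to return a `(passed, result)` pair, and the parameter dictionaries are whatever your alignment, trimming, or tree step accepts.

```python
import sys
from pathlib import Path
from typing import Any, Callable, Iterable


def run_with_retries(
    step: Callable[[dict], tuple[bool, Any]],
    parameter_sets: Iterable[dict],
    report_path: Path,
    step_name: str,
) -> Any:
    """Run `step` with each parameter set until one passes its QC gate.

    If every parameter set fails, append a note to the report and exit with
    status 1 so the calling workflow sees a non-zero exit code.
    """
    attempts = []
    for params in parameter_sets:
        passed, result = step(params)
        attempts.append(params)
        if passed:
            return result
    report_path.parent.mkdir(parents=True, exist_ok=True)
    with report_path.open("a") as report:
        report.write(f"\n## QC failure: {step_name}\n")
        for params in attempts:
            report.write(f"- tried {params}\n")
    sys.exit(1)
```

For example, a trimming step could be retried with progressively looser settings, with the report path set to results/bio-phylogenomics/phylo_report.md. Ordering the parameter sets from strictest to most permissive keeps the first passing result as conservative as possible.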

Source

https://github.com/fmschulz/omics-skills/blob/main/skills/bio-phylogenomics/SKILL.md

Overview

This skill guides constructing phylogenies from marker genes or SSU rRNA. It covers extracting markers, aligning and trimming sequences, building maximum-likelihood trees with bootstrap support, and post-processing for publication-quality visualizations.

How This Skill Works

Start with marker genes (or existing alignments), then align and trim sequences using project-standard workflows. Build ML trees with bootstraps using IQ-TREE (with comprehensive model selection), IQ-TREE's -fast mode for large datasets, or VeryFastTree for very large datasets. Finally, post-process trees with the ETE Toolkit to root, prune, and collapse nodes, filter by bootstrap support, add annotations, and generate publication-ready visuals.

When to Use It

When assembling a marker-gene based phylogeny across samples
When you need publication-quality trees with bootstrap support
When datasets are large (>10K sequences) and require fast modes
When you want to annotate trees with taxonomic or trait information and generate visuals
When validating inputs and QC gates before downstream analyses

Quick Start

Step 1: Prepare inputs (markers.faa or alignments.fasta) and confirm prerequisites/tools are available
Step 2: Run alignment/trim workflow and build an ML tree with IQ-TREE (or -fast for speed) or VeryFastTree for very large datasets, then post-process with ETE
Step 3: Root/prune/collapse nodes as needed, add taxonomic/trait annotations, and generate publication-quality visuals

Best Practices

Verify that the input (markers.faa or alignments.fasta) exists and is non-empty
Use project-standard alignment and trimming workflows before tree-building
Choose the right tool for dataset size: IQ-TREE for accuracy, -fast for exploration, VeryFastTree for very large data
Post-process with the ETE Toolkit to root, prune, collapse nodes, and annotate
Filter trees by bootstrap support and review QC gates prior to publication
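Before publication, the bootstrap-support gate can be checked with a short standard-library sketch that reads support values from the internal-node labels of a Newick tree, which is where IQ-TREE writes ultrafast bootstrap values. The function names, the 95 cutoff, and keeping the last value when a label holds several (IQ-TREE writes SH-aLRT/UFBoot pairs such as `80.5/97` when both are requested) are assumptions; substitute your project thresholds. Collapsing or pruning weakly supported nodes would still be done in the ETE Toolkit.

```python
import re
from statistics import median

# An internal-node label is a number (or slash-separated numbers) right after ')'.
_SUPPORT_RE = re.compile(r"\)([0-9]+(?:\.[0-9]+)?(?:/[0-9]+(?:\.[0-9]+)?)*)")


def support_values(newick: str) -> list[float]:
    """Extract one support value per labelled internal node (last value if several)."""
    return [float(label.split("/")[-1]) for label in _SUPPORT_RE.findall(newick)]


def support_summary(newick: str, threshold: float = 95.0) -> dict:
    """Summarize support: labelled node count, median, and fraction >= threshold."""
    values = support_values(newick)
    if not values:
        return {"n_nodes": 0, "median": None, "fraction_supported": None}
    supported = sum(v >= threshold for v in values)
    return {
        "n_nodes": len(values),
        "median": median(values),
        "fraction_supported": supported / len(values),
    }


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)100:0.05,(C:0.1,D:0.1)80:0.02,E:0.3);"
    print(support_summary(tree))
```

Branch lengths follow a colon and leaf names follow `(` or `,`, so anchoring the pattern on `)` picks up only internal-node labels.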

Example Use Cases

Construct a marker-gene phylogeny for gut microbiome samples to study evolutionary relationships
Generate a publication-quality phylogenetic tree with bootstrap support for a manuscript
Compare topology using IQ-TREE versus VeryFastTree on a dataset exceeding 100K sequences
Annotate a tree with taxonomy and traits to visualize clade distributions across environments
Document QC steps, produce phylo_report.md, and prepare figures for a paper
