
bio-phylogenomics

npx machina-cli add skill fmschulz/omics-skills/bio-phylogenomics --openclaw

Bio Phylogenomics

Build marker gene alignments and phylogenetic trees.

Instructions

  1. Extract marker genes or SSU rRNA sequences.
  2. Align and trim sequences using project-standard workflows.
  3. Build ML trees with bootstraps:
     • Standard accuracy: Use IQ-TREE (comprehensive model selection, publication-quality)
     • Fast mode: Use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
     • Very large datasets: Use VeryFastTree (>100K sequences, ultra-fast)
  4. Post-process trees with ETE Toolkit:
     • Calculate tree statistics (branch lengths, distances, topology metrics)
     • Root, prune, or collapse nodes as needed
     • Filter by bootstrap support values
     • Add taxonomic or trait annotations
     • Generate publication-quality visualizations
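
The size-based tool choice in the steps above can be sketched as a small helper. This is only an illustration, not the skill's own implementation: the function name, the `iqtree2` executable name, and the output naming are assumptions, and `--fast` is the IQ-TREE 2 spelling of the `-fast` option named above. The VeryFastTree call assumes a FastTree-style `-out` argument. The helper only builds the command; it does not run it.

```python
from typing import List

# Thresholds taken from the steps above.
FAST_MODE_MIN_SEQS = 10_000      # ">10K sequences": IQ-TREE fast mode
VERY_LARGE_MIN_SEQS = 100_000    # ">100K sequences": VeryFastTree


def choose_tree_command(n_seqs: int, alignment: str, out_prefix: str) -> List[str]:
    """Return an argv list for the tree builder suited to the dataset size.

    Hypothetical helper: executable names and output naming are assumptions.
    """
    if n_seqs > VERY_LARGE_MIN_SEQS:
        # Assumed FastTree-compatible invocation.
        return ["VeryFastTree", "-out", f"{out_prefix}.tree", alignment]
    if n_seqs > FAST_MODE_MIN_SEQS:
        # Fast search for exploratory analysis; no bootstrap requested here.
        return ["iqtree2", "-s", alignment, "--fast", "--prefix", out_prefix]
    # Standard accuracy: ModelFinder model selection plus 1000 ultrafast bootstraps.
    return ["iqtree2", "-s", alignment, "-m", "MFP", "-B", "1000",
            "--prefix", out_prefix]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the tool is confirmed to be on the PATH.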

Quick Reference

Task              Action
Run workflow      Follow the steps in this skill and capture outputs.
Validate inputs   Confirm required inputs and reference data exist.
Review outputs    Inspect reports and QC gates before proceeding.
Tool docs         See docs/README.md.
References        See ../bio-skills-references.md

Input Requirements

Prerequisites:

  • Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
  • Marker gene set or alignments available.

Inputs:

  • markers.faa (marker genes) or alignments.fasta
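
A minimal sketch of the input check, assuming both files are looked up in a single working directory. The function name and the lookup order (markers.faa first) are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# File names come from the input list above; the search order is an assumption.
CANDIDATE_INPUTS = ("markers.faa", "alignments.fasta")


def find_input(workdir: str) -> Optional[Path]:
    """Return the first candidate input that exists as a non-empty file."""
    for name in CANDIDATE_INPUTS:
        path = Path(workdir) / name
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None  # neither input is usable
```

A caller would typically stop with a clear error message when `find_input` returns `None`, before any alignment or tree-building step starts.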

Output

  • results/bio-phylogenomics/alignments/
  • results/bio-phylogenomics/trees/
  • results/bio-phylogenomics/phylo_report.md
  • results/bio-phylogenomics/logs/
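
The output layout above could be created up front with a few lines of Python. This is a sketch only; `prepare_output_dirs` is a hypothetical helper name, and the skill itself does not say how these directories are created.

```python
from pathlib import Path


def prepare_output_dirs(root: str = "results/bio-phylogenomics") -> Path:
    """Create the output subdirectories and return the report path."""
    base = Path(root)
    for sub in ("alignments", "trees", "logs"):
        # parents=True also creates results/ and results/bio-phylogenomics/
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base / "phylo_report.md"
```

The function is safe to call repeatedly because existing directories are left untouched.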

Quality Gates

  • Alignment length and missingness meet project thresholds.
  • Bootstrap support summary meets project thresholds.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
  • Verify markers.faa is non-empty and aligned sequences are consistent.
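
The alignment gates above might look like the following sketch. The thresholds are parameters because the project thresholds are not stated here, and treating `-` and `?` as missing data is an assumption. Function names are illustrative.

```python
from typing import Dict, Iterable, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header: sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def alignment_qc(seqs: Dict[str, str], min_length: int,
                 max_missing: float) -> Tuple[bool, str]:
    """Check length consistency, alignment length, and missingness."""
    if not seqs:
        return False, "no sequences"
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        return False, "aligned sequences differ in length"
    length = lengths.pop()
    if length == 0 or length < min_length:
        return False, f"alignment length {length} below {min_length}"
    # Assumption: gap and '?' characters count as missing data.
    missing = sum(s.count("-") + s.count("?") for s in seqs.values())
    fraction = missing / (length * len(seqs))
    if fraction > max_missing:
        return False, f"missingness {fraction:.3f} above {max_missing}"
    return True, "ok"
```

To follow the failure rule above, a caller could retry the alignment step with alternative parameters and, if the gate still fails, write the message to the report and call `sys.exit(1)`.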

Examples

Example 1: Expected input layout

markers.faa (marker genes) or alignments.fasta

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.

Source

https://github.com/fmschulz/omics-skills/blob/main/skills/bio-phylogenomics/SKILL.md

Overview

This skill guides constructing phylogenies from marker genes or SSU rRNA. It covers extracting markers, aligning and trimming sequences, building maximum-likelihood trees with bootstrap support, and post-processing for publication-quality visualizations.

How This Skill Works

Start with marker genes or alignments, then align and trim sequences using project-standard workflows. Build ML trees with bootstraps using IQ-TREE with comprehensive model selection, switch to IQ-TREE -fast for large datasets (>10K sequences), and use VeryFastTree for very large datasets (>100K sequences). Finally, post-process trees with the ETE Toolkit to root, prune, or collapse nodes, filter by bootstrap support, add annotations, and generate publication-ready visuals.

When to Use It

  • When assembling a marker-gene based phylogeny across samples
  • When you need publication-quality trees with bootstrap support
  • When datasets are large (>10K sequences) requiring fast modes
  • When you want to annotate trees with taxonomic or trait information and generate visuals
  • When validating inputs and QC gates before downstream analyses

Quick Start

  1. Prepare inputs (markers.faa or alignments.fasta) and confirm that prerequisites and tools are available.
  2. Run the alignment/trim workflow, build an ML tree with IQ-TREE (or IQ-TREE -fast for speed, or VeryFastTree for very large datasets), then post-process with ETE.
  3. Root, prune, or collapse nodes as needed, add taxonomic or trait annotations, and generate publication-quality visuals.

Best Practices

  • Verify that markers.faa or alignments.fasta exists and is non-empty
  • Use project-standard alignment and trimming workflows before tree-building
  • Choose the right tool for dataset size: IQ-TREE for accuracy, IQ-TREE -fast for exploration (>10K sequences), VeryFastTree for very large data (>100K sequences)
  • Post-process with the ETE Toolkit to root, prune, collapse nodes, and annotate
  • Filter trees by bootstrap support and review QC gates prior to publication
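
As an illustration of the bootstrap review, the sketch below summarizes support values read directly from a Newick string. It is a standard-library stand-in, not the ETE Toolkit API, and it assumes that support values are plain numbers written immediately after a closing parenthesis and that no quoted labels contain parentheses. The threshold is a parameter because the project threshold is not stated here.

```python
import re
from statistics import median
from typing import Dict, List

# Assumption: a support value is a number directly after ")".
_SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick: str) -> List[float]:
    """Extract internal-node support values from a Newick string."""
    return [float(v) for v in _SUPPORT.findall(newick)]


def support_summary(newick: str, threshold: float) -> Dict[str, float]:
    """Summarize support: node count, median, fraction at or above threshold."""
    values = support_values(newick)
    if not values:
        return {"n_nodes": 0, "median": 0.0, "fraction_supported": 0.0}
    supported = sum(v >= threshold for v in values)
    return {
        "n_nodes": len(values),
        "median": median(values),
        "fraction_supported": supported / len(values),
    }
```

A summary like this could feed the bootstrap section of phylo_report.md, while the actual collapsing or pruning of weak nodes stays with the ETE Toolkit.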

Example Use Cases

  • Construct a marker-gene phylogeny for gut microbiome samples to study evolutionary relationships
  • Generate a publication-quality phylogenetic tree with bootstrap support for a manuscript
  • Compare topology using IQ-TREE versus VeryFastTree on a dataset exceeding 100K sequences
  • Annotate a tree with taxonomy and traits to visualize clade distributions across environments
  • Document QC steps, produce a phylo_report, and prepare figures for a paper
