
bio-annotation

npx machina-cli add skill fmschulz/omics-skills/bio-annotation --openclaw

Bio Annotation

Functional annotation and taxonomy inference from sequence homology.

Instructions

  1. Run InterProScan for domain/family annotation.
  2. Run eggnog-mapper for orthology-based annotation.
  3. Run DIAMOND and resolve taxonomy with TaxonKit.
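
The three steps could be scripted roughly as follows. This minimal sketch only builds the command lines and does not run them. The database paths under the reference root (`eggnog/`, `diamond/ref.dmnd`, `taxdump/`) and the intermediate files `interproscan.tsv`, `diamond.tsv` and `taxids.txt` are hypothetical names, not the skill's actual layout.

```python
from pathlib import Path


def build_commands(faa: Path, db_root: Path, outdir: Path) -> list[list[str]]:
    """Return argv lists for the annotation steps, in execution order.

    All paths under db_root and outdir are hypothetical placeholders.
    """
    return [
        # Step 1: InterProScan domain/family annotation as TSV.
        ["interproscan.sh", "-i", str(faa), "-f", "TSV",
         "-o", str(outdir / "interproscan.tsv")],
        # Step 2: eggnog-mapper orthology-based annotation (-o is a file prefix).
        ["emapper.py", "-i", str(faa), "-o", "eggnog",
         "--output_dir", str(outdir), "--data_dir", str(db_root / "eggnog")],
        # Step 3a: DIAMOND search; the staxids column requires a .dmnd
        # built with taxonomy information (makedb --taxonmap/--taxonnodes).
        ["diamond", "blastp", "-q", str(faa),
         "-d", str(db_root / "diamond" / "ref.dmnd"),
         "-o", str(outdir / "diamond.tsv"),
         "--outfmt", "6", "qseqid", "sseqid", "pident", "evalue",
         "bitscore", "staxids"],
        # Step 3b: TaxonKit turns taxids (one per line) into lineages;
        # -R appends the rank of every lineage level.
        ["taxonkit", "lineage", "-R", "--data-dir", str(db_root / "taxdump"),
         str(outdir / "taxids.txt")],
    ]
```

Each argv could then be passed to `subprocess.run(argv, check=True)`. In this sketch, `taxids.txt` would first have to be extracted from `diamond.tsv`, for example from the best hit per protein.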

Quick Reference

Task | Action
Run workflow | Follow the steps in this skill and capture outputs.
Validate inputs | Confirm required inputs and reference data exist.
Review outputs | Inspect reports and QC gates before proceeding.
Tool docs | See docs/README.md.
References | See ../bio-skills-references.md.

Input Requirements

Prerequisites:

  • Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
  • Reference DB root: set BIO_DB_ROOT (default /media/shared-expansion/db/ on WSU).
  • Input FASTA and reference DBs are readable.

Inputs:

  • proteins.faa (FASTA protein sequences).
  • reference_db/ (eggNOG, InterPro, DIAMOND databases + taxdump).
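
A pre-flight check for these prerequisites could look like the following sketch. It resolves the reference root from `BIO_DB_ROOT`, falling back to the default above, and reports which databases are missing or unreadable. The per-database subdirectory names in `REQUIRED_DBS` are assumptions; adjust them to the actual layout of your reference root.

```python
import os
from pathlib import Path

# Default reference root from the prerequisites; BIO_DB_ROOT overrides it.
DEFAULT_DB_ROOT = "/media/shared-expansion/db/"

# Hypothetical subdirectory names; match these to your installation.
REQUIRED_DBS = {
    "eggNOG": "eggnog",
    "InterPro": "interpro",
    "DIAMOND": "diamond",
    "taxdump": "taxdump",
}


def resolve_db_root(env=None) -> Path:
    """Return the reference root from BIO_DB_ROOT, else the default."""
    env = os.environ if env is None else env
    return Path(env.get("BIO_DB_ROOT", DEFAULT_DB_ROOT))


def missing_dbs(db_root: Path) -> list[str]:
    """List required databases that are absent or unreadable under db_root."""
    missing = []
    for name, subdir in REQUIRED_DBS.items():
        path = db_root / subdir
        if not (path.exists() and os.access(path, os.R_OK)):
            missing.append(name)
    return missing
```

Calling `missing_dbs(resolve_db_root())` before the first step and stopping on a non-empty result keeps a bad reference root from surfacing later as an obscure tool error.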

Output

  • results/bio-annotation/annotations.parquet
  • results/bio-annotation/taxonomy.parquet
  • results/bio-annotation/annotation_report.md
  • results/bio-annotation/logs/
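
To confirm that a run produced everything listed above, a small check such as this sketch can be run before reviewing results. It only tests that each file exists and is non-empty and that `logs/` is a directory. It does not open or validate the parquet files.

```python
from pathlib import Path

# Expected artifacts inside the output directory.
EXPECTED_FILES = ["annotations.parquet", "taxonomy.parquet", "annotation_report.md"]
EXPECTED_DIRS = ["logs"]


def missing_outputs(outdir: Path = Path("results/bio-annotation")) -> list[str]:
    """Return expected outputs that are absent, empty, or the wrong type."""
    problems = []
    for name in EXPECTED_FILES:
        path = outdir / name
        # is_file() is checked first, so stat() only runs on existing files.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    for name in EXPECTED_DIRS:
        if not (outdir / name).is_dir():
            problems.append(name + "/")
    return problems
```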

Quality Gates

  • Annotation hit rate and taxonomy rank coverage meet project thresholds.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
  • Verify proteins.faa is non-empty and amino acid encoded.
  • Verify required reference DBs exist under the reference root.
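
The hit-rate and rank-coverage gates reduce to a few ratios. The sketch below computes them and exits non-zero when any metric falls below its threshold. The metric names and any thresholds you pass are hypothetical, since the skill defers the actual values to project settings. `lineages` is assumed to map each protein ID to a `{rank: name}` dict.

```python
import sys


def hit_rate(protein_ids, annotated_ids) -> float:
    """Fraction of input proteins with at least one annotation."""
    proteins = set(protein_ids)
    if not proteins:
        return 0.0
    return len(proteins & set(annotated_ids)) / len(proteins)


def rank_coverage(lineages: dict, rank: str, n_proteins: int) -> float:
    """Fraction of input proteins whose assigned lineage includes `rank`."""
    if n_proteins == 0:
        return 0.0
    return sum(1 for ranks in lineages.values() if rank in ranks) / n_proteins


def failed_gates(metrics: dict, thresholds: dict) -> list[str]:
    """Describe every metric below its threshold (missing metrics count as 0)."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name, 0.0)
        if value < minimum:
            failures.append(f"{name}={value:.2f} < {minimum:.2f}")
    return failures


def enforce_gates(metrics: dict, thresholds: dict) -> None:
    """Print failures to stderr and exit with status 1 if any gate fails."""
    failures = failed_gates(metrics, thresholds)
    if failures:
        for line in failures:
            print(f"QC gate failed: {line}", file=sys.stderr)
        sys.exit(1)
```

For instance, `hit_rate` could be fed the protein IDs from proteins.faa and the IDs that received any InterProScan or eggnog-mapper annotation. The thresholds passed to `enforce_gates` remain project-specific.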

Examples

Example 1: Expected input layout

proteins.faa      FASTA protein sequences
reference_db/     eggNOG, InterPro, DIAMOND databases + taxdump

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
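
The retry rule from the quality gates (try alternative parameters, then record the failure and exit non-zero) can be sketched as a loop over parameter sets. In practice a fallback might switch DIAMOND from its default mode to `--more-sensitive`. Here `build_argv` and the parameter dicts are hypothetical hooks, and the caller remains responsible for writing the final failure into the report.

```python
import subprocess
from typing import Callable, Sequence, TextIO


def run_with_fallbacks(
    build_argv: Callable[[dict], list[str]],
    param_sets: Sequence[dict],
    log: TextIO,
) -> dict:
    """Run a step with each parameter set in turn; return the first that succeeds.

    Raises RuntimeError if every attempt exits non-zero.
    """
    for attempt, params in enumerate(param_sets, start=1):
        argv = build_argv(params)
        result = subprocess.run(argv, capture_output=True, text=True)
        # Record every attempt so the logs show what was tried.
        log.write(f"attempt {attempt} {params} -> exit {result.returncode}\n")
        if result.stderr:
            log.write(result.stderr)
        if result.returncode == 0:
            return params
    raise RuntimeError(f"all {len(param_sets)} parameter sets failed")
```

Catching the `RuntimeError` at the top level is the natural place to append the failure to annotation_report.md and exit non-zero.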

Source

git clone https://github.com/fmschulz/omics-skills
View on GitHub: https://github.com/fmschulz/omics-skills/blob/main/skills/bio-annotation/SKILL.md

Overview

Bio-annotation delivers functional labels and taxonomy for proteins by integrating domain/family annotations from InterProScan, orthology-based annotations from eggnog-mapper, and taxonomic context from DIAMOND hits resolved with TaxonKit. Outputs include parquet summaries and a readable annotation report, with QC gates to ensure data quality.

How This Skill Works

proteins.faa is processed against the databases in reference_db in three steps: InterProScan annotates domains and families, eggnog-mapper assigns orthology-based functional annotations, and DIAMOND finds homologs whose taxon IDs TaxonKit resolves into lineages. Results are written as two parquet tables (annotations.parquet, taxonomy.parquet) plus annotation_report.md, with logs kept for QC review.
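
The DIAMOND-to-TaxonKit hand-off can be illustrated with the sketch below. It assumes DIAMOND tabular output (`--outfmt 6`) with the columns `qseqid sseqid pident evalue bitscore staxids`, and TaxonKit `lineage -R` output with three tab-separated columns (taxid, lineage, ranks). Each protein gets the taxon of its highest-bitscore hit. That top-hit rule is a simplification chosen here; a lowest common ancestor over several hits (TaxonKit provides an `lca` subcommand) is the more conservative alternative.

```python
from typing import Iterable

# Assumed column order of the DIAMOND --outfmt 6 table.
DIAMOND_FIELDS = ["qseqid", "sseqid", "pident", "evalue", "bitscore", "staxids"]


def best_hits(lines: Iterable[str]) -> dict[str, tuple[str, float]]:
    """Map each query to (taxid, bitscore) of its highest-bitscore hit."""
    best: dict[str, tuple[str, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        rec = dict(zip(DIAMOND_FIELDS, line.split("\t")))
        score = float(rec["bitscore"])
        # staxids can list several ';'-separated taxids; keep the first.
        taxid = rec["staxids"].split(";")[0]
        query = rec["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (taxid, score)
    return best


def parse_lineage_ranks(lines: Iterable[str]) -> dict[str, dict[str, str]]:
    """Parse `taxonkit lineage -R` rows into {taxid: {rank: name}}."""
    lineages: dict[str, dict[str, str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # unknown taxid or missing rank column
        taxid, names, ranks = fields[0], fields[1], fields[2]
        lineages[taxid] = {
            rank: name
            for name, rank in zip(names.split(";"), ranks.split(";"))
            if rank != "no rank"
        }
    return lineages


def assign_taxonomy(best, lineages) -> dict[str, dict[str, str]]:
    """Attach the resolved lineage of each protein's best hit (empty if unknown)."""
    return {query: lineages.get(taxid, {}) for query, (taxid, _) in best.items()}
```

In this sketch, the taxids returned by `best_hits` would be written one per line as TaxonKit input, and `assign_taxonomy` joins the parsed lineages back to proteins before taxonomy.parquet is built.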

When to Use It

  • You need functional domain and family annotations for a protein set to understand potential roles.
  • You require orthology-based annotations and inferred functional terms from evolutionary relationships.
  • You must assign taxonomy to proteins based on sequence similarity and taxonomic databases.
  • You want structured outputs (parquet) suitable for downstream analysis and a summary report for stakeholders.
  • You have validated inputs (proteins.faa and reference_db) and want a reproducible QC-driven workflow.

Quick Start

  1. Prepare inputs proteins.faa and reference_db, and set BIO_DB_ROOT to the DB directory.
  2. Run the bio-annotation workflow to execute InterProScan, eggnog-mapper, DIAMOND and TaxonKit.
  3. Inspect outputs in results/bio-annotation (annotations.parquet, taxonomy.parquet, annotation_report.md) and review the QC gates.

Best Practices

  • Set BIO_DB_ROOT correctly and verify reference databases exist before running.
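  • Confirm the input FASTA is non-empty and amino-acid encoded; nucleotide sequences passed as protein input produce meaningless hits.
  • Tune InterProScan and eggnog-mapper parameters for your organism group (for example, eggnog-mapper's --tax_scope restricts orthology assignment to a taxonomic range).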
  • Review annotation_report.md and taxonomy.parquet for consistency before proceeding.
  • Retain logs and run QC gates; retry with adjusted parameters if hits or coverage fall below thresholds.
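
The input check recommended above (non-empty, amino-acid encoded) might be implemented as the following sketch. The alphabet covers the 20 standard residues plus the ambiguity codes B, Z, J and X, the residues U and O, and the stop symbol `*`. Flagging a file as nucleotide when at least 90% of its residues are A/C/G/T/U/N is a heuristic assumed here, not a documented threshold of the skill.

```python
from typing import Iterable

# Standard amino acids, ambiguity codes, U/O, and the stop symbol.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY" + "BZJXUO*")
NUCLEOTIDE_ALPHABET = set("ACGTUN")


def validate_protein_fasta(lines: Iterable[str], nt_threshold: float = 0.9) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    n_records = 0
    residues = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1
            continue
        if n_records == 0:
            problems.append(f"line {lineno}: sequence before first header")
            continue
        seq = line.upper()
        bad = set(seq) - PROTEIN_ALPHABET
        if bad:
            problems.append(f"line {lineno}: unexpected characters {sorted(bad)}")
        residues.append(seq)
    if n_records == 0:
        problems.append("no FASTA records found")
        return problems
    joined = "".join(residues)
    if not joined:
        problems.append("records contain no sequence")
    elif sum(c in NUCLEOTIDE_ALPHABET for c in joined) / len(joined) >= nt_threshold:
        # Mostly A/C/G/T/U/N: probably DNA/RNA rather than protein.
        problems.append("sequence looks nucleotide-encoded, not protein")
    return problems
```

Usage: `with open("proteins.faa") as fh: problems = validate_protein_fasta(fh)`. A non-empty result means the run should stop before InterProScan starts.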

Example Use Cases

  • Annotating a bacterial proteome to link domains with potential functions and taxonomic placement.
  • Functional annotation of novel plant proteins with orthology-based GO/EC term predictions.
  • Cross-species comparison of protein families using orthology annotations to infer conserved functions.
  • Assigning taxonomy to a metagenomic-like protein set via DIAMOND hits and TaxonKit resolution.
  • Preparing parquet-based summaries for integration into a larger omics analytics pipeline.
