bio-structure-annotation

npx machina-cli add skill fmschulz/omics-skills/bio-structure-annotation --openclaw

Bio Structure Annotation

Structure prediction and structure-based annotation.

Instructions

  1. Run fast embedding screen (tm-vec).
  2. Predict structures (boltz or colabfold) as needed.
  3. Search structures with Foldseek and annotate hits.
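
The three steps above run in sequence, and a later step only makes sense if the earlier one succeeded. The following is a minimal sketch of a step runner under that assumption. The function name `run_steps` and the one-log-per-step naming are hypothetical, and the actual command lines for tm-vec, boltz, ColabFold, and Foldseek are left to the caller because this skill does not specify them.

```python
import subprocess
from pathlib import Path


def run_steps(steps, log_dir):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns the name of the failed step, or None if every step succeeded.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    for name, argv in steps:
        log_path = log_dir / f"{name}.log"
        with open(log_path, "w") as log:
            # Capture stdout and stderr together in one log file per step.
            result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            return name
    return None
```

A caller might pass something like `("search", ["foldseek", "easy-search", query_dir, target_db, hits_file, tmp_dir])` as the third step, where the four path variables are placeholders. If a tool is not installed, `subprocess.run` raises `FileNotFoundError`, which surfaces a missing prerequisite early.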

Quick Reference

Task              Action
Run workflow      Follow the steps in this skill and capture outputs.
Validate inputs   Confirm required inputs and reference data exist.
Review outputs    Inspect reports and QC gates before proceeding.
Tool docs         See docs/README.md.
References        See ../bio-skills-references.md

Input Requirements

Prerequisites:

  • Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
  • Reference DB root: set BIO_DB_ROOT (default /media/shared-expansion/db/ on WSU).
  • Protein FASTA inputs are available.

Inputs:

  • proteins.faa (FASTA protein sequences)
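
A pre-flight check for these prerequisites could look like the sketch below. It reads BIO_DB_ROOT with the default path given above and confirms that the input FASTA exists and is readable. The function names `resolve_db_root` and `check_inputs` are illustrative and not part of the skill.

```python
import os
from pathlib import Path

# Default reference root named in the prerequisites above.
DEFAULT_DB_ROOT = "/media/shared-expansion/db/"


def resolve_db_root(env=None):
    """Return BIO_DB_ROOT from the environment, or the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("BIO_DB_ROOT") or DEFAULT_DB_ROOT)


def check_inputs(fasta="proteins.faa", env=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    db_root = resolve_db_root(env)
    if not db_root.is_dir():
        problems.append(f"reference DB root not found: {db_root}")
    fasta = Path(fasta)
    if not fasta.is_file():
        problems.append(f"input FASTA not found: {fasta}")
    elif not os.access(fasta, os.R_OK):
        problems.append(f"input FASTA not readable: {fasta}")
    return problems
```

Returning a list rather than raising on the first problem lets the caller report every missing piece in one pass.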

Output

  • results/bio-structure-annotation/structures/
  • results/bio-structure-annotation/structure_hits.tsv
  • results/bio-structure-annotation/structure_report.md
  • results/bio-structure-annotation/logs/
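
The output layout above can be created up front so that each step has a known place to write. This is a small sketch only: the helper name `prepare_outputs` and the dictionary keys are made up for illustration, while the paths themselves are the ones listed above.

```python
from pathlib import Path


def prepare_outputs(results_root="results"):
    """Create the output directories and return the expected output paths."""
    base = Path(results_root) / "bio-structure-annotation"
    paths = {
        "structures": base / "structures",
        "hits": base / "structure_hits.tsv",
        "report": base / "structure_report.md",
        "logs": base / "logs",
    }
    # Only the two directories are created; the files are written by later steps.
    for key in ("structures", "logs"):
        paths[key].mkdir(parents=True, exist_ok=True)
    return paths
```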

Quality Gates

  • Prediction success rate meets project thresholds.
  • Search hits meet project thresholds.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
  • Verify proteins.faa is non-empty and amino acid encoded.
  • Verify Foldseek databases exist under the reference root.
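
The input gate for proteins.faa (non-empty and amino acid encoded) can be sketched as follows. The accepted alphabet and the nucleotide heuristic are assumptions made for this example: a record built only from A, C, G, T, U, and N is treated as probable nucleotide sequence, which could in principle misflag a very short peptide.

```python
from pathlib import Path

# IUPAC amino-acid letters, including ambiguity codes, U/O, and '*' for stops.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBJZXUO*")
# A record made only of these letters is probably nucleotide sequence.
NUCLEOTIDES = set("ACGTUN")


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_protein_fasta(path):
    """Return a list of problems; an empty list means the gate passes."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return [f"{path} is missing or empty"]
    records = read_fasta(path)
    if not records:
        return [f"{path} contains no FASTA records"]
    problems = []
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        elif set(seq) - AMINO_ACIDS:
            problems.append(f"{header}: non-amino-acid characters")
        elif set(seq) <= NUCLEOTIDES:
            problems.append(f"{header}: looks like nucleotide sequence")
    return problems
```

A caller could print the returned problems and call `sys.exit(1)` when the list is non-empty, in line with the exit-non-zero rule above.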

Examples

Example 1: Expected input layout

proteins.faa (FASTA protein sequences)

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
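
One way to structure the "adjust parameters and re-run" loop is sketched below, combined with the failure rule from the quality gates (record in the report, exit non-zero). The names `run_with_retries` and `gate_or_fail` are hypothetical, as is the convention that a step is a callable returning True when its QC gate passes.

```python
def run_with_retries(step, parameter_sets):
    """Call step(params) for each parameter set until one passes.

    Returns the parameters that passed, or None if every attempt failed.
    """
    for params in parameter_sets:
        if step(params):
            return params
    return None


def gate_or_fail(name, step, parameter_sets, report_path):
    """Return a process exit code: 0 on success, 1 after recording a failure."""
    params = run_with_retries(step, parameter_sets)
    if params is None:
        # Append so that earlier report content is preserved.
        with open(report_path, "a") as report:
            report.write(f"FAILED: {name} did not pass QC with any parameter set\n")
        return 1
    return 0
```

The return value is meant to be passed to `sys.exit`, so only the affected step is retried and a persistent failure still stops the workflow.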

Source

git clone https://github.com/fmschulz/omics-skills
SKILL.md: https://github.com/fmschulz/omics-skills/blob/main/skills/bio-structure-annotation/SKILL.md

Overview

Automates protein structure prediction and structure-based annotation from FASTA inputs. It runs a fast embedding screen (tm-vec), then predicts structures with boltz or ColabFold as needed, and finally searches and annotates hits with Foldseek. Outputs include predicted structure files, a table of hits, a Markdown report, and logs; QC gates determine whether the results are fit for downstream use.

How This Skill Works

Inputs are a protein FASTA file (proteins.faa), the required tools in the active environment (Pixi, conda, or system), and a reference database root set via BIO_DB_ROOT. The workflow performs a fast embedding screen, predicts structures with boltz or ColabFold, then searches the predicted structures with Foldseek and annotates the hits, writing results to results/bio-structure-annotation/.
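
As an illustration of the annotation step, the sketch below keeps the best-scoring Foldseek hit per query from tab-separated search output. It assumes Foldseek's default 12-column tabular layout with no header line. The schema of structure_hits.tsv is not documented in this skill, so this may not match the real file, and the `max_evalue` default of 1e-3 is an arbitrary placeholder rather than a project threshold.

```python
import csv

# Assumed column order: Foldseek's default BLAST-style tabular output.
COLUMNS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits"]


def best_hits(handle, max_evalue=1e-3):
    """Return {query: row} with the highest-bit-score hit per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bits"] = float(row["bits"])
        if row["evalue"] > max_evalue:
            continue  # hit is weaker than the cutoff
        current = best.get(row["query"])
        if current is None or row["bits"] > current["bits"]:
            best[row["query"]] = row
    return best
```

Queries with no hit under the cutoff are absent from the result, which makes it easy to count unannotated proteins against a project threshold.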

When to Use It

  • When you have proteins.faa and want to obtain 3D structure predictions for annotation.
  • When you need structure-based functional insights by comparing predicted models to known folds via Foldseek.
  • When you require scalable structure searches against large reference databases.
  • When QC gates must be validated before proceeding to downstream analyses.
  • When you need to retry with alternative parameters or tools (boltz vs ColabFold) to improve hit recovery.

Quick Start

  1. Run fast embedding screen (tm-vec).
  2. Predict structures (boltz or ColabFold) as needed.
  3. Search structures with Foldseek and annotate hits.

Best Practices

  • Validate that proteins.faa is non-empty and correctly formatted.
  • Set and verify BIO_DB_ROOT and ensure reference databases exist.
  • Run the fast embedding screen before structure prediction to filter candidates.
  • Check outputs in results/bio-structure-annotation/ and review structure_report.md.
  • Review QC gates, adjust parameters as needed, and re-run only the affected steps.

Example Use Cases

  • Annotate a bacterial proteome by predicting structures and annotating Foldseek hits.
  • Compare a protein family across multiple species to infer conserved structural features.
  • Troubleshoot missing inputs by confirming paths and permissions.
  • Improve hit coverage by retrying with alternate parameters for structure prediction.
  • Generate a publishable structure_report.md outlining predictions and annotations.
