bio-structure-annotation
npx machina-cli add skill fmschulz/omics-skills/bio-structure-annotation --openclaw
Bio Structure Annotation
Structure prediction and structure-based annotation.
Instructions
- Run fast embedding screen (tm-vec).
- Predict structures (boltz or colabfold) as needed.
- Search structures with Foldseek and annotate hits.
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md. |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Reference DB root: set BIO_DB_ROOT (default /media/shared-expansion/db/ on WSU).
- Protein FASTA inputs are available.
Inputs:
- proteins.faa (FASTA protein sequences)
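A minimal sketch of how these prerequisites might be checked before a run. The helper names `resolve_db_root` and `missing_inputs` are hypothetical and not part of the skill; only the `BIO_DB_ROOT` variable and its default come from the text above.

```python
import os
from pathlib import Path

# Default taken from the prerequisites above; it is site-specific (WSU).
DEFAULT_DB_ROOT = "/media/shared-expansion/db/"


def resolve_db_root(env=None):
    """Return the reference DB root from BIO_DB_ROOT, or the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("BIO_DB_ROOT") or DEFAULT_DB_ROOT)


def missing_inputs(fasta, db_root):
    """List human-readable problems; an empty list means the paths exist."""
    problems = []
    if not Path(fasta).is_file():
        problems.append(f"protein FASTA not found: {fasta}")
    if not Path(db_root).is_dir():
        problems.append(f"reference DB root not found: {db_root}")
    return problems
```

Calling `missing_inputs("proteins.faa", resolve_db_root())` before the first step gives an early, readable failure instead of a tool error partway through the workflow.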
Output
- results/bio-structure-annotation/structures/
- results/bio-structure-annotation/structure_hits.tsv
- results/bio-structure-annotation/structure_report.md
- results/bio-structure-annotation/logs/
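The output layout above could be prepared up front with something like the following. This is a sketch: `prepare_output_dirs` is a hypothetical helper, and only the paths themselves come from this skill.

```python
from pathlib import Path


def prepare_output_dirs(results_root="results"):
    """Create the output directories and return the expected output paths."""
    base = Path(results_root) / "bio-structure-annotation"
    layout = {
        "structures": base / "structures",
        "hits": base / "structure_hits.tsv",
        "report": base / "structure_report.md",
        "logs": base / "logs",
    }
    # Only the directories are created; the files are written by later steps.
    for key in ("structures", "logs"):
        layout[key].mkdir(parents=True, exist_ok=True)
    return layout
```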
Quality Gates
- Prediction success rate meets project thresholds.
- Foldseek search hits meet project thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
- Verify proteins.faa is non-empty and amino acid encoded.
- Verify Foldseek databases exist under the reference root.
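The input check on proteins.faa can be sketched as below. The accepted alphabet (standard residues plus ambiguity codes and `*`) and the nucleotide heuristic are assumptions made for illustration. The skill does not specify how "amino acid encoded" is tested.

```python
# Assumption: 20 standard residues plus ambiguity/rare codes (B, X, Z, J, U, O) and '*'.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBXZJUO*")
# Sequences built only from these letters are treated as nucleotide (heuristic).
NUCLEOTIDES = set("ACGTUN")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_protein_fasta(lines):
    """Return a list of problems; empty means non-empty and amino acid encoded."""
    records = list(read_fasta(lines))
    if not records:
        return ["no FASTA records found"]
    problems = []
    for header, seq in records:
        residues = set(seq.upper())
        if not residues:
            problems.append(f"{header}: empty sequence")
        elif residues - AMINO_ACIDS:
            bad = "".join(sorted(residues - AMINO_ACIDS))
            problems.append(f"{header}: unexpected characters {bad!r}")
        elif residues <= NUCLEOTIDES:
            problems.append(f"{header}: looks like a nucleotide sequence")
    return problems
```

Because A, C, G, T, and N are also valid residue codes, the nucleotide test is only a heuristic; a very short peptide made solely of those letters would be flagged and should be reviewed by hand.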
Examples
Example 1: Expected input layout
proteins.faa (FASTA protein sequences)
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
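One way to express the "adjust parameters and re-run" policy, together with the quality-gate rule of exiting non-zero when retries are exhausted, is sketched here. The functions `run_with_retries` and `success_rate` are hypothetical, as is the idea of passing alternative parameter sets as dictionaries; actual thresholds are project-specific.

```python
def success_rate(n_ok, n_total):
    """Fraction of successful items; 0.0 when nothing was attempted."""
    return n_ok / n_total if n_total else 0.0


def run_with_retries(step, parameter_sets, passes_gate):
    """Run `step` with each parameter set in order until the gate passes.

    Returns (result, params) for the first passing attempt,
    or (None, None) when every attempt fails the gate.
    """
    for params in parameter_sets:
        result = step(**params)
        if passes_gate(result):
            return result, params
    return None, None
```

A caller would typically record the failure in structure_report.md and then call `sys.exit(1)` when `run_with_retries` returns `(None, None)`.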
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-structure-annotation/SKILL.md
Overview
Automates protein structure prediction and structure-based annotation from FASTA inputs. It runs a fast embedding screen (tm-vec), then predicts structures with boltz or ColabFold as needed, and finally searches and annotates hits with Foldseek. Outputs include structure files, a hit table, and a report; QC gates guide validation before downstream use.
How This Skill Works
Inputs are a protein FASTA file (proteins.faa), the required tools in the active environment (Pixi/conda/system), and the reference database root (BIO_DB_ROOT). The workflow performs a fast embedding screen, predicts structures with boltz or ColabFold, then searches the predicted structures with Foldseek and annotates the hits, writing results to results/bio-structure-annotation/.
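The Foldseek search step might look roughly like this. The sketch assumes Foldseek's `easy-search` subcommand with its default 12-column tabular output, and assumes that structure_hits.tsv keeps that layout; the skill itself does not document the file format. The `max_evalue` cutoff and both function names are illustrative.

```python
import csv
import io

# Default column order of `foldseek easy-search` tabular output.
FOLDSEEK_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def foldseek_command(structures_dir, target_db, hits_tsv, tmp_dir):
    """Build (but do not run) a `foldseek easy-search` command line."""
    return ["foldseek", "easy-search", str(structures_dir),
            str(target_db), str(hits_tsv), str(tmp_dir)]


def best_hits(handle, max_evalue=1e-3):
    """Return {query: target} keeping the highest-bit-score hit per query."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=FOLDSEEK_COLUMNS, delimiter="\t")
    for row in reader:
        evalue, bits = float(row["evalue"]), float(row["bits"])
        if evalue > max_evalue:
            continue  # hit is weaker than the (assumed) cutoff
        query = row["query"]
        if query not in best or bits > best[query][1]:
            best[query] = (row["target"], bits)
    return {query: target for query, (target, _bits) in best.items()}
```

The command list can be passed to `subprocess.run(..., check=True)`; the target database path under BIO_DB_ROOT depends on the local installation and is deliberately left as a parameter.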
When to Use It
- When you have proteins.faa and want to obtain 3D structure predictions for annotation.
- When you need structure-based functional insights by comparing predicted models to known folds via Foldseek.
- When you require scalable structure searches against large reference databases.
- When QC gates must be validated before proceeding to downstream analyses.
- When you need to retry with alternative parameters or tools (boltz vs ColabFold) to improve hit recovery.
Quick Start
- Step 1: Run fast embedding screen (tm-vec).
- Step 2: Predict structures (boltz or ColabFold) as needed.
- Step 3: Search structures with Foldseek and annotate hits.
Best Practices
- Validate that proteins.faa is non-empty and correctly formatted.
- Set and verify BIO_DB_ROOT and ensure reference databases exist.
- Run the fast embedding screen before structure prediction to filter candidates.
- Check outputs in results/bio-structure-annotation/ and review structure_report.md.
- Review QC gates, adjust parameters as needed, and re-run only the affected steps.
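As an illustration of using the embedding screen to filter candidates, the sketch below partitions proteins by their best screen score. It assumes the screen yields one predicted TM-score per protein and uses 0.5 as the cutoff, a common rule of thumb for shared fold; the skill does not state how candidates are chosen, so both the input shape and the threshold are assumptions.

```python
def split_by_screen(screen_scores, min_score=0.5):
    """Partition protein IDs by best embedding-screen score.

    Returns (confident, needs_prediction), each a sorted list of IDs.
    """
    confident = sorted(p for p, s in screen_scores.items() if s >= min_score)
    needs_prediction = sorted(p for p, s in screen_scores.items() if s < min_score)
    return confident, needs_prediction
```

Under this reading, only the `needs_prediction` set would be sent to boltz or ColabFold, which keeps the expensive prediction step small.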
Example Use Cases
- Annotate a bacterial proteome by predicting structures and annotating Foldseek hits.
- Compare a protein family across multiple species to infer conserved structural features.
- Troubleshoot missing inputs by confirming paths and permissions.
- Improve hit coverage by retrying with alternate parameters for structure prediction.
- Generate a publishable structure_report.md outlining predictions and annotations.