bio-prefect-dask-nextflow
Bio Prefect + Dask + Nextflow

npx machina-cli add skill fmschulz/omics-skills/bio-prefect-dask-nextflow --openclaw
Choose and scaffold the right workflow engine for local, distributed, or HPC bioinformatics pipelines.
Instructions
- Collect requirements (scheduler, container policy, data location, scale).
- Choose engine: Prefect+Dask, Nextflow, or Hybrid.
- Generate a runnable scaffold with clear data layout and resources.
- Validate with a small test and resume/retry checks.
Quick Reference
| Task | Action |
|---|---|
| Engine choice | See decision-matrix.md |
| Prefect+Dask scaffold | See prefect-dask.md |
| Prefect on Slurm | See prefect-hpc-slurm.md |
| Nextflow on HPC | See nextflow-hpc.md |
| Examples | See examples.md |
Input Requirements
- Workflow requirements and steps
- Target environment (local, cluster, cloud)
- Scheduler and container constraints
- Data locations and expected volumes
Output
- Engine recommendation with rationale
- Runnable scaffold (files + commands)
- Resource plan per step
- Validation plan and checkpoints
Quality Gates
- Tiny test run completes end-to-end
- Resume/retry behavior verified
- Resource plan matches cluster limits
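The last gate above can be checked mechanically. A minimal sketch, assuming an illustrative per-step resource plan and cluster limits (step names and numbers are hypothetical, not part of the skill's interface):

```python
# Sketch of the "resource plan matches cluster limits" gate.
# Step names and resource figures are illustrative examples only.
plan = {
    "qc":    {"cpus": 4,  "mem_gb": 8},
    "align": {"cpus": 16, "mem_gb": 64},
    "call":  {"cpus": 8,  "mem_gb": 32},
}
limits = {"cpus": 32, "mem_gb": 128}  # hypothetical per-node limits

# Collect every step that requests more than the cluster allows.
violations = [
    step for step, res in plan.items()
    if res["cpus"] > limits["cpus"] or res["mem_gb"] > limits["mem_gb"]
]
print(violations)  # → [] when every step fits the cluster limits
```

An empty `violations` list means the plan passes the gate; any entries name the steps whose requests must be trimmed or split.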
Examples
Example 1: Engine recommendation
Choice: Nextflow
Why: CLI-heavy pipeline, HPC scheduler required, reproducible cache/resume needed.
Troubleshooting
Issue: Workflow fails on HPC due to environment mismatch
Solution: Pin container/conda versions and validate with a minimal test dataset.
Source
https://github.com/fmschulz/omics-skills/blob/main/skills/bio-prefect-dask-nextflow/SKILL.md
Overview
This skill helps you design and scaffold bioinformatics workflows by selecting the right engine for your environment. It guides requirements gathering, generates runnable scaffolds with clear data layouts, and validates end-to-end with test runs. It covers local/distributed execution using Prefect+Dask and HPC-oriented Nextflow.
How This Skill Works
Start by collecting requirements (scheduler, container policy, data location, scale), then choose Prefect+Dask, Nextflow, or Hybrid, and generate a runnable scaffold with explicit files, data layout, and resource specs. Finally, run a small end-to-end test and verify resume/retry behavior to confirm reliability.
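The engine-choice step can be sketched as a small helper. This is a hypothetical illustration, not the skill's actual interface; the `Requirements` fields and the decision rules are simplified stand-ins for the full decision matrix:

```python
# Hypothetical engine-selection helper mirroring the decision step above.
# Field names and rules are illustrative; see decision-matrix.md for the
# skill's actual criteria.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Requirements:
    scheduler: Optional[str]  # e.g. "slurm" or "pbs"; None means local
    cli_heavy: bool           # pipeline dominated by CLI bioinformatics tools
    python_native: bool       # steps implemented as Python functions

def choose_engine(req: Requirements) -> str:
    """Recommend an engine from a simplified view of the requirements."""
    if req.scheduler and req.cli_heavy and req.python_native:
        return "Hybrid"        # Prefect+Dask orchestration, Nextflow for HPC steps
    if req.scheduler and req.cli_heavy:
        return "Nextflow"      # scheduler-aware, cache/resume built in
    return "Prefect+Dask"      # local or distributed Python workloads

print(choose_engine(Requirements(scheduler="slurm", cli_heavy=True,
                                 python_native=False)))  # → Nextflow
```

A real decision would also weigh container policy and data volume; this sketch only shows how the recommendation falls out of a few requirement flags.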
When to Use It
- Setting up a local workstation or small cluster for development
- Distributing workloads across nodes in a shared file system
- HPC environments where Slurm, PBS, or similar schedulers are used
- Need for reproducible cache and resume/retry behavior
- Hybrid scenarios where you mix Prefect+Dask for orchestration with Nextflow for HPC
Quick Start
- Step 1: Collect requirements (scheduler, container policy, data location, scale)
- Step 2: Choose engine: Prefect+Dask, Nextflow, or Hybrid
- Step 3: Generate runnable scaffold, run a tiny test, and verify resume/retry
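The retry check in Step 3 can be exercised with a plain-Python sketch. This is a stand-in for engine-level retries (e.g. Prefect task retries or Nextflow's error strategies), useful for verifying that a transient failure is absorbed rather than propagated:

```python
# Minimal retry sketch for the "verify resume/retry" step; a stdlib stand-in
# for the retry behavior a workflow engine would provide.
import time

def retry(times=3, delay=0.0):
    """Retry a function up to `times` attempts, re-raising on the last one."""
    def wrap(fn):
        def inner(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise
                    time.sleep(delay)
        return inner
    return wrap

calls = {"n": 0}

@retry(times=3)
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky_step())  # succeeds on the third attempt → ok
```

In a tiny test run, a deliberately flaky step like this confirms that retries are configured and bounded before the full pipeline is launched.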
Best Practices
- Collect requirements before scaffolding (scheduler, container policy, data location, scale)
- Choose the engine based on environment: Prefect+Dask for local/distributed, Nextflow for HPC, or Hybrid
- Produce a runnable scaffold with clear data layout and per-step resources
- Define a validation plan with a tiny end-to-end test and resume/retry checks
- Pin container/conda versions and validate environment compatibility
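The resume/retry validation in the practices above can be demonstrated with a file-based checkpoint sketch. This approximates, in plain Python, the cache/resume semantics that Nextflow provides via `-resume`; the step name and output layout are illustrative:

```python
# File-based resume sketch: a step is skipped when its output already exists,
# approximating workflow-engine cache/resume behavior. Names are illustrative.
import tempfile
from pathlib import Path

def run_step(name, outdir, compute):
    """Run `compute` unless a checkpoint for `name` already exists."""
    out = Path(outdir) / f"{name}.done"
    if out.exists():                 # resume: reuse the previous result
        return out.read_text(), True
    result = compute()
    out.write_text(result)           # checkpoint for later resumes
    return result, False

with tempfile.TemporaryDirectory() as d:
    r1, skipped1 = run_step("align", d, lambda: "aligned")
    r2, skipped2 = run_step("align", d, lambda: "aligned")
    print(skipped1, skipped2)  # first run computes, second resumes
```

A validation plan can assert exactly this: rerunning the tiny test must skip completed steps and produce identical outputs.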
Example Use Cases
- Example: Engine recommendation selecting Nextflow for HPC with reproducible cache and resume
- Example: Prefect+Dask scaffold designed for local development or a small cluster
- Example: Prefect on Slurm to orchestrate tasks on an HPC scheduler
- Example: Nextflow on HPC workflow demonstrating resource-aware scheduling
- Example: Hybrid workflow blending Prefect+Dask orchestration with Nextflow HPC components