bio-prefect-dask-nextflow
Bio Prefect + Dask + Nextflow
Choose and scaffold the right workflow engine for local, distributed, or HPC bioinformatics pipelines.
Instructions
- Collect requirements (scheduler, container policy, data location, scale).
- Choose engine: Prefect+Dask, Nextflow, or Hybrid.
- Generate a runnable scaffold with clear data layout and resources.
- Validate with a small test and resume/retry checks.
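The engine-choice step can be sketched as a small rule function. This is an illustrative heuristic, not the contents of decision-matrix.md. The `Requirements` fields and the Hybrid and Prefect+Dask rules are hypothetical; only the Nextflow rationale (CLI-heavy pipeline, HPC scheduler, reproducible cache/resume) comes from Example 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    # Hypothetical requirement fields; adapt to what you actually collect.
    scheduler: Optional[str]  # e.g. "slurm"; None for local runs
    cli_heavy: bool           # most steps wrap command-line tools
    python_heavy: bool        # most steps are Python-native code
    needs_resume: bool        # reproducible cache/resume is required


def choose_engine(req: Requirements) -> tuple[str, str]:
    """Return (engine, rationale). The rules are illustrative assumptions."""
    if req.cli_heavy and req.python_heavy:
        return "Hybrid", "both CLI-tool stages and Python-native stages dominate"
    if req.cli_heavy:
        # Mirrors the rationale given in Example 1.
        reasons = ["CLI-heavy pipeline"]
        if req.scheduler:
            reasons.append(f"{req.scheduler} scheduler required")
        if req.needs_resume:
            reasons.append("reproducible cache/resume needed")
        return "Nextflow", ", ".join(reasons)
    return "Prefect+Dask", "Python-native steps on local or Dask-distributed workers"


engine, why = choose_engine(
    Requirements(scheduler="slurm", cli_heavy=True, python_heavy=False, needs_resume=True)
)
print(engine, "-", why)
# Nextflow - CLI-heavy pipeline, slurm scheduler required, reproducible cache/resume needed
```

Returning the rationale together with the engine name keeps the "engine recommendation with rationale" output in one place.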
Quick Reference
| Task | Action |
|---|---|
| Engine choice | See decision-matrix.md |
| Prefect+Dask scaffold | See prefect-dask.md |
| Prefect on Slurm | See prefect-hpc-slurm.md |
| Nextflow on HPC | See nextflow-hpc.md |
| Examples | See examples.md |
Input Requirements
- Workflow requirements and steps
- Target environment (local, cluster, cloud)
- Scheduler and container constraints
- Data locations and expected volumes
Output
- Engine recommendation with rationale
- Runnable scaffold (files + commands)
- Resource plan per step
- Validation plan and checkpoints
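A per-step resource plan can be checked against cluster limits before anything is submitted. The sketch below is a minimal version that assumes hypothetical `StepResources` and `ClusterLimits` records with made-up numbers. Take real limits from your scheduler's partition or queue configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResources:
    # Hypothetical per-step request.
    name: str
    cpus: int
    mem_gb: float
    walltime_h: float


@dataclass(frozen=True)
class ClusterLimits:
    # Hypothetical per-job limits; replace with your site's real values.
    max_cpus: int
    max_mem_gb: float
    max_walltime_h: float


def check_plan(plan, limits):
    """Return human-readable violations; an empty list means the plan fits."""
    problems = []
    for step in plan:
        if step.cpus > limits.max_cpus:
            problems.append(f"{step.name}: {step.cpus} CPUs > {limits.max_cpus}")
        if step.mem_gb > limits.max_mem_gb:
            problems.append(f"{step.name}: {step.mem_gb} GB > {limits.max_mem_gb} GB")
        if step.walltime_h > limits.max_walltime_h:
            problems.append(f"{step.name}: {step.walltime_h} h > {limits.max_walltime_h} h")
    return problems


limits = ClusterLimits(max_cpus=64, max_mem_gb=256, max_walltime_h=48)
plan = [
    StepResources("qc", cpus=4, mem_gb=8, walltime_h=1),
    StepResources("assembly", cpus=32, mem_gb=512, walltime_h=24),
]
for problem in check_plan(plan, limits):
    print(problem)  # assembly: 512 GB > 256 GB
```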
Quality Gates
- Tiny test run completes end-to-end
- Resume/retry behavior verified
- Resource plan matches cluster limits
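The resume/retry gate checks two behaviors. On a re-run, a step whose output already exists should be skipped (resume). A step that fails transiently should be retried. The stdlib sketch below mimics both behaviors so the checks are concrete. `run_step`, `make_flaky`, and the `.done` marker files are hypothetical helpers. In the real engines, the counterparts are Nextflow's `-resume` option with the `errorStrategy 'retry'` and `maxRetries` process directives, and Prefect's `@task(retries=..., retry_delay_seconds=...)`.

```python
import tempfile
from pathlib import Path


def run_step(name, func, outdir, retries=2):
    """Run func(marker_path) unless its marker exists (resume); retry on failure.

    Hypothetical helper: only RuntimeError is treated as retryable here.
    """
    marker = Path(outdir) / f"{name}.done"
    if marker.exists():
        return "skipped"  # resume: completed output is reused
    for attempt in range(1, retries + 2):
        try:
            func(marker)
            return f"ran (attempt {attempt})"
        except RuntimeError:
            if attempt == retries + 1:
                raise  # retries exhausted: surface the failure


def make_flaky(fail_times):
    """Build a test step that fails `fail_times` times, then writes its marker."""
    calls = {"n": 0}

    def step(marker):
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise RuntimeError("transient failure")
        marker.write_text("ok\n")

    return step


with tempfile.TemporaryDirectory() as tmp:
    flaky = make_flaky(fail_times=1)
    print(run_step("align", flaky, tmp))  # ran (attempt 2)
    print(run_step("align", flaky, tmp))  # skipped
```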
Examples
Example 1: Engine recommendation
Choice: Nextflow
Why: CLI-heavy pipeline, HPC scheduler required, reproducible cache/resume needed.
Troubleshooting
Issue: Workflow fails on HPC due to an environment mismatch.
Solution: Pin container/conda versions and validate with a minimal test dataset.
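Version pinning can be linted before the minimal test run. The sketch below is a minimal version that assumes a flat list of conda (`name=version`) or pip (`name==version`) specs and plain image references. The helper names and example specs are hypothetical. A conda spec such as `python=3.11` still matches any 3.11.x release, so use `==` or a build string where an exact match matters.

```python
# Hypothetical lint helpers; treating these operators as "not pinned" is an assumption.
RANGE_OPERATORS = (">", "<", "*", "!", "~")


def unpinned(deps):
    """Return specs with no '=' version pin or with range/wildcard operators."""
    return [
        spec for spec in deps
        if "=" not in spec or any(op in spec for op in RANGE_OPERATORS)
    ]


def unpinned_image(image):
    """True if an image reference has no tag or digest, or uses ':latest'."""
    if "@sha256:" in image:
        return False  # digest references are immutable
    name = image.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
    return ":" not in name or name.endswith(":latest")


deps = ["python=3.11", "bioconda::samtools=1.19", "bwa", "pandas>=2.0"]
print(unpinned(deps))  # ['bwa', 'pandas>=2.0']
print(unpinned_image("quay.io/example/samtools:1.19"))  # False
print(unpinned_image("ubuntu"))  # True
```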
More from fmschulz/omics-skills
- beautiful-data-viz: Create publication-quality matplotlib/seaborn charts with readable axes, tight layout, and curated palettes.
- bio-phylogenomics: Build marker gene alignments and phylogenetic trees.
- bio-protein-clustering-pangenome: Cluster proteins into orthogroups and derive pangenome matrices.
- plotly-dashboard-skill: Build production-ready Plotly Dash dashboards with consistent theming, clear layouts, and performant callbacks.
- bio-annotation: Functional annotation and taxonomy inference from sequence homology.
- bio-foundation-housekeeping: Initialize a bioinformatics project scaffold with reproducible environments, schemas, and data cataloging. Use for new projects or repo setup.