bio-binning-qc
Bio Binning QC
Perform metagenomic binning, refinement, and QC with completeness/contamination checks.
Instructions
- Compute depth/coverage per sample.
- Run multiple binners (MetaBAT2, SemiBin2, QuickBin).
- Classify bins by domain (bacteria/archaea vs eukaryotes).
- Run domain-specific QC:
  - CheckM2 for bacterial and archaeal bins
  - EukCC for eukaryotic bins
  - GUNC for contamination detection (all domains)
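The domain-specific QC step above can be sketched as a small routing table. This is a minimal illustration, not the skill's actual implementation: the domain labels, the `QC_TOOLS_BY_DOMAIN` mapping, and the function names are hypothetical, and the code only plans which tools apply to each bin without invoking them.

```python
# Hypothetical routing table: domain label -> QC tools named in the steps above.
# The labels ("bacteria", "archaea", "eukaryota") are assumptions for illustration.
QC_TOOLS_BY_DOMAIN = {
    "bacteria": ["CheckM2", "GUNC"],
    "archaea": ["CheckM2", "GUNC"],
    "eukaryota": ["EukCC", "GUNC"],
}


def qc_tools_for(domain: str) -> list[str]:
    """Return the QC tools to run for a bin of the given domain."""
    key = domain.strip().lower()
    if key not in QC_TOOLS_BY_DOMAIN:
        raise ValueError(
            f"Unknown domain {domain!r}; expected one of {sorted(QC_TOOLS_BY_DOMAIN)}"
        )
    # Return a copy so callers cannot mutate the shared table.
    return list(QC_TOOLS_BY_DOMAIN[key])


def plan_qc(bin_domains: dict[str, str]) -> dict[str, list[str]]:
    """Map each bin ID to the list of QC tools that apply to it."""
    return {bin_id: qc_tools_for(domain) for bin_id, domain in bin_domains.items()}
```

Failing loudly on an unrecognized domain label keeps a misclassified bin from silently skipping QC.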
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Reference DB root: set BIO_DB_ROOT (default /media/shared-expansion/db/ on WSU).
- Coverage/depth tables or reads available to compute coverage.
Inputs:
- contigs.fasta
- coverage.tsv (per-sample depth table)
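A pre-flight check for these prerequisites and inputs might look like the sketch below. It assumes only what is stated above (the `BIO_DB_ROOT` variable, its default path, and the two input file names); the function names `resolve_db_root` and `check_inputs` are hypothetical helpers, not part of the skill.

```python
import os
from pathlib import Path

# Default reference DB root named in the prerequisites.
DEFAULT_DB_ROOT = "/media/shared-expansion/db/"


def resolve_db_root(env=None) -> Path:
    """Return BIO_DB_ROOT if set and non-empty, otherwise the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("BIO_DB_ROOT") or DEFAULT_DB_ROOT)


def check_inputs(paths) -> list[str]:
    """Return a list of problems; an empty list means every input exists and is non-empty."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file():
            problems.append(f"missing: {p}")
        elif p.stat().st_size == 0:
            problems.append(f"empty: {p}")
    return problems
```

For example, `check_inputs(["contigs.fasta", "coverage.tsv"])` returns an empty list only when both files are present and non-empty.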
Output
- results/bio-binning-qc/bins/
- results/bio-binning-qc/bin_metrics.tsv
- results/bio-binning-qc/bin_qc_report.html
- results/bio-binning-qc/logs/
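As a rough sketch of how this output layout could be produced, the helpers below create the directories and write a tab-separated metrics table. The skill does not specify the schema of bin_metrics.tsv, so the column names in `COLUMNS` are assumptions chosen for illustration, as are the function names.

```python
import csv
from pathlib import Path

# Hypothetical column set; the actual schema of bin_metrics.tsv is not specified.
COLUMNS = ["bin_id", "domain", "completeness", "contamination"]


def prepare_output_dirs(root="results/bio-binning-qc") -> Path:
    """Create the bins/ and logs/ subdirectories under the output root."""
    root = Path(root)
    for sub in ("bins", "logs"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def write_bin_metrics(rows, root) -> Path:
    """Write one row per bin to <root>/bin_metrics.tsv and return its path."""
    out = Path(root) / "bin_metrics.tsv"
    with out.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return out
```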
Quality Gates
- Completeness and contamination meet project thresholds.
- Chimera and contamination flags are below thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
- Verify contigs.fasta and coverage.tsv are non-empty.
- Verify reference DBs for QC tools exist under the reference root.
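The completeness and contamination gate can be sketched as a simple threshold check. The default values below (90% completeness, 5% contamination) are placeholders, not the project's thresholds, and the `Thresholds` class and function names are hypothetical; what counts as an overall failure is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Placeholder defaults (percent); replace with the project's own values.
    min_completeness: float = 90.0
    max_contamination: float = 5.0


def passes_gate(completeness: float, contamination: float, t: Thresholds = Thresholds()) -> bool:
    """True when a bin meets both the completeness and contamination thresholds."""
    return completeness >= t.min_completeness and contamination <= t.max_contamination


def failing_bins(metrics: dict, t: Thresholds = Thresholds()) -> list:
    """Return IDs of bins that fail the gate; metrics maps bin ID -> (completeness, contamination)."""
    return [
        bin_id
        for bin_id, (completeness, contamination) in metrics.items()
        if not passes_gate(completeness, contamination, t)
    ]
```

A caller could record the failing bins in the report and exit non-zero if the list still violates the project's policy after a retry.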
Examples
Example 1: Expected input layout
contigs.fasta
coverage.tsv (per-sample depth table)
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.