bio-reads-qc-mapping
Bio Reads QC Mapping
Ingest, QC, and map reads with reproducible outputs. Use for raw read processing and coverage stats.
Instructions
- Parse sample sheet and validate inputs.
- For short reads: Run QC/trimming (bbduk).
- For long reads: Trim adapters (Porechop) and filter by quality/length (Filtlong).
- Map reads (bbmap or minimap2) and generate coverage tables.
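The steps above can be sketched as a small command builder. This is an illustration under stated assumptions, not the skill's actual implementation: the function names `trim_steps` and `map_step`, the adapter file `adapters.fa`, the parameter values, and the choice of bbmap for short reads and minimap2 with the `map-ont` preset for long reads are all hypothetical. Nothing is executed; each step is returned as an argument list plus an optional file that should receive standard output.

```python
def trim_steps(read_type, reads_in, reads_out, adapters="adapters.fa"):
    """Return the QC/trimming steps for one read file as a list of dicts.

    Each dict has "argv" (the command line) and "stdout" (a file that should
    receive standard output, or None). All parameter values are illustrative.
    """
    if read_type == "short":
        return [{
            "argv": [
                "bbduk.sh", f"in={reads_in}", f"out={reads_out}",
                f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1",
                "qtrim=rl", "trimq=10", "minlen=50",
            ],
            "stdout": None,
        }]
    if read_type == "long":
        chopped = f"{reads_out}.porechop.fastq.gz"  # hypothetical temp name
        return [
            # Porechop removes adapters and writes to the file given by -o.
            {"argv": ["porechop", "-i", reads_in, "-o", chopped],
             "stdout": None},
            # Filtlong writes filtered reads to standard output.
            {"argv": ["filtlong", "--min_length", "1000",
                      "--keep_percent", "90", chopped],
             "stdout": reads_out},
        ]
    raise ValueError(f"unknown read type: {read_type!r}")


def map_step(read_type, reference, reads, sam_out):
    """Return one mapping step; the mapper per read type is an assumption."""
    if read_type == "short":
        return {"argv": ["bbmap.sh", f"ref={reference}", f"in={reads}",
                         f"out={sam_out}"],
                "stdout": None}
    # minimap2 writes SAM to standard output when -a is given.
    return {"argv": ["minimap2", "-a", "-x", "map-ont", reference, reads],
            "stdout": sam_out}
```

Returning argument lists instead of shell strings keeps the exact commands easy to write to results/bio-reads-qc-mapping/logs/ and avoids shell quoting problems when they are later run with `subprocess.run`.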
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md. |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Sample sheet and reads are available.
Inputs:
- sample_sheet.tsv
- reads/*.fastq.gz
- reference.fasta (optional)
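A minimal sketch of sample sheet validation follows. The skill does not specify the sheet's columns, so the schema used here (`sample_id`, `read_type`, `reads`) and the allowed read types are assumptions made only for illustration; adapt them to the real sheet.

```python
import csv
from pathlib import Path

# Hypothetical schema: the real sample_sheet.tsv columns are not documented.
REQUIRED_COLUMNS = ("sample_id", "read_type", "reads")
READ_TYPES = {"short", "long"}


def validate_sample_sheet(lines, reads_root=None):
    """Validate a tab-separated sample sheet and return a list of error strings.

    `lines` is any iterable of text lines (an open file works). If
    `reads_root` is given, each `reads` path is also checked for existence
    relative to it.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    columns = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    errors = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample = (row["sample_id"] or "").strip()
        read_type = (row["read_type"] or "").strip()
        reads = (row["reads"] or "").strip()
        if not sample:
            errors.append(f"line {line_no}: empty sample_id")
        elif sample in seen:
            errors.append(f"line {line_no}: duplicate sample_id {sample!r}")
        seen.add(sample)
        if read_type not in READ_TYPES:
            errors.append(f"line {line_no}: unknown read_type {read_type!r}")
        if not reads:
            errors.append(f"line {line_no}: empty reads path")
        elif reads_root is not None and not (Path(reads_root) / reads).is_file():
            errors.append(f"line {line_no}: reads file not found: {reads}")
    return errors
```

A caller could open `sample_sheet.tsv` with `newline=""`, pass the handle to `validate_sample_sheet`, and stop before any QC step if the returned list is not empty.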
Output
- results/bio-reads-qc-mapping/trimmed_reads/
- results/bio-reads-qc-mapping/qc_reports/
- results/bio-reads-qc-mapping/mapping_stats.tsv
- results/bio-reads-qc-mapping/coverage.tsv
- results/bio-reads-qc-mapping/logs/
Quality Gates
- Post-QC read count sanity checks pass.
- Mapping rate meets project thresholds.
- Sample sheet schema and FASTQ integrity are validated.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
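The gates above could be evaluated roughly as follows. This is a sketch, not the skill's implementation: the `SampleStats` fields, the default thresholds (`min_retained=0.5`, `min_mapping_rate=0.8`), and the `run_step` callback are hypothetical, and real thresholds should come from the project.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    """Per-sample read counts; field names are illustrative."""
    sample_id: str
    raw_reads: int
    trimmed_reads: int
    mapped_reads: int


def evaluate_gates(stats, min_retained=0.5, min_mapping_rate=0.8):
    """Return a list of gate failures (empty if all gates pass)."""
    if stats.raw_reads <= 0:
        return ["no raw reads"]
    if stats.trimmed_reads <= 0:
        return ["no reads left after QC"]

    failures = []
    if stats.trimmed_reads > stats.raw_reads:
        failures.append("more reads after QC than before")
    elif stats.trimmed_reads / stats.raw_reads < min_retained:
        failures.append("too few reads retained after QC")

    mapping_rate = stats.mapped_reads / stats.trimmed_reads
    if mapping_rate < min_mapping_rate:
        failures.append(f"mapping rate {mapping_rate:.2%} below threshold")
    return failures


def run_with_gates(run_step, parameter_sets, **thresholds):
    """Try each parameter set in order until the gates pass.

    `run_step(params)` is a caller-supplied function returning SampleStats.
    Returns (exit_code, report); exit_code is 0 on success and 1 if every
    parameter set failed, so the caller can pass it to sys.exit().
    """
    report = []
    for params in parameter_sets:
        failures = evaluate_gates(run_step(params), **thresholds)
        report.append({"params": params, "failures": failures})
        if not failures:
            return 0, report
    return 1, report
```

Keeping every attempt in `report`, including failed ones, gives the final report a record of which alternative parameters were tried before the run exited non-zero.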
Examples
Example 1: Expected input layout
sample_sheet.tsv
reads/*.fastq.gz
reference.fasta (optional)
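Before a run, the files in this layout can be given a basic integrity check. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines) and gzip-compressed files; the function names are hypothetical, and it is not a replacement for a dedicated validator.

```python
import gzip


def fastq_problems(lines):
    """Scan FASTQ text lines; return (record_count, first_problem_or_None)."""
    count = 0
    record = []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text and not record:
            continue  # tolerate blank lines between records
        record.append(text)
        if len(record) == 4:
            header, seq, plus, qual = record
            label = f"record {count + 1}"
            if not header.startswith("@"):
                return count, f"{label}: header does not start with '@'"
            if not plus.startswith("+"):
                return count, f"{label}: separator does not start with '+'"
            if len(seq) != len(qual):
                return count, f"{label}: sequence and quality lengths differ"
            count += 1
            record = []
    if record:
        return count, "truncated final record"
    return count, None


def check_fastq_gz(path):
    """Run fastq_problems on a gzip-compressed FASTQ file."""
    try:
        with gzip.open(path, "rt") as handle:
            return fastq_problems(handle)
    except (OSError, EOFError) as exc:
        # Covers unreadable files, non-gzip data, and truncated streams.
        return 0, f"cannot read gzip stream: {exc}"
```

The returned record count can also serve as the raw read count for the post-QC sanity check.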
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.