Long-Read Alignment with minimap2

Oxford Nanopore Alignment

# Basic ONT alignment
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

PacBio HiFi Alignment

# PacBio HiFi reads (high accuracy)
minimap2 -ax map-hifi reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

PacBio CLR Alignment

# PacBio CLR (continuous long reads, lower accuracy)
minimap2 -ax map-pb reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

Pre-Build Index for Multiple Runs

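# Build the index once. Indexing parameters (-k, -w) are fixed at build time,
# so build with the same preset you will align with.
minimap2 -x map-ont -d reference.mmi reference.fa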

# Use index for alignment
minimap2 -ax map-ont reference.mmi reads.fastq.gz | samtools sort -o aligned.bam
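
When the same reference is reused across many runs, a script can rebuild the index only when it is missing or older than the FASTA. This is a minimal sketch: `index_is_stale` is a hypothetical helper name, and the file names simply follow the example above.

```shell
# Hypothetical helper: succeed (exit 0) when the index needs (re)building,
# i.e. it is missing or empty, or the reference FASTA is newer than the index.
index_is_stale() {
    local reference="$1" index="$2"
    [ ! -s "$index" ] || [ "$reference" -nt "$index" ]
}

# Usage (assumes minimap2 is on PATH):
#   if index_is_stale reference.fa reference.mmi; then
#       minimap2 -x map-ont -d reference.mmi reference.fa
#   fi
```

Comparing modification times is a cheap heuristic; it does not detect an index built with a different preset.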

Common Options

# -t 8             Threads
# -R '@RG...'      Read group header line
# --secondary=no   No secondary alignments
# --MD             Generate MD tag for variants
# -Y               Use soft clipping for supplementary alignments
minimap2 -ax map-ont \
    -t 8 \
    -R '@RG\tID:sample\tSM:sample' \
    --secondary=no \
    --MD \
    -Y \
    reference.fa reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam
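
Read-group strings are easy to mistype, so a pipeline might build the `-R` argument from a sample name. This is a minimal sketch: `rg_header` is a hypothetical helper, and the `PL` field is an addition that does not appear in the example above (`ONT` and `PACBIO` are platform values defined by the SAM specification).

```shell
# Hypothetical helper: build a read-group line for minimap2 -R.
# The output contains a literal backslash followed by "t", as in the example
# above; minimap2 turns "\t" into a tab when it writes the @RG header.
rg_header() {
    local sample="$1" platform="${2:-ONT}"
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$sample" "$sample" "$platform"
}

rg_header sample1 ONT   # prints: @RG\tID:sample1\tSM:sample1\tPL:ONT

# Usage (assumes minimap2 and samtools are on PATH):
#   minimap2 -ax map-ont -R "$(rg_header sample1 ONT)" reference.fa reads.fastq.gz | \
#       samtools sort -o aligned.bam
```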

Splice-Aware Alignment (RNA)

# For cDNA or direct RNA sequencing
# (for noisy ONT direct RNA reads, the minimap2 README suggests adding -uf -k14)
minimap2 -ax splice reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam

With Junction BED (Known Splice Sites)

# Provide known splice junctions
minimap2 -ax splice --junc-bed junctions.bed \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam

Assembly to Reference Alignment

# Assembly with ~0.1% divergence
minimap2 -ax asm5 reference.fa assembly.fa > aligned.sam

# Assembly with higher divergence (~5%)
minimap2 -ax asm20 reference.fa assembly.fa > aligned.sam

Output PAF (Faster, No BAM)

# PAF format (faster, for quick analysis)
minimap2 -x map-ont reference.fa reads.fastq.gz > alignments.paf

Keep Secondary and Supplementary

# Keep all alignments (for SV calling)
# -N 5: report at most 5 secondary alignments per read
minimap2 -ax map-ont \
    --secondary=yes \
    -N 5 \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam

Filter Alignments

# During the alignment pipeline: keep only alignments with MAPQ >= 10
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools view -b -q 10 | \
    samtools sort -o aligned.bam

Multiple FASTQ Files

# Pass several query files in one command
minimap2 -ax map-ont reference.fa reads1.fastq.gz reads2.fastq.gz | \
    samtools sort -o aligned.bam

# Or read file names from a list (one path per line)
xargs minimap2 -ax map-ont reference.fa < file_list.txt | \
    samtools sort -o aligned.bam

Output Statistics

# Get alignment statistics
samtools flagstat aligned.bam

# Detailed stats
samtools stats aligned.bam | grep ^SN
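
The summary-number (`SN`) lines are tab-separated, so a single value such as the mapped fraction can be pulled out with awk. This is a minimal sketch: `mapped_fraction` is a hypothetical helper, and the two input lines in the demo are made-up numbers in the `samtools stats` SN layout.

```shell
# Hypothetical helper: read "SN" lines from samtools stats on stdin and
# print reads mapped / raw total sequences with four decimals.
mapped_fraction() {
    awk -F'\t' '
        $1 == "SN" && $2 == "raw total sequences:" { total = $3 }
        $1 == "SN" && $2 == "reads mapped:"        { mapped = $3 }
        END {
            if (total > 0) printf "%.4f\n", mapped / total
            else { print "no reads found" > "/dev/stderr"; exit 1 }
        }'
}

# Demo on illustrative (made-up) SN lines:
printf 'SN\traw total sequences:\t200\nSN\treads mapped:\t150\n' | mapped_fraction

# Usage on real data (assumes samtools is on PATH):
#   samtools stats aligned.bam | grep ^SN | mapped_fraction
```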

Convert PAF to BED

# Extract alignments to BED6 (target, start, end, read name, MAPQ, strand)
awk 'BEGIN { OFS = "\t" } { print $6, $8, $9, $1, $12, $5 }' alignments.paf > alignments.bed
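
PAF columns 10 and 11 hold the number of matching bases and the alignment block length, so an approximate per-alignment identity can be computed with the same kind of awk one-liner. This is a minimal sketch: `paf_identity` and `example_paf` are hypothetical names, and the two records are made up for illustration.

```shell
# Hypothetical helper: for alignments with MAPQ >= threshold (default 10),
# print read name, target name, MAPQ and approximate identity (col 10 / col 11).
paf_identity() {
    local min_mapq="${1:-10}"
    awk -v q="$min_mapq" \
        '$12 >= q && $11 > 0 { printf "%s\t%s\t%d\t%.3f\n", $1, $6, $12, $10 / $11 }'
}

# Two illustrative (made-up) PAF records with the 12 mandatory columns.
example_paf() {
    printf 'read1\t1000\t0\t1000\t+\tchr1\t50000\t100\t1100\t950\t1000\t60\n'
    printf 'read2\t800\t10\t790\t-\tchr2\t40000\t500\t1280\t600\t780\t3\n'
}

# Only read1 passes the MAPQ filter; its identity is 950 / 1000.
example_paf | paf_identity 10

# Usage on real data:
#   paf_identity 20 < alignments.paf > identity.tsv
```

Without base-level alignment in the PAF output, the match count is an estimate, so the ratio is best treated as a rough screen and not as an exact identity.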

Key Presets

| Preset | Description | Best For |
|--------|-------------|----------|
| map-ont | ONT reads | Nanopore genomic |
| map-hifi | PacBio HiFi | PacBio genomic |
| map-pb | PacBio CLR | PacBio CLR |
| splice | Long RNA reads | cDNA, direct RNA |
| asm5 | Low divergence | Same species assembly |
| asm20 | High divergence | Cross-species assembly |
| sr | Short reads | Illumina (basic) |
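
The preset table can be turned into a small lookup so that pipeline scripts choose `-x` from a platform label. This is a minimal sketch: the function name `preset_for` and the labels on the left are hypothetical; only the preset names come from the table.

```shell
# Hypothetical helper: map an illustrative platform label to a minimap2 preset.
preset_for() {
    case "$1" in
        ont)         echo "map-ont" ;;
        hifi)        echo "map-hifi" ;;
        clr)         echo "map-pb" ;;
        rna)         echo "splice" ;;
        asm-close)   echo "asm5" ;;
        asm-distant) echo "asm20" ;;
        illumina)    echo "sr" ;;
        *)           echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Usage (assumes minimap2 and samtools are on PATH):
#   minimap2 -ax "$(preset_for ont)" reference.fa reads.fastq.gz | \
#       samtools sort -o aligned.bam
```

Failing on an unknown label, instead of falling back to a default preset, keeps a typo from silently producing alignments with the wrong parameters.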

Key Parameters

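| Parameter | Default | Description |
|-----------|---------|-------------|
| -t | 3 | CPU threads |
| -k | 15 | K-mer size |
| -w | 10 | Minimizer window size |
| -a | off | Output SAM |
| -x | none | Preset |
| --secondary | yes | Output secondary alignments |
| -N | 5 | Max secondary alignments |
| --MD | off | Generate MD tag |
| -R | none | Read group header line |
| -Y | off | Soft clipping for supplementary alignments |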

Output Formats

| Format | Flag | Description |
|--------|------|-------------|
| PAF | (default) | Pairwise mApping Format |
| SAM | -a | Sequence Alignment/Map |
| BAM | -a, piped to samtools | Binary SAM |

Related Skills

  • medaka-polishing - Polish consensus with medaka
  • structural-variants - Call SVs from alignments
  • alignment-files - BAM manipulation