HISAT2 RNA-seq Alignment

Build Index

# Basic index (no annotation)
hisat2-build -p 8 reference.fa hisat2_index

# Index with splice sites and exons (recommended; with --ss/--exon, hisat2-build needs about 200GB RAM for a human-sized genome)
hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt
hisat2_extract_exons.py annotation.gtf > exons.txt

hisat2-build -p 8 \
    --ss splice_sites.txt \
    --exon exons.txt \
    reference.fa hisat2_index
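A quick way to catch a failed or interrupted build is to confirm that every index file exists before starting any alignments. This is a minimal sketch. `check_hisat2_index` is a hypothetical helper, not part of HISAT2, and it assumes the usual eight-file layout (`<basename>.1.ht2` through `<basename>.8.ht2`, or `.ht2l` for large indexes).

```shell
# check_hisat2_index is a hypothetical helper, not part of HISAT2.
# Assumes the usual eight-file layout: <base>.1.ht2 ... <base>.8.ht2
# (or .ht2l for large indexes). Prints the extension it found.
check_hisat2_index() {
    local base=$1 ext i missing
    for ext in ht2 ht2l; do
        missing=0
        for i in 1 2 3 4 5 6 7 8; do
            # -s: the file exists and is non-empty
            if [ ! -s "${base}.${i}.${ext}" ]; then
                missing=1
                break
            fi
        done
        if [ "$missing" -eq 0 ]; then
            echo "$ext"
            return 0
        fi
    done
    echo "incomplete or missing HISAT2 index: ${base}" >&2
    return 1
}

# Usage: check_hisat2_index hisat2_index && hisat2 -p 8 -x hisat2_index ...
```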

Basic Alignment

# Paired-end reads
hisat2 -p 8 -x hisat2_index \
    -1 reads_1.fq.gz -2 reads_2.fq.gz \
    -S aligned.sam

# Single-end reads
hisat2 -p 8 -x hisat2_index \
    -U reads.fq.gz \
    -S aligned.sam
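When a batch mixes paired-end and single-end samples, the read arguments can be chosen from the file names. This is a sketch under the assumption that mates are named `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz`; `hisat2_read_args` is a hypothetical helper that prints one argument per line so file names with spaces survive.

```shell
# hisat2_read_args is a hypothetical helper. It assumes mates are named
# <sample>_R1.fq.gz / <sample>_R2.fq.gz and prints one argument per line.
hisat2_read_args() {
    local r1=$1
    local r2=${r1/%_R1.fq.gz/_R2.fq.gz}   # swap only the file-name suffix
    if [ "$r2" != "$r1" ] && [ -e "$r2" ]; then
        printf '%s\n' -1 "$r1" -2 "$r2"   # paired-end: mate file found
    else
        printf '%s\n' -U "$r1"            # single-end fallback
    fi
}

# Usage:
#   mapfile -t reads < <(hisat2_read_args sample_R1.fq.gz)
#   hisat2 -p 8 -x hisat2_index "${reads[@]}" -S aligned.sam
```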

Direct to Sorted BAM

# Pipe to samtools (hisat2 writes SAM to stdout when -S is omitted)
hisat2 -p 8 -x hisat2_index \
    -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -

samtools index aligned.sorted.bam

Stranded Libraries

# Forward stranded (e.g., ligation-based kits)
hisat2 -p 8 -x hisat2_index \
    --rna-strandness FR \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

# Reverse stranded (e.g., dUTP-based kits such as TruSeq Stranded; the most common)
hisat2 -p 8 -x hisat2_index \
    --rna-strandness RF \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

# Single-end stranded (use R instead of F for reverse-stranded libraries)
hisat2 -p 8 -x hisat2_index \
    --rna-strandness F \
    -U reads.fq.gz -S aligned.sam
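The strandness value depends on both the read layout and the library chemistry. The sketch below encodes the mapping from the examples above (dUTP-based gives RF/R, ligation-based gives FR/F). `rna_strandness` and its protocol labels are hypothetical; an empty result means unstranded, so the option can simply be left off.

```shell
# rna_strandness is a hypothetical helper; the protocol labels are assumptions.
# dUTP-based kits (e.g. TruSeq Stranded) -> RF (paired) / R (single)
# ligation-based kits                    -> FR (paired) / F (single)
rna_strandness() {
    local layout=$1 protocol=$2
    case "$protocol" in
        dutp|truseq-stranded)
            if [ "$layout" = paired ]; then echo RF; else echo R; fi ;;
        ligation)
            if [ "$layout" = paired ]; then echo FR; else echo F; fi ;;
        unstranded)
            echo "" ;;   # empty: omit --rna-strandness entirely
        *)
            echo "unknown protocol: $protocol" >&2
            return 1 ;;
    esac
}

# Usage:
#   s=$(rna_strandness paired dutp)
#   hisat2 -p 8 -x hisat2_index ${s:+--rna-strandness "$s"} -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
```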

Novel Splice Junction Discovery

# Output novel splice junctions
hisat2 -p 8 -x hisat2_index \
    --novel-splicesite-outfile novel_splices.txt \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

# Use known + novel junctions for subsequent alignments
hisat2 -p 8 -x hisat2_index \
    --novel-splicesite-infile novel_splices.txt \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
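Before reusing a junction file it can help to see what it contains. This sketch assumes HISAT2's tab-separated layout for `--novel-splicesite-outfile` (chromosome, left flanking base, right flanking base, strand as `+`, `-`, or `.`); `splice_summary` is a hypothetical helper name.

```shell
# splice_summary is a hypothetical helper name.
# Assumes 4 tab-separated columns: chrom, left flank, right flank, strand (+, -, .).
splice_summary() {
    awk -F'\t' '
        NF >= 4 {
            n[$4]++; total++
            len = $3 - $2 - 1        # intron length between the two flanking bases
            if (len > max) max = len
        }
        END {
            printf "total\t%d\n", total
            printf "plus\t%d\nminus\t%d\nunknown\t%d\n", n["+"], n["-"], n["."]
            printf "max_intron\t%d\n", max
        }' "$1"
}

# Usage: splice_summary novel_splices.txt
```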

Two-Pass Alignment (Manual)

# Pass 1: Discover junctions from all samples
for r1 in *_R1.fq.gz; do
    r2=${r1/_R1/_R2}
    base=$(basename "$r1" _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-outfile "${base}_splices.txt" \
        -1 "$r1" -2 "$r2" -S /dev/null
done

# Combine and deduplicate junctions (the output name must not match *_splices.txt)
sort -u *_splices.txt > combined_junctions.txt

# Pass 2: Realign with all junctions
for r1 in *_R1.fq.gz; do
    r2=${r1/_R1/_R2}
    base=$(basename "$r1" _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-infile combined_junctions.txt \
        -1 "$r1" -2 "$r2" | \
        samtools sort -@ 4 -o "${base}.sorted.bam" -
done
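The pass-1 merge only removes duplicate lines. If an actual filter is wanted, one common heuristic is to keep junctions that recur in several samples. This is a sketch of that heuristic, not a HISAT2 feature; `filter_recurrent_junctions` and any threshold you pass are assumptions.

```shell
# filter_recurrent_junctions is a hypothetical helper; the threshold is a user choice,
# not a HISAT2 default. Each input file is treated as one sample.
filter_recurrent_junctions() {
    local min_samples=$1
    shift
    awk -F'\t' -v min="$min_samples" '
        !seen[FILENAME, $0]++ { count[$0]++ }   # count each junction once per sample file
        END { for (j in count) if (count[j] >= min) print j }
    ' "$@" | sort -t $'\t' -k1,1 -k2,2n -k3,3n
}

# Usage (write to a name the *_splices.txt glob will not match, then pass it
# to --novel-splicesite-infile in pass 2):
#   filter_recurrent_junctions 2 *_splices.txt > recurrent_junctions.txt
```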

Read Group Information

hisat2 -p 8 -x hisat2_index \
    --rg-id sample1 \
    --rg SM:sample1 \
    --rg PL:ILLUMINA \
    --rg LB:lib1 \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
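In a per-sample loop the read-group values usually come from the file name. This sketch assumes `<sample>_R1.fq.gz` naming and reuses the sample name for LB; `rg_args` is a hypothetical helper that prints one argument per line.

```shell
# rg_args is a hypothetical helper. Assumes <sample>_R1.fq.gz naming;
# LB reusing the sample name is an assumption (one library per sample).
rg_args() {
    local base
    base=$(basename "$1" _R1.fq.gz)
    printf '%s\n' --rg-id "$base" --rg "SM:$base" --rg PL:ILLUMINA --rg "LB:$base"
}

# Usage inside the loop:
#   mapfile -t rg < <(rg_args "$r1")
#   hisat2 -p 8 -x hisat2_index "${rg[@]}" -1 "$r1" -2 "$r2" -S "${base}.sam"
```

The resulting header line can be checked with `samtools view -H <file> | grep '^@RG'`.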

Downstream Quantification

# Name-sorted BAM (matches htseq-count's default -r name)
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -n -@ 4 -o aligned.namesorted.bam -

# Or coordinate-sorted (featureCounts accepts either order; htseq-count then needs -r pos)
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -
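The counting step must use the same strandness as the alignment. This sketch translates the HISAT2 `--rna-strandness` value into featureCounts `-s` (0 unstranded, 1 stranded, 2 reversely stranded) and htseq-count `-s` (no/yes/reverse). `count_strand` is a hypothetical helper; an empty strandness means unstranded.

```shell
# count_strand is a hypothetical helper name.
# featureCounts -s: 0 = unstranded, 1 = stranded, 2 = reversely stranded
# htseq-count  -s: no / yes / reverse
count_strand() {
    local tool=$1 hisat_strand=${2:-}
    case "$tool:$hisat_strand" in
        featurecounts:FR|featurecounts:F) echo 1 ;;
        featurecounts:RF|featurecounts:R) echo 2 ;;
        featurecounts:)                   echo 0 ;;
        htseq:FR|htseq:F)                 echo yes ;;
        htseq:RF|htseq:R)                 echo reverse ;;
        htseq:)                           echo no ;;
        *) echo "unsupported: $tool $hisat_strand" >&2; return 1 ;;
    esac
}

# Usage: featureCounts -s "$(count_strand featurecounts RF)" ...
```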

Key Parameters

Parameter          Default      Description
-p                 1            Number of threads
-x                 (required)   Index basename
--rna-strandness   unstranded   FR or RF for paired-end, F or R for single-end
--dta              off          Report alignments tailored for transcript assemblers (StringTie)
--dta-cufflinks    off          Report alignments tailored for Cufflinks
--min-intronlen    20           Minimum intron length
--max-intronlen    500000       Maximum intron length
-k                 5            Maximum number of distinct primary alignments searched for per read

For StringTie/Cufflinks

# Use --dta for StringTie (use --dta-cufflinks instead for Cufflinks)
hisat2 -p 8 -x hisat2_index \
    --dta \
    -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -
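StringTie uses the XS strand attribute on spliced alignments, so a quick sanity check is to count how many spliced records carry it. This sketch reads SAM text on stdin (for example from `samtools view aligned.sorted.bam`) and treats any CIGAR containing `N` as spliced; `xs_check` is a hypothetical helper name.

```shell
# xs_check is a hypothetical helper name. Reads SAM text on stdin.
# "Spliced" = CIGAR (column 6) contains an N operation.
xs_check() {
    awk -F'\t' '
        /^@/ { next }                                # skip header lines
        $6 ~ /N/ {
            spliced++
            for (i = 12; i <= NF; i++)               # optional tags start at column 12
                if ($i ~ /^XS:A:[+-]$/) { xs++; break }
        }
        END { printf "spliced=%d with_XS=%d\n", spliced, xs }
    '
}

# Usage: samtools view aligned.sorted.bam | xs_check
```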

Alignment Summary

# HISAT2 prints summary to stderr
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt

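Example (abbreviated; real paired-end output also breaks down pairs that did not align concordantly into discordant and per-mate alignments):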

50000000 reads; of these:
  50000000 (100.00%) were paired; of these:
    2500000 (5.00%) aligned concordantly 0 times
    45000000 (90.00%) aligned concordantly exactly 1 time
    2500000 (5.00%) aligned concordantly >1 times
95.00% overall alignment rate
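With one summary file per sample, the overall rate can be collected and screened in one pass. This is a sketch assuming each sample's stderr was saved as `<sample>.summary.txt`; `alignment_rates` is a hypothetical helper and the threshold you pass (e.g. 80) is an arbitrary choice, not a HISAT2 recommendation.

```shell
# alignment_rates is a hypothetical helper; the threshold is user-chosen.
# Prints: file <tab> overall rate <tab> ok|LOW
alignment_rates() {
    local min=$1
    shift
    awk -v min="$min" '
        /overall alignment rate/ {
            rate = $1
            sub(/%/, "", rate)                       # "95.00%" -> "95.00"
            printf "%s\t%.2f\t%s\n", FILENAME, rate, (rate + 0 < min ? "LOW" : "ok")
        }' "$@"
}

# Usage:
#   hisat2 ... 2> "${base}.summary.txt"
#   alignment_rates 80 *.summary.txt
```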

Memory Comparison

Aligner   Approx. memory to align to the human genome
STAR      ~30GB
HISAT2    ~8GB

Related Skills

  • read-alignment/star-alignment - Alternative aligner with higher memory use and a built-in two-pass mode
  • rna-quantification/featurecounts-counting - Count aligned reads
  • rna-quantification/alignment-free-quant - Skip alignment entirely
  • differential-expression/deseq2-basics - Downstream DE analysis