bio-read-alignment-hisat2-alignment
HISAT2 RNA-seq Alignment
Build Index
# Basic index (no annotation)
hisat2-build -p 8 reference.fa hisat2_index
# Index with splice sites and exons (recommended)
hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt
hisat2_extract_exons.py annotation.gtf > exons.txt
hisat2-build -p 8 \
--ss splice_sites.txt \
--exon exons.txt \
reference.fa hisat2_index
Basic Alignment
# Paired-end reads
hisat2 -p 8 -x hisat2_index \
-1 reads_1.fq.gz -2 reads_2.fq.gz \
-S aligned.sam
# Single-end reads
hisat2 -p 8 -x hisat2_index \
-U reads.fq.gz \
-S aligned.sam
Direct to Sorted BAM
# Pipe to samtools
hisat2 -p 8 -x hisat2_index \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
samtools index aligned.sorted.bam
Stranded Libraries
# Forward stranded (e.g., Ligation)
hisat2 -p 8 -x hisat2_index \
--rna-strandness FR \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Reverse stranded (e.g., dUTP, TruSeq - most common)
hisat2 -p 8 -x hisat2_index \
--rna-strandness RF \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Single-end stranded
hisat2 -p 8 -x hisat2_index \
--rna-strandness F \ # or R for reverse
-U reads.fq.gz -S aligned.sam
Novel Splice Junction Discovery
# Output novel splice junctions
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-outfile novel_splices.txt \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Use known + novel junctions for subsequent alignments
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-infile novel_splices.txt \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Two-Pass Alignment (Manual)
# Pass 1: Discover junctions from all samples
for r1 in *_R1.fq.gz; do
r2=${r1/_R1/_R2}
base=$(basename $r1 _R1.fq.gz)
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-outfile ${base}_splices.txt \
-1 $r1 -2 $r2 -S /dev/null
done
# Combine and filter junctions
cat *_splices.txt | sort -u > combined_splices.txt
# Pass 2: Realign with all junctions
for r1 in *_R1.fq.gz; do
r2=${r1/_R1/_R2}
base=$(basename $r1 _R1.fq.gz)
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-infile combined_splices.txt \
-1 $r1 -2 $r2 | \
samtools sort -@ 4 -o ${base}.sorted.bam -
done
Read Group Information
hisat2 -p 8 -x hisat2_index \
--rg-id sample1 \
--rg SM:sample1 \
--rg PL:ILLUMINA \
--rg LB:lib1 \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Downstream Quantification
# Output name-sorted BAM for htseq-count
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -n -@ 4 -o aligned.namesorted.bam -
# Or coordinate-sorted for featureCounts
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| -p | 1 | Number of threads |
| -x | - | Index basename |
| --rna-strandness | unstranded | FR/RF/F/R |
| --dta | off | Downstream transcriptome assembly |
| --dta-cufflinks | off | For Cufflinks |
| --min-intronlen | 20 | Minimum intron length |
| --max-intronlen | 500000 | Maximum intron length |
| -k | 5 | Max alignments to report |
For StringTie/Cufflinks
# Use --dta for StringTie
hisat2 -p 8 -x hisat2_index \
--dta \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
Alignment Summary
# HISAT2 prints summary to stderr
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt
Example:
50000000 reads; of these:
50000000 (100.00%) were paired; of these:
2500000 (5.00%) aligned concordantly 0 times
45000000 (90.00%) aligned concordantly exactly 1 time
2500000 (5.00%) aligned concordantly >1 times
95.00% overall alignment rate
Memory Comparison
| Aligner | Human Genome Memory |
|---|---|
| STAR | ~30GB |
| HISAT2 | ~8GB |
Related Skills
- read-alignment/star-alignment - Alternative with more features
- rna-quantification/featurecounts-counting - Count aligned reads
- rna-quantification/alignment-free-quant - Skip alignment entirely
- differential-expression/deseq2-basics - Downstream DE analysis
More from gptomics/bioskills
bioskills
Installs 425 bioinformatics skills covering sequence analysis, RNA-seq, single-cell, variant calling, metagenomics, structural biology, and 56 more categories. Use when setting up bioinformatics capabilities or when a bioinformatics task requires specialized skills not yet installed.
101bio-single-cell-batch-integration
Integrate multiple scRNA-seq samples/batches using Harmony, scVI, Seurat anchors, and fastMNN. Remove technical variation while preserving biological differences. Use when integrating multiple scRNA-seq batches or datasets.
5bio-epitranscriptomics-merip-preprocessing
Align and QC MeRIP-seq IP and input samples for m6A analysis. Use when preparing MeRIP-seq data for peak calling or differential methylation analysis.
5bio-data-visualization-multipanel-figures
Combine multiple plots into publication-ready multi-panel figures using patchwork, cowplot, or matplotlib GridSpec with shared legends and panel labels. Use when combining multiple plots into publication figures.
5bio-data-visualization-specialized-omics-plots
Reusable plotting functions for common omics visualizations. Custom ggplot2/matplotlib implementations of volcano, MA, PCA, enrichment dotplots, boxplots, and survival curves. Use when creating volcano, MA, or enrichment plots.
5bio-read-qc-fastp-workflow
All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.
5