bio-methylation-bismark-alignment
Bismark Alignment
Prepare Genome Index
# One-time genome preparation (creates bisulfite-converted index)
bismark_genome_preparation --bowtie2 /path/to/genome_folder/
# Genome folder should contain FASTA files (e.g., hg38.fa, chr1.fa, etc.)
# Creates Bisulfite_Genome/ subdirectory with CT and GA converted indices
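Before the one-time preparation, it can help to confirm that the folder really contains FASTA files. The following is a minimal sketch, assuming the preparation step looks for `.fa` or `.fasta` files (optionally gzipped) directly inside the folder; `count_fasta` is a hypothetical helper, not part of Bismark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA files directly inside a genome folder.
# Assumption: .fa/.fasta (optionally .gz) are the extensions that matter.
count_fasta() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.fa' -o -name '*.fasta' -o -name '*.fa.gz' -o -name '*.fasta.gz' \) \
        | wc -l | tr -d ' '
}

genome_dir="${1:-/path/to/genome_folder}"
if [ -d "$genome_dir" ] && [ "$(count_fasta "$genome_dir")" -gt 0 ]; then
    echo "OK: FASTA files found in $genome_dir"
else
    echo "No FASTA files found in $genome_dir" >&2
fi
```

If the check passes, `bismark_genome_preparation` can be run on the same folder as shown above.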
Basic Single-End Alignment
bismark --genome /path/to/genome_folder/ reads.fastq.gz -o output_dir/
Paired-End Alignment
bismark --genome /path/to/genome_folder/ \
-1 reads_R1.fastq.gz \
-2 reads_R2.fastq.gz \
-o output_dir/
Common Options
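# --bowtie2: use Bowtie 2 (the default, so the flag is optional)
# --parallel 4: number of parallel instances
# --temp_dir /tmp/: temporary directory
# --non_directional: only for non-directional libraries
# --nucleotide_coverage: generate nucleotide coverage report
# Notes are listed here because a comment after a trailing backslash breaks the line continuation
bismark --genome /path/to/genome_folder/ \
--bowtie2 \
--parallel 4 \
--temp_dir /tmp/ \
--non_directional \
--nucleotide_coverage \
-o output_dir/ \
reads.fastq.gz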
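A shell comment cannot follow a line-continuation backslash, so one way to keep a note next to each option is a Bash array. This is a minimal sketch using only the options listed in this section; the `bismark_args` name is an illustrative choice, and the command is echoed rather than executed.

```shell
#!/usr/bin/env bash
# Collect the options in an array so each one can carry its own comment.
bismark_args=(
    --genome /path/to/genome_folder/
    --bowtie2                 # Bowtie 2 aligner (default)
    --parallel 4              # number of parallel instances
    --temp_dir /tmp/          # temporary directory
    --non_directional         # only for non-directional libraries
    --nucleotide_coverage     # nucleotide coverage report
    -o output_dir/
    reads.fastq.gz
)

# Print the command for inspection; remove "echo" to actually run it.
echo bismark "${bismark_args[@]}"
```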
RRBS Mode
# Reduced Representation Bisulfite Sequencing
# Directional RRBS libraries need no special alignment flag (--pbat is for PBAT libraries only)
bismark --genome /path/to/genome_folder/ reads.fastq.gz
# MspI digestion (RRBS standard)
# MspI end-repair artifacts are handled at trimming time (e.g., trim_galore --rrbs), not during alignment
PBAT Libraries
# Post-Bisulfite Adapter Tagging (e.g., scBS-seq)
bismark --genome /path/to/genome_folder/ --pbat reads.fastq.gz
Non-Directional Libraries
# For libraries where all 4 strands are present
bismark --genome /path/to/genome_folder/ --non_directional reads.fastq.gz
With Quality/Adapter Trimming (Pre-alignment)
# Trim adapters first with Trim Galore (recommended)
trim_galore --illumina --paired reads_R1.fastq.gz reads_R2.fastq.gz
# Then align
bismark --genome /path/to/genome_folder/ \
-1 reads_R1_val_1.fq.gz \
-2 reads_R2_val_2.fq.gz
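The alignment step has to refer to the trimmed files by their new names. The sketch below derives those names for paired-end input, following the `_val_1`/`_val_2` pattern shown above. `trimmed_name` is a hypothetical helper, and it assumes gzipped input so that the output ends in `.fq.gz`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict the validated paired-end output name.
# $1 = input FASTQ (assumed gzipped), $2 = mate number (1 or 2)
trimmed_name() {
    local base
    base="$(basename "$1")"
    base="${base%.gz}"      # drop compression suffix
    base="${base%.fastq}"   # drop .fastq if present
    base="${base%.fq}"      # drop .fq if present
    echo "${base}_val_$2.fq.gz"
}

echo "R1 -> $(trimmed_name reads_R1.fastq.gz 1)"
echo "R2 -> $(trimmed_name reads_R2.fastq.gz 2)"
```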
Multicore Processing
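# --parallel sets the number of Bismark instances run concurrently
# Aligner processes = parallel * 2 (directional) or parallel * 4 (non-directional)
# Bismark itself, samtools, and gzip add further processes, so actual core and memory use is higher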
bismark --genome /path/to/genome_folder/ \
--parallel 4 \
reads.fastq.gz
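The arithmetic above can be sketched as a small function when sizing a job request. `aligner_processes` is a hypothetical helper; it counts only the aligner processes, so the result is a lower bound on the cores actually used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower bound on concurrent aligner processes.
# $1 = value passed to --parallel
# $2 = library type: "directional" or "non_directional"
aligner_processes() {
    local per_instance=2
    if [ "$2" = "non_directional" ]; then
        per_instance=4
    fi
    echo $(( $1 * per_instance ))
}

echo "directional, --parallel 4:     $(aligner_processes 4 directional)"
echo "non-directional, --parallel 4: $(aligner_processes 4 non_directional)"
```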
Output Files
# Bismark produces:
# - reads_bismark_bt2.bam # Aligned reads
# - reads_bismark_bt2_SE_report.txt # Alignment report
# View alignment report
cat output_dir/reads_bismark_bt2_SE_report.txt
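Downstream steps usually need these file names. The following sketch derives them from the input name for a single-end Bowtie 2 run, following the pattern listed above. `bismark_outputs` is a hypothetical helper. It assumes the FASTQ extension is stripped before the suffix is added, and it does not cover paired-end or HISAT2 runs, which use different suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected single-end Bowtie 2 output names.
# $1 = input FASTQ file
bismark_outputs() {
    local base
    base="$(basename "$1")"
    base="${base%.gz}"
    base="${base%.fastq}"
    base="${base%.fq}"
    echo "${base}_bismark_bt2.bam"
    echo "${base}_bismark_bt2_SE_report.txt"
}

bismark_outputs reads.fastq.gz
```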
Sort and Index BAM
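# Bismark output is not coordinate-sorted; sort a copy for genome browsers and other coordinate-based tools
# Keep the original BAM for deduplicate_bismark and bismark_methylation_extractor, which expect Bismark's own read order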
samtools sort output.bam -o output.sorted.bam
samtools index output.sorted.bam
Deduplicate (Optional)
# Remove PCR duplicates (recommended for WGBS, not RRBS)
deduplicate_bismark --bam output_bismark_bt2.bam
# For paired-end
deduplicate_bismark --paired --bam output_bismark_bt2_pe.bam
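The choice described above (deduplicate WGBS, skip RRBS, add `--paired` for paired-end data) can be sketched as a command builder. `dedup_command` is a hypothetical helper that only prints the command, using the flags shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the deduplication command, or "skip" for RRBS.
# $1 = assay ("wgbs" or "rrbs"), $2 = layout ("single" or "paired"), $3 = BAM file
dedup_command() {
    if [ "$1" = "rrbs" ]; then
        # Deduplication is not recommended for RRBS libraries
        echo "skip"
        return 0
    fi
    if [ "$2" = "paired" ]; then
        echo "deduplicate_bismark --paired --bam $3"
    else
        echo "deduplicate_bismark --bam $3"
    fi
}

dedup_command wgbs paired output_bismark_bt2_pe.bam
dedup_command rrbs single output_bismark_bt2.bam
```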
Check Alignment Statistics
# Bismark generates detailed report
cat *_SE_report.txt
# Key metrics:
# - Sequences analyzed
# - Unique alignments
# - Mapping efficiency
# - C methylated in CpG context
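For batch runs, a single metric can be pulled from each report instead of reading the whole file. This is a minimal sketch, assuming each metric sits on its own line as `Label:<whitespace>value`; the exact label text (for example `Mapping efficiency`) is an assumption to check against a real report. `report_value` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first line starting with a label.
# $1 = report file, $2 = label text at the start of the line
report_value() {
    awk -F ':' -v label="$2" '
        index($0, label) == 1 {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding whitespace
            print value
            exit
        }' "$1"
}

# Example usage (file name follows the pattern used in this document):
# report_value output_dir/reads_bismark_bt2_SE_report.txt "Mapping efficiency"
```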
Genome Preparation with HISAT2 (Alternative Aligner)
# HISAT2 is an alternative to Bowtie 2 that also supports spliced alignments
# The index must be built with the same aligner that is used for alignment
bismark_genome_preparation --hisat2 /path/to/genome_folder/
# Align with HISAT2
bismark --genome /path/to/genome_folder/ --hisat2 reads.fastq.gz
# HISAT2 paired-end
bismark --genome /path/to/genome_folder/ --hisat2 \
-1 reads_R1.fastq.gz \
-2 reads_R2.fastq.gz
Key Parameters
| Parameter | Description |
|---|---|
| --genome | Path to genome folder |
| --bowtie2 | Use Bowtie2 aligner (default) |
| --hisat2 | Use HISAT2 aligner |
| --parallel | Parallel alignment instances |
| --non_directional | Non-directional library |
| --pbat | PBAT library protocol |
| -o | Output directory |
| --temp_dir | Temporary file directory |
| --nucleotide_coverage | Generate nucleotide coverage report |
| -N | Mismatches in seed (0 or 1, default 0) |
| -L | Seed length (default 20) |
Library Types
| Type | Parameter | Description |
|---|---|---|
| Directional | (default) | Standard WGBS/RRBS |
| Non-directional | --non_directional | All 4 strands |
| PBAT | --pbat | Post-bisulfite adapter tagging |
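In a pipeline, the table above can be encoded as a lookup from library type to flag. This is a minimal sketch; `library_flag` and the type names it accepts are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a library type to the matching Bismark flag.
# $1 = "directional", "non_directional", or "pbat"
library_flag() {
    case "$1" in
        directional)     echo "" ;;                   # default, no flag needed
        non_directional) echo "--non_directional" ;;
        pbat)            echo "--pbat" ;;
        *)
            echo "unknown library type: $1" >&2
            return 1
            ;;
    esac
}

echo "pbat flag: $(library_flag pbat)"
```

An unknown type returns a non-zero status, so a calling script can stop before starting an alignment with the wrong settings.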
Related Skills
- methylation-calling - Extract methylation from Bismark BAM
- methylkit-analysis - Import Bismark output to R
- sequence-io/read-sequences - FASTQ handling
- alignment-files/sam-bam-basics - BAM manipulation