bio-workflows-methylation-pipeline
SKILL.md
Methylation Pipeline
Complete workflow from bisulfite sequencing FASTQ to differentially methylated regions.
Workflow Overview
FASTQ files
|
v
[1. QC & Trimming] -----> fastp/Trim Galore
|
v
[2. Alignment] ---------> Bismark
|
v
[3. Deduplication] -----> deduplicate_bismark
|
v
[4. Methylation Calling] -> bismark_methylation_extractor
|
v
[5. Analysis] -----------> methylKit (R)
|
v
[6. DMR Detection] ------> methylKit/DSS
|
v
Differentially methylated regions
Primary Path: Bismark + methylKit
Step 1: Quality Control
# Trim Galore recommended for bisulfite data (handles adapter bias)
trim_galore --paired --fastqc \
-o trimmed/ \
sample_R1.fastq.gz sample_R2.fastq.gz
# Or fastp with conservative settings
fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
-o trimmed/sample_R1.fq.gz -O trimmed/sample_R2.fq.gz \
--detect_adapter_for_pe \
--qualified_quality_phred 20 \
--length_required 35 \
--html qc/sample_fastp.html
Step 2: Bismark Alignment
# Prepare genome (once)
bismark_genome_preparation --bowtie2 genome/
# Align
bismark --genome genome/ \
-1 trimmed/sample_R1_val_1.fq.gz \
-2 trimmed/sample_R2_val_2.fq.gz \
-o aligned/ \
--parallel 4 \
--temp_dir tmp/
# Output: sample_R1_val_1_bismark_bt2_pe.bam
QC Checkpoint: Check Bismark report
- Mapping efficiency >50% (BS-seq has lower rates)
- Bisulfite conversion rate >99%
Step 3: Deduplication
deduplicate_bismark \
--bam \
-p \
-o deduplicated/ \
aligned/sample_R1_val_1_bismark_bt2_pe.bam
Step 4: Methylation Calling
bismark_methylation_extractor \
--paired-end \
--comprehensive \
--bedGraph \
--cytosine_report \
--genome_folder genome/ \
-o methylation/ \
deduplicated/sample_R1_val_1_bismark_bt2_pe.deduplicated.bam
# Generate summary report
bismark2report
bismark2summary
Step 5: Analysis with methylKit
library(methylKit)
# Read methylation calls
files <- list(
'methylation/control_1.CpG_report.txt',
'methylation/control_2.CpG_report.txt',
'methylation/treated_1.CpG_report.txt',
'methylation/treated_2.CpG_report.txt'
)
sample_ids <- c('control_1', 'control_2', 'treated_1', 'treated_2')
treatment <- c(0, 0, 1, 1)
# Read cytosine reports
meth_obj <- methRead(
location = as.list(files),
sample.id = as.list(sample_ids),
assembly = 'hg38',
treatment = treatment,
context = 'CpG',
pipeline = 'bismarkCytosineReport'
)
# Filter by coverage
meth_filtered <- filterByCoverage(meth_obj, lo.count = 10, hi.perc = 99.9)
# Normalize coverage
meth_norm <- normalizeCoverage(meth_filtered)
# Merge samples (keep sites covered in all)
meth_merged <- unite(meth_norm, destrand = TRUE)
# Sample statistics
getMethylationStats(meth_obj[[1]], plot = TRUE)
getCoverageStats(meth_obj[[1]], plot = TRUE)
Step 6: DMR Detection
# Calculate differential methylation (per CpG)
diff_meth <- calculateDiffMeth(meth_merged)
# Get significant DMCs
dmc <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01)
# Tile into regions (DMRs)
tiles <- tileMethylCounts(meth_merged, win.size = 1000, step.size = 1000)
diff_tiles <- calculateDiffMeth(tiles)
dmr <- getMethylDiff(diff_tiles, difference = 25, qvalue = 0.01)
# Export
write.csv(as.data.frame(dmc), 'dmc_results.csv')
write.csv(as.data.frame(dmr), 'dmr_results.csv')
# Annotate with genomic features
library(genomation)
gene_obj <- readTranscriptFeatures('genes.bed')
annotateWithGeneParts(as(dmr, 'GRanges'), gene_obj)
Parameter Recommendations
| Step | Parameter | Value |
|---|---|---|
| Trim Galore | default | Recommended for BS-seq |
| Bismark | --parallel | 4 (per sample parallelization) |
| methylKit | lo.count | 10 (minimum coverage) |
| methylKit | difference | 25 (% methylation difference) |
| methylKit | qvalue | 0.01 |
| DMR tiles | win.size | 500-1000 bp |
Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| Low mapping rate | Normal for BS-seq | Expect 40-70% |
| Low conversion | Failed bisulfite treatment | Check spike-in controls |
| Few DMRs | Low coverage, small differences | Increase sequencing, relax thresholds |
| Biased positions | M-bias | Trim 10bp from read ends |
Complete Pipeline Script
#!/bin/bash
set -e
THREADS=4
GENOME="genome/"
SAMPLES="control_1 control_2 treated_1 treated_2"
OUTDIR="methylation_results"
mkdir -p ${OUTDIR}/{trimmed,aligned,deduplicated,methylation,qc}
# Step 1: QC
for sample in $SAMPLES; do
trim_galore --paired --fastqc -o ${OUTDIR}/trimmed/ \
${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
done
# Step 2: Alignment
for sample in $SAMPLES; do
bismark --genome ${GENOME} \
-1 ${OUTDIR}/trimmed/${sample}_R1_val_1.fq.gz \
-2 ${OUTDIR}/trimmed/${sample}_R2_val_2.fq.gz \
-o ${OUTDIR}/aligned/ \
--parallel ${THREADS} --temp_dir tmp/
done
# Step 3: Deduplication
for sample in $SAMPLES; do
deduplicate_bismark --bam -p \
-o ${OUTDIR}/deduplicated/ \
${OUTDIR}/aligned/${sample}_R1_val_1_bismark_bt2_pe.bam
done
# Step 4: Methylation calling
for sample in $SAMPLES; do
bismark_methylation_extractor --paired-end --comprehensive \
--bedGraph --cytosine_report \
--genome_folder ${GENOME} \
-o ${OUTDIR}/methylation/ \
${OUTDIR}/deduplicated/${sample}_R1_val_1_bismark_bt2_pe.deduplicated.bam
done
bismark2report
echo "Pipeline complete. Run R script for DMR analysis."
Related Skills
- methylation-analysis/bismark-alignment - Bismark parameters
- methylation-analysis/methylation-calling - Calling details
- methylation-analysis/methylkit-analysis - methylKit functions
- methylation-analysis/dmr-detection - DMR algorithms
Weekly Installs
3
Repository
gptomics/bioskillsGitHub Stars
339
First Seen
Jan 24, 2026
Security Audits
Installed on
trae2
windsurf1
opencode1
codex1
claude-code1
antigravity1