bio-bedgraph-handling
SKILL.md
bedGraph Handling
bedGraph is a text format for displaying continuous-valued data on genome browsers. Common for coverage, signal intensity, and scores.
bedGraph Format
track type=bedGraph name="Sample" description="Coverage"
chr1 0 100 1.5
chr1 100 200 2.3
chr1 200 300 0.8
Four columns: chrom, start, end, value (0-based, half-open)
Create bedGraph from BAM
Using bedtools genomecov
bedtools genomecov -ibam sample.bam -bg > sample.bedgraph
bedtools genomecov -ibam sample.bam -bg -split > sample.bedgraph
bedtools genomecov -ibam sample.bam -bg -scale 1.5 > sample.scaled.bedgraph
Strand-Specific
bedtools genomecov -ibam sample.bam -bg -strand + > sample.plus.bedgraph
bedtools genomecov -ibam sample.bam -bg -strand - > sample.minus.bedgraph
5' End Coverage (ChIP-seq)
bedtools genomecov -ibam sample.bam -bg -5 > sample.5prime.bedgraph
Normalize by Library Size (CPM)
total_reads=$(samtools view -c -F 260 sample.bam)
scale=$(echo "scale=10; 1000000 / $total_reads" | bc)
bedtools genomecov -ibam sample.bam -bg -scale $scale > sample.cpm.bedgraph
Sort bedGraph
bedGraph must be sorted for conversion to bigWig.
sort -k1,1 -k2,2n sample.bedgraph > sample.sorted.bedgraph
LC_ALL=C sort -k1,1 -k2,2n sample.bedgraph > sample.sorted.bedgraph
Convert bedGraph to bigWig
Using UCSC bedGraphToBigWig
bedGraphToBigWig sample.sorted.bedgraph chrom.sizes sample.bw
fetchChromSizes hg38 > hg38.chrom.sizes
bedGraphToBigWig sample.sorted.bedgraph hg38.chrom.sizes sample.bw
Generate chrom.sizes
samtools faidx reference.fa
cut -f1,2 reference.fa.fai > chrom.sizes
fetchChromSizes hg38 > hg38.chrom.sizes
mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -e \
"select chrom, size from hg38.chromInfo" > hg38.chrom.sizes
Clip to Chromosome Boundaries
bedClip sample.bedgraph chrom.sizes sample.clipped.bedgraph
bedGraphToBigWig sample.clipped.bedgraph chrom.sizes sample.bw
Convert bigWig to bedGraph
bigWigToBedGraph sample.bw sample.bedgraph
bigWigToBedGraph sample.bw sample.chr1.bedgraph -chrom=chr1
bigWigToBedGraph sample.bw sample.region.bedgraph -chrom=chr1 -start=1000 -end=2000
Merge bedGraph Files
Using bedtools unionbedg
bedtools unionbedg -i sample1.bedgraph sample2.bedgraph sample3.bedgraph \
-header -names sample1 sample2 sample3 > merged.bedgraph
Average Across Samples
bedtools unionbedg -i sample1.bedgraph sample2.bedgraph sample3.bedgraph | \
awk '{sum=0; for(i=4;i<=NF;i++) sum+=$i; print $1,$2,$3,sum/(NF-3)}' OFS='\t' \
> average.bedgraph
Mathematical Operations
bedtools map for Region Statistics
bedtools map -a regions.bed -b sample.bedgraph -c 4 -o mean > region_means.bed
bedtools map -a regions.bed -b sample.bedgraph -c 4 -o sum > region_sums.bed
bedtools map -a regions.bed -b sample.bedgraph -c 4 -o max > region_max.bed
Subtract Background
bedtools unionbedg -i treatment.bedgraph input.bedgraph | \
awk '{diff=$4-$5; if(diff<0) diff=0; print $1,$2,$3,diff}' OFS='\t' \
> subtracted.bedgraph
Log Transform
awk '{print $1,$2,$3,log($4+1)/log(2)}' OFS='\t' sample.bedgraph > sample.log2.bedgraph
Smooth Signal
bedtools slop -i sample.bedgraph -g chrom.sizes -b 50 | \
bedtools merge -i - -c 4 -o mean > smoothed.bedgraph
Python with pyBigWig
Write bedGraph
import pyBigWig
bw = pyBigWig.open('output.bedgraph', 'w')
bw.addHeader([('chr1', 248956422), ('chr2', 242193529)])
chroms = ['chr1', 'chr1', 'chr1']
starts = [0, 100, 200]
ends = [100, 200, 300]
values = [1.5, 2.3, 0.8]
bw.addEntries(chroms, starts, ends=ends, values=values)
bw.close()
Read bigWig to bedGraph Format
import pyBigWig
bw = pyBigWig.open('sample.bw')
for chrom, size in bw.chroms().items():
intervals = bw.intervals(chrom)
if intervals:
for start, end, value in intervals:
print(f'{chrom}\t{start}\t{end}\t{value}')
bw.close()
Convert bigWig Region to bedGraph
import pyBigWig
bw = pyBigWig.open('sample.bw')
intervals = bw.intervals('chr1', 1000000, 2000000)
with open('region.bedgraph', 'w') as f:
for start, end, value in intervals:
f.write(f'chr1\t{start}\t{end}\t{value}\n')
bw.close()
deepTools for Normalization
bamCoverage (BAM to bedGraph/bigWig)
bamCoverage -b sample.bam -o sample.bw --normalizeUsing RPKM
bamCoverage -b sample.bam -o sample.bw --normalizeUsing CPM
bamCoverage -b sample.bam -o sample.bw --normalizeUsing BPM
bamCoverage -b sample.bam -o sample.bedgraph --outFileFormat bedgraph --normalizeUsing CPM
bamCompare (Treatment vs Control)
bamCompare -b1 treatment.bam -b2 input.bam -o log2ratio.bw --scaleFactorsMethod readCount
bamCompare -b1 treatment.bam -b2 input.bam -o subtracted.bw --ratio subtract
bigwigCompare
bigwigCompare -b1 treatment.bw -b2 input.bw -o ratio.bw --ratio log2
bigwigCompare -b1 sample1.bw -b2 sample2.bw -o diff.bw --ratio subtract
Filtering and Subsetting
Filter by Value
awk '$4 >= 1.0' sample.bedgraph > high_signal.bedgraph
awk '$4 > 0' sample.bedgraph > nonzero.bedgraph
Extract Regions
bedtools intersect -a sample.bedgraph -b regions.bed > subset.bedgraph
Remove Specific Chromosomes
grep -v "^chrM" sample.bedgraph | grep -v "_random" > filtered.bedgraph
awk '$1 ~ /^chr[0-9XY]+$/' sample.bedgraph > standard_chroms.bedgraph
Aggregate to Bins
Fixed-Size Bins
bedtools makewindows -g chrom.sizes -w 1000 > bins.bed
bedtools map -a bins.bed -b sample.bedgraph -c 4 -o mean > binned.bedgraph
Gene Bodies
bedtools map -a genes.bed -b sample.bedgraph -c 4 -o mean > gene_signal.bed
Quality Control
Check for Overlapping Intervals
bedtools merge -i sample.bedgraph -c 4 -o collapse | \
awk 'index($4,",") > 0' | head
Verify Sorted Order
sort -c -k1,1 -k2,2n sample.bedgraph && echo "Sorted" || echo "Not sorted"
Check Value Range
awk 'NR==1 {min=$4; max=$4} {if($4<min) min=$4; if($4>max) max=$4}
END {print "Min:", min, "Max:", max}' sample.bedgraph
Complete Pipeline
#!/bin/bash
BAM=$1
NAME=$(basename $BAM .bam)
CHROM_SIZES=$2
total_reads=$(samtools view -c -F 260 $BAM)
scale=$(echo "scale=10; 1000000 / $total_reads" | bc)
bedtools genomecov -ibam $BAM -bg -scale $scale > ${NAME}.bedgraph
sort -k1,1 -k2,2n ${NAME}.bedgraph > ${NAME}.sorted.bedgraph
bedClip ${NAME}.sorted.bedgraph $CHROM_SIZES ${NAME}.clipped.bedgraph
bedGraphToBigWig ${NAME}.clipped.bedgraph $CHROM_SIZES ${NAME}.bw
rm ${NAME}.bedgraph ${NAME}.sorted.bedgraph ${NAME}.clipped.bedgraph
echo "Created ${NAME}.bw (CPM normalized)"
Track Header for UCSC
echo 'track type=bedGraph name="Sample" description="CPM normalized" visibility=full color=0,0,255 altColor=255,0,0 autoScale=on graphType=bar' > track.bedgraph
cat sample.bedgraph >> track.bedgraph
Related Skills
- coverage-analysis - Generate coverage from alignments
- bigwig-tracks - Work with bigWig format
- chipseq-visualization - Visualize signal tracks
- alignment-files - BAM file processing
Weekly Installs
3
Repository
gptomics/bioskillsInstalled on
windsurf2
trae2
opencode2
codex2
claude-code2
antigravity2