bio-clip-seq-clip-peak-calling
SKILL.md
CLIP-seq Peak Calling
CLIPper (Recommended)
# Basic peak calling
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--save-pickle
# With FDR threshold
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--FDR 0.05 \
--superlocal
# Specify gene annotations
clipper \
-b deduped.bam \
-s hg38 \
--gene genes.bed \
-o peaks.bed
CLIPper Options
| Option | Description |
|---|---|
| -b | Input BAM file |
| -s | Species (hg38, mm10) |
| -o | Output BED file |
| --FDR | FDR threshold (default 0.05) |
| --superlocal | Use superlocal background |
| --gene | Custom gene annotation BED |
| --save-pickle | Save intermediate data |
PureCLIP (HMM-Based)
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
# Installation
conda install -c bioconda pureclip
# Basic peak calling
pureclip \
-i deduped.bam \
-bai deduped.bam.bai \
-g genome.fa \
-o crosslink_sites.bed \
-or binding_regions.bed \
-nt 4
# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions
PureCLIP Options
| Option | Description |
|---|---|
| -i | Input BAM file |
| -bai | BAM index file |
| -g | Reference genome FASTA |
| -o | Crosslink sites output |
| -or | Binding regions output |
| -nt | Number of threads |
| -iv | Interval file to restrict analysis |
| -dm | Min distance for merging |
PureCLIP with Input Control
# With SMInput control BAM
pureclip \
-i clip.bam \
-bai clip.bam.bai \
-g genome.fa \
-ibam sminput.bam \
-ibai sminput.bam.bai \
-o crosslinks.bed \
-or regions.bed \
-nt 8
# -ibam/-ibai: Input control BAM for background modeling
PureCLIP Output
# Crosslink sites BED contains:
# chr start end name score strand
# Score interpretation:
# Higher scores = more confident crosslink sites
# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
PureCLIP for Different CLIP Types
# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4 -dm 8
# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
Piranha
# Basic usage
Piranha -s deduped.bam -o peaks.bed
# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01
# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u
# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
PEAKachu (for PAR-CLIP)
# PAR-CLIP specific caller
peakachu adaptive \
-c control.bam \
-t treatment.bam \
-r reference.fa \
-o peakachu_peaks.gff
MACS3 for CLIP (Alternative)
# Use narrow peak calling mode
macs3 callpeak \
-t deduped.bam \
-f BAM \
-g hs \
-n clip_peaks \
--nomodel \
--extsize 50 \
-q 0.01
Strand-Specific Peak Calling
# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam
# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed
# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
Filter Peaks
# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed
# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed
# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
Merge Replicates
# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
sort -u > consensus_peaks.bed
# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
awk '$4 >= 2' | \
bedtools merge > consensus_peaks.bed
Peak Metrics
import pandas as pd
def load_clip_peaks(bed_path):
peaks = pd.read_csv(bed_path, sep='\t', header=None,
names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
return peaks
def peak_stats(peaks):
stats = {
'n_peaks': len(peaks),
'mean_width': (peaks['end'] - peaks['start']).mean(),
'median_score': peaks['score'].median(),
'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
}
return stats
peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))
Quality Metrics
| Metric | Good Value | Description |
|---|---|---|
| Peak count | 1,000-50,000 | Depends on RBP |
| Peak width | 20-100 nt | Typical for RBP footprint |
| FRiP | >0.1 | Fraction reads in peaks |
Calculate FRiP
# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)
# Total reads
total_reads=$(samtools view -c deduped.bam)
# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"
Related Skills
- clip-alignment - Generate aligned BAM
- clip-preprocessing - UMI deduplication
- binding-site-annotation - Annotate peaks with gene features
- clip-motif-analysis - Find enriched motifs in peaks
Weekly Installs
3
Repository
gptomics/bioskillsInstalled on
windsurf2
trae2
opencode2
codex2
claude-code2
antigravity2