bio-atac-seq-footprinting
SKILL.md
TF Footprinting
TOBIAS Workflow
# 1. Correct Tn5 bias
tobias ATACorrect \
--bam sample.bam \
--genome genome.fa \
--peaks peaks.bed \
--outdir corrected/ \
--cores 8
# 2. Calculate footprint scores
tobias FootprintScores \
--signal corrected/sample_corrected.bw \
--regions peaks.bed \
--output footprints.bw \
--cores 8
# 3. Bind TF motifs
tobias BINDetect \
--motifs JASPAR_motifs.pfm \
--signals footprints.bw \
--genome genome.fa \
--peaks peaks.bed \
--outdir bindetect_output/ \
--cores 8
TOBIAS Differential Footprinting
# Compare conditions
tobias BINDetect \
--motifs JASPAR_motifs.pfm \
--signals condition1.bw condition2.bw \
--genome genome.fa \
--peaks consensus_peaks.bed \
--outdir differential_footprints/ \
--cond_names condition1 condition2 \
--cores 8
# Output includes:
# - Differential binding scores
# - Per-TF statistics
# - Bound/unbound site predictions
Download JASPAR Motifs
# Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt
mv JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt JASPAR_motifs.pfm
Prepare Input Files
# Ensure BAM is sorted and indexed
samtools sort -@ 8 sample.bam -o sample.sorted.bam
samtools index sample.sorted.bam
# Filter peaks (remove blacklist, size filter)
bedtools intersect -v -a peaks.narrowPeak -b blacklist.bed | \
awk '$3-$2 >= 100 && $3-$2 <= 5000' > filtered_peaks.bed
HINT-ATAC Alternative
# RGT suite HINT-ATAC
rgt-hint footprinting \
--atac-seq \
--organism hg38 \
--output-prefix sample \
sample.bam peaks.bed
PIQ Footprinting
# PIQ (another footprinting tool)
library(PIQ)
# Load data
bam <- 'sample.bam'
pwms <- readMotifs('JASPAR_motifs.pfm')
# Run footprinting
piq_results <- piq(bam, pwms, genome='hg38')
Aggregate Footprint Plots
# TOBIAS PlotAggregate
tobias PlotAggregate \
--TFBS bindetect_output/*/beds/*_bound.bed \
--signals corrected/sample_corrected.bw \
--output aggregate_footprints.pdf \
--share_y \
--plot_boundaries
Python: Custom Footprint Analysis
import pyBigWig
import numpy as np
import pandas as pd
from pyfaidx import Fasta
def extract_footprint_signal(bigwig_file, bed_file, flank=100):
'''Extract signal around binding sites.'''
bw = pyBigWig.open(bigwig_file)
signals = []
for line in open(bed_file):
fields = line.strip().split('\t')
chrom, start, end = fields[0], int(fields[1]), int(fields[2])
center = (start + end) // 2
try:
vals = bw.values(chrom, center - flank, center + flank)
if vals:
signals.append(vals)
except:
continue
avg_signal = np.nanmean(signals, axis=0)
return avg_signal
def plot_footprint(signal, output_file):
'''Plot aggregate footprint.'''
import matplotlib.pyplot as plt
x = np.arange(-len(signal)//2, len(signal)//2)
plt.figure(figsize=(8, 4))
plt.plot(x, signal, 'b-', linewidth=2)
plt.axvline(0, color='red', linestyle='--', alpha=0.5)
plt.xlabel('Distance from motif center (bp)')
plt.ylabel('ATAC-seq signal')
plt.title('Aggregate Footprint')
plt.savefig(output_file, dpi=150)
plt.close()
Scan for Motifs
# Find motif occurrences in peaks
# Using FIMO (MEME suite)
fimo --oc fimo_output motifs.meme peaks.fa
# Or HOMER
findMotifsGenome.pl peaks.bed hg38 motif_analysis/ -find motif.motif
Interpret Footprint Depth
| Footprint Depth | Interpretation |
|---|---|
| Deep footprint | Strong TF binding |
| Shallow footprint | Weak/transient binding |
| No footprint | No binding or wrong motif |
| Shoulders only | Nucleosome positioning |
Quality Considerations
# Footprinting requires:
# - High read depth (>50M reads)
# - NFR-enriched signal (filter for <100bp fragments)
# - Good Tn5 bias correction
# Extract NFR reads
samtools view -h sample.bam | \
awk 'substr($0,1,1)=="@" || ($9>0 && $9<100) || ($9<0 && $9>-100)' | \
samtools view -b > nfr.bam
Differential TF Activity
def compare_footprints(tf_name, cond1_bw, cond2_bw, motif_bed):
'''Compare TF footprints between conditions.'''
sig1 = extract_footprint_signal(cond1_bw, motif_bed)
sig2 = extract_footprint_signal(cond2_bw, motif_bed)
# Calculate footprint depth
depth1 = np.nanmean(sig1[:30]) - np.nanmin(sig1[40:60])
depth2 = np.nanmean(sig2[:30]) - np.nanmin(sig2[40:60])
diff = depth2 - depth1
return {
'TF': tf_name,
'depth_cond1': depth1,
'depth_cond2': depth2,
'difference': diff
}
TOBIAS Output Files
| File | Description |
|---|---|
| *_corrected.bw | Bias-corrected signal |
| *_footprints.bw | Footprint scores |
| *_bound.bed | Predicted bound sites |
| *_unbound.bed | Predicted unbound sites |
| *_overview.txt | Per-TF statistics |
Related Skills
- atac-seq/atac-peak-calling - Generate peaks
- atac-seq/atac-qc - Verify data quality
- chip-seq/peak-annotation - Annotate binding sites
- sequence-manipulation/motif-search - Find motifs
Weekly Installs
3
Repository
gptomics/bioskillsInstalled on
windsurf2
trae2
opencode2
codex2
claude-code2
antigravity2