How To Analyze ChIP-seq Data For Absolute Beginners Part 2: Visualizing ChIP-seq Data

Introduction: Why Visualization Is Critical In ChIP-seq Analysis

After processing raw ChIP-seq data and identifying protein-DNA binding sites in Part 1, visualization becomes the crucial next step that transforms abstract numerical data into interpretable biological insights. While peak calling identifies where proteins bind to DNA, visualization helps answer questions about how and why these interactions occur within their genomic context.

Good visualization doesn’t just make your data look pretty—it reveals patterns, validates findings, generates hypotheses, and communicates results effectively. In this comprehensive guide, we’ll explore the essential tools and techniques for visualizing ChIP-seq data, from genome browsers to command-line solutions.

What You’ll Learn In This Tutorial

How to prepare ChIP-seq files for visualization
How to use genome browsers (IGV and UCSC) for interactive data exploration
How to create publication-quality heatmaps and profile plots with deepTools
Best practices for effective ChIP-seq data visualization
Troubleshooting common visualization issues

Beginner’s Tip: Visualization isn’t just the final step—it should be integrated throughout your analysis workflow to validate results and guide your next steps.

What Can Be Visualized in ChIP-seq Analysis?

ChIP-seq data visualization typically includes:

Read Coverage Tracks: Displays the density of sequencing reads across the genome, showing binding intensity
Peak Locations: Highlights statistically significant binding regions identified by peak callers
Signal Profiles: Shows binding patterns around specific genomic features (e.g., transcription start sites)
Heatmaps: Compares binding patterns across multiple samples or genomic regions
Aggregate Plots: Summarizes average binding profiles across many regions
Motif Enrichment: Visualizes DNA sequence motifs found within binding sites
Genomic Context: Integrates binding sites with gene annotations, chromatin states, and other genomic features

Why Visualization Matters in ChIP-seq Analysis

Visualization serves several critical functions in ChIP-seq analysis:

Quality Assessment: Reveals technical artifacts and experimental issues that numerical metrics might miss
Pattern Discovery: Identifies binding patterns that might not be apparent from statistical analysis alone
Hypothesis Generation: Suggests relationships between protein binding and gene regulation
Result Validation: Confirms that peaks align with visual evidence of enrichment
Data Integration: Connects protein binding with other genomic and epigenomic features
Communication: Presents complex genomic data in an accessible format for collaboration and publication

Preparing Files for Visualization

Before diving into visualization tools, you need to generate the appropriate file formats from your processed ChIP-seq data:

Essential File Formats for ChIP-seq Visualization

Peak files (.bed, .narrowPeak, .broadPeak):

Contain chromosome, start, end, and often additional statistics
Used for displaying discrete binding sites
Typically small files that can be easily shared

Signal files (.bigWig, .bedGraph):

Represent continuous data across the genome
Show binding strength at each position
BigWig format is compressed and indexed for efficient browser loading
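
To make the peak-file layout concrete, here is a minimal sketch (not part of the original pipeline) that writes a tiny BED file and runs a basic sanity check on it with awk. The function name check_bed and the coordinates are made up for illustration; real peak files from Part 1 will have more columns.

```shell
# Hypothetical helper: report lines that break basic BED rules
# (at least 3 tab-separated columns, integer coordinates, start < end).
check_bed() {
    awk -F'\t' '
        /^(#|track|browser)/ { next }            # skip header-style lines
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d: %s\n", NR, $0
            bad++
        }
        END { printf "%d problem line(s)\n", bad + 0 }
    ' "$1"
}

# Made-up example peaks; the second line has start >= end on purpose
tmp_bed=$(mktemp)
printf 'chr1\t100\t200\tpeak1\n'  > "$tmp_bed"
printf 'chr1\t500\t400\tpeak2\n' >> "$tmp_bed"
printf 'chr2\t50\t150\tpeak3\n'  >> "$tmp_bed"

check_bed "$tmp_bed"
```

Running a check like this before loading a file into a browser can catch malformed lines early, since browsers often fail on them without a clear message.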

Let’s prepare these files using the example data from Part 1:

#-----------------------------------------------
# STEP 1: Prepare visualization files
#-----------------------------------------------

# Activate the HOMER environment
conda activate ~/Env_Homer

#=============================================
# 1.1: Convert peak files to BED format
#=============================================
echo "Converting HOMER peaks to BED format..."

# First, get chromosome sizes for the reference genome
fetchChromSizes hg38 > ~/GSE104247/homer/hg38.chrom.sizes

# Convert HOMER peak file to BED format
pos2bed.pl ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.tsv > ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.bed

# Sort the BED file, then convert it to BigBed format
LC_ALL=C sort -k1,1 -k2,2n \
~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.bed \
-o ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks_sorted.bed

bedToBigBed ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks_sorted.bed ~/GSE104247/homer/hg38.chrom.sizes ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks_sorted.bb

#=============================================
# 1.2: Create signal tracks (bigWig files)
#=============================================
echo "Creating signal tracks for visualization..."

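# Method 1: Make bigWig files using HOMER (normalized against Input)
# Arguments, in order: tag directory for the ChIP sample, -o auto (auto-generate
# the output name), -i (Input control tag directory), -bigWig (chromosome sizes),
# -style chipseq (ChIP-seq style normalization).
# Comments sit above the command because text after a trailing "\" breaks the line continuation.
makeUCSCfile \
    ~/GSE104247/homer/SRR6117703_USF2 \
    -o auto \
    -i ~/GSE104247/homer/SRR6117732_USF2_Input \
    -bigWig ~/GSE104247/homer/hg38.chrom.sizes \
    -style chipseq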

# Method 2: Make bigWig files using deepTools

# Create bigWig for ChIP sample (USF2)
# --bam: input BAM; --outFileName: output bigWig
# --binSize 10: resolution (10bp bins); --normalizeUsing RPKM: normalize for sequencing depth
# --effectiveGenomeSize: effective size of hg38 (needed for RPGC normalization)
# --numberOfProcessors 8: use 8 CPU cores
bamCoverage \
    --bam ~/GSE104247/bam/SRR6117703_USF2_sorted_dedup_filtered.bam \
    --outFileName ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.deeptools.bw \
    --binSize 10 \
    --normalizeUsing RPKM \
    --effectiveGenomeSize 2913022398 \
    --numberOfProcessors 8

# Create bigWig for Input control
bamCoverage \
    --bam ~/GSE104247/bam/SRR6117732_USF2_Input_sorted_dedup_filtered.bam \
    --outFileName ~/GSE104247/homer/SRR6117703_USF2/SRR6117732_USF2_Input.deeptools.bw \
    --binSize 10 \
    --normalizeUsing RPKM \
    --effectiveGenomeSize 2913022398 \
    --numberOfProcessors 8

echo "Visualization files prepared successfully!"

Technical Note: We provide two methods for creating signal tracks. HOMER’s approach automatically performs ChIP vs. Input normalization, while deepTools offers more customization options. Both approaches have merits depending on your specific visualization goals.
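
To see what depth normalization does to a single bin, the sketch below works through the RPKM arithmetic as the deepTools documentation describes it: reads in the bin divided by (mapped reads in millions × bin length in kb). The helper name rpkm_bin and the read counts are illustrative assumptions, not values from the example dataset.

```shell
# Hypothetical helper: RPKM for a single bin.
# Usage: rpkm_bin <reads_in_bin> <bin_length_bp> <total_mapped_reads>
rpkm_bin() {
    awk -v r="$1" -v len="$2" -v total="$3" \
        'BEGIN { printf "%.2f\n", r / ((total / 1000000) * (len / 1000)) }'
}

# The same 5 reads in a 10bp bin, from a 20M-read and a 40M-read library
rpkm_bin 5 10 20000000
rpkm_bin 5 10 40000000
```

The second value is half the first: the same raw count represents weaker relative signal in a deeper library, which is why normalized tracks are easier to compare across samples.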

Key Tools for ChIP-seq Visualization

Three major tools dominate the ChIP-seq visualization landscape, each with distinct strengths:

1. Integrative Genomics Viewer (IGV)

IGV is a high-performance desktop application developed by the Broad Institute that excels at interactive exploration of genomic data.

Key features:

Responsive interface with smooth navigation
Local installation with no data upload requirements
Support for numerous file formats (BAM, BED, BigWig, etc.)
Customizable track display settings
Built-in analysis tools for basic operations

Best for: Detailed examination of binding patterns at specific loci, quick visualization of local datasets, and preliminary data exploration.

2. UCSC Genome Browser

The UCSC Genome Browser is a powerful web-based platform that emphasizes integration with public genomic datasets and annotations.

Key features:

Extensive collection of pre-loaded annotation tracks
Web-based access from any computer
Advanced display customization options
Session saving and sharing capabilities
Integration with hundreds of public datasets

Best for: Contextualizing ChIP-seq data within the broader genomic landscape, sharing results with collaborators, and accessing public datasets.

3. deepTools

deepTools is a suite of Python tools specifically designed for visualizing and analyzing deep-sequencing data at scale.

Key features:

Efficient handling of large datasets
Comprehensive normalization options
Flexible heatmap and profile plot generation
Built-in statistical analysis capabilities
Scriptable for reproducible workflows

Best for: Generating publication-quality heatmaps, creating aggregate plots across many regions, and performing sophisticated comparative analyses.

Visualizing ChIP-seq Data with Genome Browsers

Genome browsers display a reference genome sequence as a horizontal coordinate system, with different types of data aligned to these coordinates as “tracks.” Think of them as Google Maps for genomes – they let you navigate, zoom in and out of genomic regions, and overlay different types of genomic information.

Visualizing ChIP-seq Data in IGV

The Integrative Genomics Viewer (IGV) is a desktop application that provides a responsive, interactive environment for exploring genomic data.

Step 1: Install and Launch IGV

Download IGV from https://igv.org/doc/desktop/#DownloadPage/
Launch the application
Select the reference genome (hg38) from the dropdown menu at the top

Step 2: Load Your Data Files

Click on “File” → “Load from File…”
Navigate to your project directory and select the following files:

~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.bed (peak file)
~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.ucsc.bigWig (signal track from HOMER)
~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.deeptools.bw (signal track from deepTools)
~/GSE104247/homer/SRR6117703_USF2/SRR6117732_USF2_Input.deeptools.bw (control track)

Step 3: Navigate to Regions of Interest

Enter gene symbols or genomic coordinates in the search box (e.g., “CCND1” or “chr8:128,747,680-128,753,674”)
Use the “+” and “-” buttons to zoom in and out
Right-click on tracks to access display options (color, track height, etc.)

Step 4: Interpret the Visualization

In IGV, you should see:

The BigWig tracks displayed as continuous signal plots
The BED track showing peaks as colored bars at the bottom
Gene annotations displayed at the top (if you loaded them)

Pay attention to:

Signal enrichment in your ChIP sample relative to the Input control
Correlation between called peaks and visible signal enrichment
Proximity of binding sites to genes or other genomic features

Beginner’s Tip: Right-click on a track and select “Autoscale” to automatically adjust the y-axis to fit the data in the current view. This helps visualize signal differences.
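
If you find yourself repeating these clicks for many loci, IGV can also run a plain-text batch script. The sketch below only generates such a script; the helper name make_igv_batch and the snapshot directory are assumptions for illustration. The batch commands it emits (new, genome, load, snapshotDirectory, goto, snapshot, exit) come from IGV's batch command set, but check the IGV documentation for the behavior of your installed version.

```shell
# Hypothetical helper: print an IGV batch script for one locus.
# Usage: make_igv_batch <snapshot_dir> <locus> <file> [<file> ...]
make_igv_batch() {
    local snapdir=$1
    local locus=$2
    shift 2
    echo "new"
    echo "genome hg38"
    local f
    for f in "$@"; do
        echo "load $f"                     # one load command per track file
    done
    echo "snapshotDirectory $snapdir"
    echo "goto $locus"
    echo "snapshot ${locus}.png"
    echo "exit"
}

# Example: print a script that loads the peak file and the deepTools track
make_igv_batch "$HOME/GSE104247/igv_snapshots" CCND1 \
    "$HOME/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.bed" \
    "$HOME/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.deeptools.bw"
```

Redirect the output to a file and open it from IGV's batch-script option in the Tools menu, or pass it to the IGV launcher's batch flag if your installation provides one.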

Visualizing ChIP-seq Data in UCSC Genome Browser

The UCSC Genome Browser provides a web-based platform with extensive annotation resources and sharing capabilities.

Step 1: Host Your Files Online

The UCSC Genome Browser reads binary indexed files such as BigWig and BigBed from a web-accessible URL rather than through a direct upload, so these files must be hosted online. One solution is to use Cyverse:

Register for a free account at Cyverse
Upload your BigWig and BED files to your Cyverse storage
Make the files publicly accessible and copy their URLs

Technical Note: File links from popular cloud storage services such as Google Drive, OneDrive, Dropbox, or Box often do not work with the UCSC Genome Browser, which needs a direct URL to the file itself. Cyverse provides 5GB of free storage that works with UCSC.

Step 2: Add Your Tracks to UCSC

Go to UCSC Genome Browser
Select “Human GRCh38/hg38” assembly
Click on “My Data” → “Custom Tracks”
Enter track definitions in the text box:

# Add Sample track
track type=bigWig name="USF2 ChIP-seq" bigDataUrl=YOUR_PUBLIC_LINK_ON_CYVERSE/SRR6117703_USF2.deeptools.bw visibility=full color=0,0,255 viewLimits=0:85 autoScale=off maxHeightPixels=100:50:11

# Add Control track
track type=bigWig name="USF2 Input" bigDataUrl=YOUR_PUBLIC_LINK_ON_CYVERSE/SRR6117732_USF2_Input.deeptools.bw visibility=full color=128,128,128 viewLimits=0:85 autoScale=off maxHeightPixels=100:50:11

# Add Peaks
track type=bigBed name="USF2 Peaks" description="USF2 Binding Sites" bigDataUrl=YOUR_PUBLIC_LINK_ON_CYVERSE/SRR6117703_USF2_peaks_sorted.bb visibility=pack color=255,0,0

Click “Submit” to load your tracks
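
Typing long track lines by hand invites typos in file names. As a small convenience, the sketch below prints bigWig track lines from a base URL, reusing the display settings shown above. The function name bigwig_track is hypothetical, and BASE_URL is the same placeholder used in the track definitions; replace it with your own public link.

```shell
# Hypothetical helper: print one bigWig custom-track line.
# Usage: bigwig_track <track_name> <file_url> <r,g,b>
bigwig_track() {
    printf 'track type=bigWig name="%s" bigDataUrl=%s visibility=full color=%s viewLimits=0:85 autoScale=off maxHeightPixels=100:50:11\n' \
        "$1" "$2" "$3"
}

# Placeholder: replace with the public URL of your hosted files
BASE_URL="YOUR_PUBLIC_LINK_ON_CYVERSE"

bigwig_track "USF2 ChIP-seq" "$BASE_URL/SRR6117703_USF2.deeptools.bw" "0,0,255"
bigwig_track "USF2 Input" "$BASE_URL/SRR6117732_USF2_Input.deeptools.bw" "128,128,128"
```

Paste the printed lines into the custom track text box. Each track definition must stay on a single line.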

Step 3: Navigate and Customize

Search for genes or coordinates of interest
Use the navigation controls to adjust the view
Configure track display settings by clicking on the track names
Add public annotation tracks by clicking “add tracks”

Step 4: Save and Share Sessions

Create an account on UCSC Genome Browser
Click “My Data” → “My Sessions” → “Save Settings”
You can share the session URL with collaborators to show exactly the same view

Beginner’s Tip: UCSC Genome Browser has thousands of annotation tracks available. Add tracks like “ENCODE Regulation” or “Conservation” to see how your binding sites correlate with known functional elements.

Advanced Visualization with deepTools

While genome browsers excel at inspecting individual loci, deepTools enables systematic visualization across thousands of genomic regions simultaneously, revealing global patterns.

Creating Heatmaps with deepTools

Heatmaps show ChIP-seq signal distribution across multiple genomic regions, with color intensity representing binding strength.

#-----------------------------------------------
# STEP 2: Create ChIP-seq heatmaps with deepTools
#-----------------------------------------------

#=============================================
# 2.1: Prepare matrix for heatmap
#=============================================
echo "Preparing matrix for heatmap visualization..."

# computeMatrix calculates scores (signal values) from bigWig files
# for specified genomic regions, creating a matrix for visualization
# --referencePoint center: center on binding sites; -S: signal file; -R: peaks
# -o: output matrix; -b/-a 1500: 1500bp before/after the reference point
# --binSize 10: 10bp bins for resolution; --numberOfProcessors 8: use 8 CPU cores
computeMatrix reference-point \
    --referencePoint center \
    -S ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.deeptools.bw \
    -R ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.bed \
    -o ~/GSE104247/homer/SRR6117703_USF2/USF2_matrix_heatmap.gz \
    -b 1500 \
    -a 1500 \
    --binSize 10 \
    --numberOfProcessors 8

#=============================================
# 2.2: Generate heatmap visualization
#=============================================
echo "Generating heatmap visualization..."

# plotHeatmap creates a heatmap from the matrix data
# -m: input matrix; -out: output file; --sortRegions descend: sort regions by signal strength
# --colorMap RdYlBu: color scheme; --whatToShow: show heatmap and color scale
# --dpi 300: high resolution; the last three options label the regions, sample, and plot
plotHeatmap \
    -m ~/GSE104247/homer/SRR6117703_USF2/USF2_matrix_heatmap.gz \
    -out ~/GSE104247/homer/SRR6117703_USF2/USF2_heatmap.png \
    --sortRegions descend \
    --colorMap RdYlBu \
    --whatToShow 'heatmap and colorbar' \
    --dpi 300 \
    --regionsLabel "USF2 Binding Sites" \
    --samplesLabel "USF2" \
    --plotTitle "USF2 Binding Pattern"

echo "Heatmap created successfully!"

The resulting heatmap shows:

Each row represents a binding site (peak)
The x-axis shows distance from the peak center
Color intensity indicates binding strength
Sorting by signal reveals distinct binding patterns
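
It can help to see where the heatmap's dimensions come from. The sketch below is a simplified illustration of the idea, not deepTools' actual implementation: it takes each peak's midpoint, extends it by the -b and -a distances, and reports how many bins (heatmap columns) the window contains. The helper name center_window and the peak coordinates are made up.

```shell
# Hypothetical helper: for each peak, print the window around its midpoint
# and the number of bins that window is split into.
# Usage: center_window <bed_file> <bp_before> <bp_after> <bin_size>
center_window() {
    awk -v b="$2" -v a="$3" -v bin="$4" 'BEGIN { OFS = "\t" }
    {
        center = int(($2 + $3) / 2)      # midpoint of the peak
        nbins  = (b + a) / bin           # columns in this row of the matrix
        print $1, center - b, center + a, nbins
    }' "$1"
}

# Made-up peak: chr1:10000-10200
peaks=$(mktemp)
printf 'chr1\t10000\t10200\tpeak1\n' > "$peaks"

center_window "$peaks" 1500 1500 10
```

With 1500bp on each side and 10bp bins, every peak contributes one row of 300 values, which is why the heatmap is 300 columns wide regardless of the original peak widths.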

Creating Profile Plots with deepTools

Profile plots show the average binding signal across all regions, providing a quantitative summary of binding patterns.

#-----------------------------------------------
# STEP 3: Create ChIP-seq profile plots with deepTools
#-----------------------------------------------

#=============================================
# 3.1: Prepare matrix for profile plot
#=============================================
echo "Preparing matrix for profile plot visualization..."

# Create a matrix with both ChIP and Input samples for comparison
# --referencePoint center: center on peak midpoints
# -S: ChIP signal, then Input signal; -R: regions; -o: output file
# -b/-a 1500: 1500bp before/after the reference point
# --binSize 10: 10bp bin size; --numberOfProcessors 8: use 8 CPU cores
computeMatrix reference-point \
    --referencePoint center \
    -S ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.deeptools.bw \
       ~/GSE104247/homer/SRR6117703_USF2/SRR6117732_USF2_Input.deeptools.bw \
    -R ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2_peaks.bed \
    -o ~/GSE104247/homer/SRR6117703_USF2/USF2_matrix_profile.gz \
    -b 1500 \
    -a 1500 \
    --binSize 10 \
    --numberOfProcessors 8

#=============================================
# 3.2: Generate profile plot visualization
#=============================================
echo "Generating profile plot visualization..."

# plotProfile creates line plots showing average signal distribution
# -m: input matrix; -out: output file
# --perGroup: one panel per region group, with both samples overlaid for comparison
# --colors blue green: blue for ChIP, green for Input; --dpi 300: high resolution
# --plotType lines: plot style; --yAxisLabel: y-axis label
# --refPointLabel: label for the reference point on the x-axis
# --legendLocation upper-right: legend position (deepTools expects the hyphenated form)
plotProfile \
    -m ~/GSE104247/homer/SRR6117703_USF2/USF2_matrix_profile.gz \
    -out ~/GSE104247/homer/SRR6117703_USF2/USF2_profile.png \
    --perGroup \
    --colors blue green \
    --plotTitle "USF2 Binding Profile" \
    --samplesLabel "USF2" "Input" \
    --regionsLabel "Binding Sites" \
    --dpi 300 \
    --plotType lines \
    --yAxisLabel "Average Signal" \
    --refPointLabel "Peak center" \
    --legendLocation upper-right

echo "Profile plot created successfully!"

The resulting profile plot shows:

The x-axis represents distance from the peak center
The y-axis shows average signal intensity
The blue line shows USF2 ChIP-seq signal
The green line shows Input control signal
The sharp peak in ChIP vs. flat Input confirms specific binding
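
Conceptually, a profile plot is the column-wise average of the same matrix a heatmap displays. The sketch below illustrates that averaging on a tiny made-up matrix (3 regions × 5 bins); the helper name column_means is hypothetical, and this is only an illustration of the idea, not how plotProfile is implemented.

```shell
# Hypothetical helper: average each column (bin) over all rows (regions).
# Usage: column_means <matrix_file>   (whitespace-separated numbers)
column_means() {
    awk '{
            for (i = 1; i <= NF; i++) sum[i] += $i
            if (NF > ncol) ncol = NF
         }
         END {
            for (i = 1; i <= ncol; i++)
                printf "%s%.2f", (i > 1 ? "\t" : ""), sum[i] / NR
            print ""
         }' "$1"
}

# Made-up signal matrix: each row is one peak, the middle column is the peak center
matrix=$(mktemp)
printf '1 2 9 2 1\n'  > "$matrix"
printf '1 4 12 4 1\n' >> "$matrix"
printf '1 3 9 3 1\n'  >> "$matrix"

column_means "$matrix"
```

The averaged row rises toward the middle column, which is the shape a ChIP profile takes when signal is concentrated at peak centers. An Input matrix with similar values in every column would average to a flat line.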

Creating Enrichment Comparison Heatmaps

To compare binding patterns across different genomic features, we can create feature-specific heatmaps:

#-----------------------------------------------
# STEP 4: Create ChIP-seq feature-specific heatmaps
#-----------------------------------------------

#=============================================
# 4.1: Download gene annotations
#=============================================
echo "Downloading gene annotations..."

# Create directory for annotations
mkdir -p ~/GSE104247/annotations

# Download gene TSS annotations from UCSC
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz -O ~/GSE104247/annotations/refGene.txt.gz

# Extract TSS positions (2kb window around TSS; window start clamped at 0
# so genes near the start of a sequence do not produce negative coordinates)
zcat ~/GSE104247/annotations/refGene.txt.gz | \
    awk 'BEGIN{OFS="\t"} {
         tss = ($4=="+") ? $5 : $6
         start = (tss > 1000) ? tss-1000 : 0
         print $3, start, tss+1000, $13, ".", $4}' | \
    sort -k1,1 -k2,2n | uniq > ~/GSE104247/annotations/TSS_2kb.bed

#=============================================
# 4.2: Create matrix for TSS heatmap
#=============================================
echo "Creating TSS enrichment matrix..."

# --referencePoint center: center on TSS; -S: signal file; -R: TSS regions
# -o: output matrix; -b/-a 2000: 2000bp before/after TSS
# --binSize 10: 10bp bins; --numberOfProcessors 8: 8 CPU cores
# --skipZeros: skip regions with no signal
computeMatrix reference-point \
    --referencePoint center \
    -S ~/GSE104247/homer/SRR6117703_USF2/SRR6117703_USF2.deeptools.bw \
    -R ~/GSE104247/annotations/TSS_2kb.bed \
    -o ~/GSE104247/homer/SRR6117703_USF2/USF2_TSS_matrix.gz \
    -b 2000 \
    -a 2000 \
    --binSize 10 \
    --numberOfProcessors 8 \
    --skipZeros

#=============================================
# 4.3: Create TSS heatmap visualization
#=============================================
echo "Creating TSS enrichment heatmap..."

# -m: input matrix; -out: output file; --colorMap Blues: blue color scheme
# --whatToShow: show heatmap and colorbar; --sortRegions descend: sort by decreasing signal
# --zMin 0: minimum value for color scale; --dpi 300: high resolution
# The remaining options set the title, regions label, and sample label
plotHeatmap \
    -m ~/GSE104247/homer/SRR6117703_USF2/USF2_TSS_matrix.gz \
    -out ~/GSE104247/homer/SRR6117703_USF2/USF2_TSS_heatmap.png \
    --colorMap Blues \
    --whatToShow 'heatmap and colorbar' \
    --sortRegions descend \
    --plotTitle "USF2 Binding at Transcription Start Sites" \
    --regionsLabel "TSS" \
    --samplesLabel "USF2" \
    --zMin 0 \
    --dpi 300

echo "TSS enrichment analysis complete!"

Beginner’s Tip: Try creating similar heatmaps for other genomic features like enhancers, CTCF sites, or CpG islands to understand the binding preference of your protein of interest.
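
The TSS extraction step depends on strand: for a gene on the + strand the TSS is the transcript start, while on the - strand it is the transcript end. The sketch below replays that logic on two made-up refGene-style rows so you can check the result by eye. The helper name tss_windows and the gene records are hypothetical, and clamping the window start at 0 is an added safeguard for genes close to the start of a sequence.

```shell
# Hypothetical helper: strand-aware TSS windows from a refGene-style table.
# Columns used: 3 = chrom, 4 = strand, 5 = txStart, 6 = txEnd, 13 = gene name.
# Usage: tss_windows <table_file> <flank_bp>
tss_windows() {
    awk -v flank="$2" 'BEGIN { FS = OFS = "\t" }
    {
        tss = ($4 == "+") ? $5 : $6            # minus-strand genes start at txEnd
        start = tss - flank
        if (start < 0) start = 0               # keep coordinates non-negative
        print $3, start, tss + flank, $13, ".", $4
    }' "$1" | sort -k1,1 -k2,2n | uniq
}

# Two made-up records: one plus-strand gene, one minus-strand gene near position 0
genes=$(mktemp)
printf '0\tNM_FAKE1\tchr1\t+\t5000\t9000\t5100\t8900\t2\t5000,8000,\t6000,9000,\t0\tGENE_A\n' > "$genes"
printf '0\tNM_FAKE2\tchr1\t-\t200\t700\t250\t650\t1\t200,\t700,\t0\tGENE_B\n' >> "$genes"

tss_windows "$genes" 1000
```

GENE_A gets a window centered on position 5000 (its transcript start), and GENE_B gets one centered on 700 (its transcript end) with the start clamped at 0. The same pattern applies to other features: produce a BED file of windows, then pass it to computeMatrix with -R.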

Best Practices for Effective ChIP-seq Visualization

Choosing the Right Visualization Method

For exploring specific loci: Use genome browsers (IGV or UCSC)
For global binding patterns: Use deepTools heatmaps and profiles
For comparative analysis: Use multiple tracks in browsers or multi-sample heatmaps
For publication figures: Combine browser screenshots with deepTools plots

Color Selection and Scaling

Use intuitive colors: Red/blue for positive/negative values, grayscale for controls
Maintain consistent colors: Use the same color scheme across related figures
Consider color blindness: Avoid red-green combinations
Choose appropriate scaling:

Linear scale for most signals
Log scale for datasets with extreme value ranges
Set consistent scales when comparing samples

Incorporating Genomic Context

Include gene annotations in genome browser views
Highlight functional elements (promoters, enhancers, etc.)
Add complementary data tracks (e.g., histone modifications, DNase sensitivity)
Include genome coordinates for reference

Figure Organization for Publications

Combine overview and detail: Show genome-wide patterns and specific examples
Create multi-panel figures: Browser views + heatmaps + profiles
Include controls: Always show input controls for comparison
Provide clear legends: Explain color scales, track identities, etc.

Troubleshooting Common Visualization Issues

Missing or Incomplete Signal Tracks

Problem: BigWig files appear empty or have gaps.

Solutions:

Check chromosome naming consistency (e.g., “chr1” vs “1”)
Verify that BAM files are properly indexed
Ensure proper normalization parameters
Check for regions excluded during filtering (e.g., blacklisted regions)
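
Chromosome naming mismatches are the easiest of these to test for. The following sketch lists chromosome names that appear in a BED-like file but not in a chrom.sizes file; the helper name missing_chroms and the sample records are made up for illustration.

```shell
# Hypothetical helper: print chromosome names used in a BED-like file
# that are absent from a chrom.sizes file.
# Usage: missing_chroms <bed_file> <chrom_sizes_file>
missing_chroms() {
    awk 'NR == FNR { known[$1] = 1; next }     # first file: chrom.sizes
         !/^(#|track|browser)/ && !($1 in known) && !seen[$1]++ { print $1 }' "$2" "$1"
}

# Made-up example: the sizes file uses "chr1", the peaks file mixes "1" and "chr2"
sizes=$(mktemp)
printf 'chr1\t248956422\n'  > "$sizes"
printf 'chr2\t242193529\n' >> "$sizes"

peaks_mixed=$(mktemp)
printf '1\t100\t200\n'     > "$peaks_mixed"
printf 'chr2\t50\t150\n'  >> "$peaks_mixed"
printf '1\t300\t400\n'    >> "$peaks_mixed"

missing_chroms "$peaks_mixed" "$sizes"
```

Any name printed here will not line up with the reference in a browser. An empty result means the two files agree on naming.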

Poor Signal-to-Noise Ratio

Problem: Difficult to distinguish true signal from background.

Solutions:

Adjust track display settings (y-axis scale)
Try different normalization methods
Compare with Input control
Focus on high-confidence peaks

Browser Performance Issues

Problem: Slow loading or browser crashes.

Solutions:

Use indexed and compressed formats (BigWig instead of bedGraph)
Reduce the number of tracks displayed simultaneously
Limit the genome region being viewed
Use a local browser (IGV) for very large datasets

Inconsistent Scaling Between Samples

Problem: Direct visual comparison is difficult due to different scales.

Solutions:

Use “Group Autoscale” in IGV to apply consistent scales for multiple samples
Use deepTools with fixed min/max values (--zMin and --zMax)
Create normalized ratio tracks (ChIP/Input)
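
To illustrate what a ratio track contains, the sketch below computes a log2(ChIP/Input) value per bin from a small made-up table. A pseudocount is added to both signals so that bins with zero Input do not cause a division by zero. The helper name log2_ratio and the numbers are assumptions for illustration. For real bigWig files, deepTools provides the bigwigCompare tool for this purpose, so check its documentation for the exact options.

```shell
# Hypothetical helper: per-bin log2 ratio of ChIP over Input.
# Input columns: chrom, start, end, ChIP signal, Input signal.
# Usage: log2_ratio <table_file> <pseudocount>
log2_ratio() {
    awk -v pc="$2" 'BEGIN { OFS = "\t" }
        { print $1, $2, $3, sprintf("%.2f", log(($4 + pc) / ($5 + pc)) / log(2)) }' "$1"
}

# Made-up bins: enriched, unchanged, and depleted relative to Input
bins=$(mktemp)
printf 'chr1\t0\t10\t7\t1\n'   > "$bins"
printf 'chr1\t10\t20\t1\t1\n' >> "$bins"
printf 'chr1\t20\t30\t0\t3\n' >> "$bins"

log2_ratio "$bins" 1
```

Positive values mark bins where ChIP exceeds Input, values near zero mark background, and negative values mark bins where Input is higher. Because the ratio is already relative to the control, such tracks can be displayed on a shared scale across samples.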

Conclusion: From Visualization to Biological Insight

Effective visualization is the bridge between computational analysis and biological understanding in ChIP-seq experiments. By combining the strengths of genome browsers for detailed inspection with the pattern discovery capabilities of deepTools, you can extract meaningful insights from your data.

Remember that visualization is not just the final step in your analysis—it should be integrated throughout your workflow to validate results, guide your analysis decisions, and generate new hypotheses.

In the next tutorial of this series, we’ll explore how to perform differential binding analysis to compare ChIP-seq experiments across different conditions, further expanding your ChIP-seq analysis toolkit.

This tutorial is part of the NGS101.com beginner’s guide to next-generation sequencing analysis. If you have questions or suggestions, please leave a comment below.
