How To Analyze Whole Genome Sequencing Data For Absolute Beginners Part 4: Visualizing and Interpreting Somatic Mutations

Introduction: From Multiple VCF Files to Biological Insights

This tutorial builds upon our previous whole genome sequencing analysis pipeline, specifically the mutation calling results from Part 2A: Matched Tumor-Normal Mutation Calling with Mutect2. You should now have multiple high-confidence VCF files from different tumor-normal pairs that need to be converted to MAF (Mutation Annotation Format) and visualized for biological interpretation.

Understanding Key Concepts in Cancer Mutation Analysis

Before diving into the technical implementation, it’s essential to understand the core analytical concepts we’ll be working with:

Oncoplots (Mutation Landscapes): These are matrix-style heatmaps that display mutation status across multiple samples and genes simultaneously. Each row represents a gene, each column represents a sample, and colored cells indicate the presence and type of mutations. Oncoplots are the gold standard for visualizing mutation patterns in cancer genomics because they allow researchers to quickly identify frequently mutated genes, sample-specific mutation profiles, and potential mutation co-occurrence or mutual exclusivity patterns. They’re particularly valuable for identifying driver genes and understanding tumor heterogeneity across patient cohorts.
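
To make the matrix idea concrete, here is a minimal base R sketch that builds a gene-by-sample matrix by hand. The `toy_maf` table and its values are made up for illustration; maftools builds a comparable matrix internally and draws it for you, so treat this as an explanation of the layout rather than the package's implementation.

```r
# Toy MAF-like table: one row per mutation (hypothetical values)
toy_maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "KRAS", "EGFR"),
  Tumor_Sample_Barcode   = c("S1", "S2", "S1", "S3"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "In_Frame_Del"),
  stringsAsFactors = FALSE
)

genes   <- unique(toy_maf$Hugo_Symbol)
samples <- unique(toy_maf$Tumor_Sample_Barcode)

# Empty gene x sample matrix; an empty string means "no mutation"
onco_matrix <- matrix("", nrow = length(genes), ncol = length(samples),
                      dimnames = list(genes, samples))

# Fill each cell with the mutation type seen for that gene in that sample
# (a second hit in the same cell would simply overwrite the first here)
for (i in seq_len(nrow(toy_maf))) {
  onco_matrix[toy_maf$Hugo_Symbol[i], toy_maf$Tumor_Sample_Barcode[i]] <-
    toy_maf$Variant_Classification[i]
}

# Put the most frequently mutated genes at the top, as an oncoplot does
n_mutated   <- rowSums(onco_matrix != "")
onco_matrix <- onco_matrix[order(n_mutated, decreasing = TRUE), , drop = FALSE]
print(onco_matrix)
```

Each non-empty cell corresponds to one colored tile in the final plot.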

Mutation Burden Analysis: This refers to the quantitative assessment of the total number of mutations per sample, often normalized by genome size (mutations per megabase). Mutation burden is a critical biomarker in cancer research because it can predict treatment response, particularly to immunotherapy. Tumors with high mutation burden often have more neoantigens that can be recognized by the immune system, making them more susceptible to checkpoint inhibitor therapies. Additionally, mutation burden can indicate underlying DNA repair defects or exposure to mutagens.
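
The normalization itself is a single division. The sketch below assumes hypothetical mutation counts and a hypothetical helper named `mutations_per_mb`; the denominator should be the size of the region you actually sequenced and could call variants in (roughly 30 Mb is a common exome approximation, while whole-genome data needs a value in the thousands of megabases).

```r
# Hypothetical helper: mutations per megabase of callable sequence
mutations_per_mb <- function(n_mutations, callable_mb) {
  stopifnot(all(callable_mb > 0))
  n_mutations / callable_mb
}

# Made-up mutation counts for three samples
mutation_counts <- c(sampleA = 150, sampleB = 4500, sampleC = 30)

# Assumed target size in megabases (exome-like); change for your own data
callable_mb <- 30

tmb <- mutations_per_mb(mutation_counts, callable_mb)
print(round(tmb, 2))
```

The same count gives a very different burden depending on the denominator, so always report which target size you assumed.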

Transition/Transversion (Ti/Tv) Analysis: This quality control assessment categorizes single nucleotide substitutions as transitions (changes between chemically similar bases: A↔G or C↔T) or transversions (changes between a purine and a pyrimidine, in either direction). The Ti/Tv ratio serves as both a data quality indicator and a biological signature. Germline call sets typically show ratios of about 2.0-2.1 genome-wide and closer to 3.0 in exomes, whereas somatic ratios are more variable because they depend on which mutational processes were active in the tumor. Strong deviations (for example, ratios <1.5 or >4.0) can point either to technical artifacts or to specific mutational processes such as APOBEC enzyme activity, UV exposure, or alkylating agent damage. In cancer genomics, Ti/Tv analysis helps validate mutation calls, identify underlying carcinogenic exposures, and characterize the mutational landscape that shaped each tumor, which makes it both a quality control checkpoint and a window into the biology of mutagenesis.
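
The classification rule is simple enough to write out. This base R sketch uses a hypothetical helper, `classify_substitution`, and made-up substitutions; it assumes every input is a single-base change among A, C, G, and T with different reference and alternate alleles.

```r
# Hypothetical helper: label each substitution as transition (Ti) or transversion (Tv)
classify_substitution <- function(ref, alt) {
  purines <- c("A", "G")
  ref_is_purine <- ref %in% purines
  alt_is_purine <- alt %in% purines
  # Purine->purine or pyrimidine->pyrimidine is a transition; anything else is a transversion
  ifelse(ref_is_purine == alt_is_purine, "Ti", "Tv")
}

# Made-up substitutions: C>T, C>A, A>G, G>T, T>C, C>G
ref <- c("C", "C", "A", "G", "T", "C")
alt <- c("T", "A", "G", "T", "C", "G")

classes    <- classify_substitution(ref, alt)
titv_ratio <- sum(classes == "Ti") / sum(classes == "Tv")

print(table(classes))
print(titv_ratio)
```

With real data the ratio is usually computed per sample, which is what maftools does later in this tutorial.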

Mutation Signatures: These are characteristic patterns of mutations that reflect the underlying mutational processes that have been active in cancer cells. Each signature represents a specific combination of mutation types in their trinucleotide context (the mutated base plus its immediate neighbors). For example, UV exposure creates a distinctive pattern of C>T mutations at dipyrimidine sites, while defective mismatch repair creates signatures characterized by small insertions and deletions at microsatellites. Identifying mutation signatures can reveal the etiology of cancer, predict treatment responses, and guide therapeutic strategies.
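
The trinucleotide context can be illustrated by enumerating the 96 possible channels. By convention, substitutions are reported from the pyrimidine (C or T) of the mutated base pair, which gives six substitution classes, each with four possible 5' neighbors and four possible 3' neighbors. The `observed` mutations below are made up for illustration.

```r
bases         <- c("A", "C", "G", "T")
substitutions <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# All combinations of 5' base, substitution class, and 3' base: 4 x 6 x 4 = 96
grid <- expand.grid(three_prime = bases, five_prime = bases,
                    substitution = substitutions, stringsAsFactors = FALSE)
contexts <- paste0(grid$five_prime, "[", grid$substitution, "]", grid$three_prime)
print(length(contexts))

# Made-up observed mutations, already written in trinucleotide notation
observed <- c("T[C>T]C", "T[C>T]C", "C[C>T]A")

# A sample's mutational profile is the count of mutations in each of the 96 channels
profile <- tabulate(match(observed, contexts), nbins = length(contexts))
names(profile) <- contexts
print(profile[profile > 0])
```

Signature analysis then asks which combination of known signatures best explains a profile like this one.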

Mutation Hotspots: These are specific genomic positions or protein domains that are recurrently mutated across multiple cancer samples. Hotspots often indicate functionally important sites where mutations provide a growth advantage to cancer cells. Identifying hotspots helps distinguish between driver mutations (those that contribute to cancer development) and passenger mutations (those that occur randomly). Therapeutic targeting of mutation hotspots has led to successful precision medicine approaches, such as targeting EGFR mutations in lung cancer or BRAF mutations in melanoma.
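
At its simplest, hotspot detection is a recurrence count. The sketch below counts how many distinct samples carry the same protein change, using a made-up table and an arbitrary cutoff (`min_samples`, an assumption chosen only for illustration). Dedicated tools add statistical testing on top of this idea.

```r
# Made-up mutation calls (one row per mutation per sample)
toy_calls <- data.frame(
  Hugo_Symbol          = c("BRAF", "BRAF", "BRAF", "EGFR", "EGFR", "TP53"),
  Protein_Change       = c("p.V600E", "p.V600E", "p.V600E",
                           "p.L858R", "p.L858R", "p.R175H"),
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S1", "S4", "S2"),
  stringsAsFactors = FALSE
)

# Count each sample only once per gene and protein change
unique_calls <- unique(toy_calls)
key          <- paste(unique_calls$Hugo_Symbol, unique_calls$Protein_Change)
n_samples    <- sort(table(key), decreasing = TRUE)

# Keep changes seen in at least this many samples (illustrative cutoff)
min_samples <- 2
hotspots    <- n_samples[n_samples >= min_samples]
print(hotspots)
```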

Clinical and Research Significance

These analytical approaches provide crucial insights for both research and clinical applications:

Treatment Selection: Mutation burden and specific mutation signatures can guide immunotherapy decisions
Drug Development: Hotspot analysis identifies potential therapeutic targets for precision medicine
Prognosis: Mutation patterns can predict patient outcomes and disease progression
Resistance Mechanisms: Temporal analysis of mutations can reveal how tumors evolve and develop resistance
Population Studies: Large-scale mutation analysis reveals cancer subtypes and population-specific variants

What You’ll Learn

By the end of this tutorial, you’ll be able to:

Convert multiple VCF files to a single MAF file using vcf2maf with proper annotation
Create publication-ready oncoplots showing mutation landscapes across samples
Analyze mutation burden and patterns with statistical summaries
Identify mutation signatures that reveal underlying mutational processes
Discover mutation hotspots and recurrent variants across your samples

Prerequisites

Completed mutation calling tutorial with multiple tumor-normal pairs
Basic familiarity with R and command line
High-confidence VCF files from Mutect2 analysis
Access to the conda environment from Part 3: Annotating SNVs and Mutations with Multiple Tools

Converting Multiple VCF Files to MAF Format

Before we can visualize our mutations in R, we need to convert the VCF files to the MAF format that maftools requires.

Setting Up vcf2maf

First, let’s activate our conda environment from the annotation tutorial and set up the vcf2maf tool:

# Activate the conda environment from the WGS tutorial part 3
conda activate wgs_analysis

# Create and navigate to the visualization directory
mkdir -p ~/wgs/mutation_analysis
cd ~/wgs/mutation_analysis

# Clone and set up vcf2maf
git clone https://github.com/mskcc/vcf2maf.git
cd vcf2maf
chmod +x vcf2maf.pl maf2vcf.pl
cd ../

Converting VCF to MAF

Now let’s convert each high-confidence VCF file to MAF format. This step adds comprehensive functional annotations using VEP (Variant Effect Predictor):

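# Convert tumor1_vs_normal1 VCF to MAF
# --tumor-id and --normal-id set the sample names written to the MAF. Without them,
# every file is labeled with vcf2maf's defaults (TUMOR and NORMAL), and the samples
# can no longer be told apart after merging.
# If the sample columns in your VCF use different names, also pass
# --vcf-tumor-id and --vcf-normal-id.
~/wgs/mutation_analysis/vcf2maf/vcf2maf.pl \
  --input-vcf tumor1_vs_normal1_high_confidence.vcf \
  --output-maf tumor1_vs_normal1_high_confidence.maf \
  --tumor-id tumor1 \
  --normal-id normal1 \
  --ncbi-build GRCh38 \
  --vep-path ~/wgs_analysis/bin/ \
  --vep-data ~/wgs_tutorial/annotation/vep_cache \
  --ref-fasta ~/wgs_tutorial/reference/Homo_sapiens_assembly38.fasta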

Repeat this process for each additional sample, changing the input and output filenames accordingly:

tumor2_vs_normal2_high_confidence.vcf → tumor2_vs_normal2_high_confidence.maf
tumor3_vs_normal3_high_confidence.vcf → tumor3_vs_normal3_high_confidence.maf
And so on for all your samples…
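
If you have many pairs, typing each command by hand invites mistakes. One option is to let R write the commands for you and then paste them into your terminal. This is only a sketch: the sample names in `pairs` are hypothetical, the paths are copied from the command above, and the `--tumor-id` and `--normal-id` options are included so that each MAF carries its own sample names.

```r
# Hypothetical tumor/normal sample names; replace with your own
pairs <- data.frame(
  tumor  = c("tumor1", "tumor2", "tumor3"),
  normal = c("normal1", "normal2", "normal3"),
  stringsAsFactors = FALSE
)

# File name stem shared by the input VCF and the output MAF
prefix <- sprintf("%s_vs_%s_high_confidence", pairs$tumor, pairs$normal)

# Build one vcf2maf command per pair (nothing is executed here)
commands <- sprintf(
  paste(
    "~/wgs/mutation_analysis/vcf2maf/vcf2maf.pl",
    "--input-vcf %s.vcf",
    "--output-maf %s.maf",
    "--tumor-id %s",
    "--normal-id %s",
    "--ncbi-build GRCh38",
    "--vep-path ~/wgs_analysis/bin/",
    "--vep-data ~/wgs_tutorial/annotation/vep_cache",
    "--ref-fasta ~/wgs_tutorial/reference/Homo_sapiens_assembly38.fasta"
  ),
  prefix, prefix, pairs$tumor, pairs$normal
)

# Print the commands so they can be reviewed and run in the shell
writeLines(commands)
```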

Merging MAF Files

Once all individual MAF files are created, we’ll merge them using R and the maftools package (see the following section for package installation), which provides a more robust approach than command-line concatenation:

# Start R and load maftools
library(maftools)

# Create a list of MAF files to merge
maf_files <- c(
    "tumor1_vs_normal1_high_confidence.maf",
    "tumor2_vs_normal2_high_confidence.maf",
    "tumor3_vs_normal3_high_confidence.maf"
    # Add additional MAF files as needed
)

# Merge all MAF files into a single MAF object
# The merge_mafs function properly handles different MAF formats and validates data integrity
merged_maf <- merge_mafs(
    mafs = maf_files,                    # List of MAF files to merge
    verbose = TRUE                       # Show progress and summary information
)

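# Save the merged MAF object as a file for future use
# merged_maf@data holds the non-synonymous variants and merged_maf@maf.silent holds
# the silent ones, so combine both to keep every variant in the saved file
write.table(
    data.table::rbindlist(
        list(merged_maf@data, merged_maf@maf.silent),
        fill = TRUE
    ),
    file = "merged_mutations.maf",       # Output filename
    sep = "\t",                          # Tab-separated format
    quote = FALSE,                       # Don't quote strings
    row.names = FALSE                    # Don't include row names
)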

The merge_mafs function offers several advantages over simple file concatenation:

Data validation: Ensures consistent column formats across files
Duplicate handling: Identifies and manages potential duplicate entries
Error checking: Validates MAF format compliance
Summary statistics: Provides information about the merging process

Now you have a properly merged MAF file ready for comprehensive analysis.
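
Before moving on, it is worth confirming that every sample kept its own name in the merged table. The sketch below uses a hypothetical helper, `check_barcodes`, and a made-up table; it warns if it finds `TUMOR` or `NORMAL`, the default sample names vcf2maf writes when none are supplied.

```r
# Hypothetical helper: count variants per sample and flag placeholder names
check_barcodes <- function(maf_table) {
  counts <- table(maf_table$Tumor_Sample_Barcode)
  placeholders <- intersect(names(counts), c("TUMOR", "NORMAL"))
  if (length(placeholders) > 0) {
    warning("Placeholder sample names found: ",
            paste(placeholders, collapse = ", "))
  }
  counts
}

# Made-up merged table with three distinct samples
toy_merged <- data.frame(
  Tumor_Sample_Barcode = c("tumor1", "tumor1", "tumor2",
                           "tumor3", "tumor3", "tumor3"),
  stringsAsFactors = FALSE
)

variant_counts <- check_barcodes(toy_merged)
print(variant_counts)

# With real data you could call: check_barcodes(merged_maf@data)
```

You should see one entry per tumor sample. A single entry covering all variants suggests the sample names were lost during conversion.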

Setting Up the R Environment

Installing Essential R Packages

Let’s install the packages we need for this visualization and analysis tutorial:

#-----------------------------------------------
# Install essential R packages for mutation visualization
#-----------------------------------------------

# Set up CRAN mirror for package installation
options(repos = c(CRAN = "https://cloud.r-project.org/"))

# Install BiocManager (the Bioconductor package installer) if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# List of essential packages for this tutorial
essential_packages <- c(
    # Data manipulation and visualization
    "tidyverse",           # For data wrangling and ggplot2
    "data.table",          # Fast data manipulation

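    # Mutation analysis packages
    "maftools",            # Main package for mutation visualization and analysis
    "MutationalPatterns",  # For mutation signature analysis
    "BSgenome.Hsapiens.UCSC.hg38",  # Human genome reference (GRCh38, for your own data)
    "BSgenome.Hsapiens.UCSC.hg19",  # Human genome reference (hg19, for the example TCGA LAML data)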

    # Visualization enhancements
    "RColorBrewer",        # Professional color palettes
    "pheatmap"             # Pretty heatmaps for additional visualizations
)

# Install packages
for (pkg in essential_packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        if (pkg %in% c("tidyverse", "data.table", "RColorBrewer", "pheatmap")) {
            install.packages(pkg)
        } else {
            BiocManager::install(pkg, update = FALSE)
        }
    }
}

# Load essential packages
library(tidyverse)
library(maftools)
library(RColorBrewer)

# Set up plotting theme for consistent visualization
theme_set(theme_minimal())

Loading the MAF Data

For this tutorial, we’ll use the example dataset from maftools to demonstrate the visualization techniques. You can then apply these same methods to your merged MAF file:

#-----------------------------------------------
# Load mutation data for analysis
#-----------------------------------------------

# Load the example TCGA LAML (Acute Myeloid Leukemia) dataset from maftools
# This dataset contains mutations from 193 samples and serves as an excellent example
laml.maf <- system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools')

# Read the MAF file into a maftools object
# This creates a structured object optimized for mutation analysis
maf <- read.maf(maf = laml.maf, verbose = TRUE)

# To use your own merged MAF file instead, uncomment the following line:
# maf <- read.maf(maf = "merged_mutations.maf", verbose = TRUE)

# The MAF object contains:
# - Sample summary: mutation counts per sample
# - Gene summary: mutation frequencies per gene  
# - Variant classifications: types of mutations found
# - Clinical data: only if supplied through the clinicalData argument of read.maf()

Creating Publication-Ready Oncoplots

Oncoplots are the signature visualization in cancer genomics, displaying mutation patterns across samples and genes in a matrix format.

Basic Oncoplot

The most fundamental visualization shows mutation patterns across your top mutated genes. This creates a matrix where rows are genes, columns are samples, and colors indicate mutation types.

#-----------------------------------------------
# Create basic oncoplot showing top mutated genes
#-----------------------------------------------

# Create a basic oncoplot showing the top 20 most frequently mutated genes
# Parameters:
# - maf: the MAF object containing our mutation data
# - top: number of top mutated genes to display
# - removeNonMutated: exclude samples with no mutations in displayed genes
oncoplot(
    maf = maf, 
    top = 20,                    # Show top 20 mutated genes
    removeNonMutated = FALSE     # Keep all samples for comparison
)

# Save the plot as a high-quality PDF for publication
pdf("basic_oncoplot.pdf", width = 12, height = 8)
oncoplot(maf = maf, top = 20, removeNonMutated = FALSE)
dev.off()

Advanced Oncoplot with Custom Features

Now we’ll create a more sophisticated version with custom colors, better formatting, and clearer annotations for publication-quality output.

#-----------------------------------------------
# Create advanced oncoplot with customizations
#-----------------------------------------------

# Define custom colors for different mutation types
# This helps distinguish between different types of genetic alterations
mutation_colors <- c(
    'Frame_Shift_Del' = '#A41E22',     # Red for frameshift deletions
    'Frame_Shift_Ins' = '#A41E22',     # Red for frameshift insertions
    'Missense_Mutation' = '#1F78B4',   # Blue for missense mutations
    'Nonsense_Mutation' = '#33A02C',   # Green for nonsense mutations
    'Splice_Site' = '#6A3D9A',         # Purple for splice site mutations
    'In_Frame_Del' = '#FB9A99',        # Light red for in-frame deletions
    'In_Frame_Ins' = '#FB9A99',        # Light red for in-frame insertions
    'Multi_Hit' = '#000000',           # Black for genes mutated more than once in a sample
    'Silent' = '#BEBEBE'               # Gray for silent mutations
)

# Create an advanced oncoplot with customizations
oncoplot(
    maf = maf,
    top = 15,                          # Focus on top 15 genes for clarity
    colors = mutation_colors,          # Apply custom color scheme
    showTumorSampleBarcodes = TRUE,    # Display sample names
    fontSize = 0.8,                    # Adjust text size for readability
    titleText = "Mutation Landscape - Top 15 Genes",  # Add descriptive title
    legendFontSize = 0.8,              # Adjust legend text size
    annotationFontSize = 0.8           # Adjust annotation text size
)

# Save as high-resolution PDF for publication
pdf("advanced_oncoplot.pdf", width = 14, height = 10)
oncoplot(
    maf = maf,
    top = 15,
    colors = mutation_colors,
    showTumorSampleBarcodes = TRUE,
    fontSize = 0.8,
    titleText = "Mutation Landscape - Top 15 Genes",
    legendFontSize = 0.8,
    annotationFontSize = 0.8
)
dev.off()

Gene-Specific Lollipop Plots

For detailed analysis of individual genes, lollipop plots show exactly where mutations occur within protein sequences, revealing potential functional domains and hotspots.

#-----------------------------------------------
# Create lollipop plots for specific genes of interest
#-----------------------------------------------

# Get the most frequently mutated genes from our dataset
gene_summary <- getGeneSummary(maf)
top_genes <- gene_summary$Hugo_Symbol[1:5]  # Get top 5 genes

# Create lollipop plots for each top mutated gene
# Lollipop plots show the distribution of mutations along the protein sequence
pdf("lollipop_plots.pdf", width = 12, height = 8)

for (gene in top_genes) {
    # Create lollipop plot for each gene
    # This shows where mutations occur within the protein structure
    lollipopPlot(
        maf = maf,
        gene = gene,                    # Gene to visualize
        showMutationRate = TRUE,        # Display mutation frequency
        labelPos = "all"                # Label all mutation positions
    )
}

dev.off()

Mutation Summary and Statistics

Understanding the overall mutation landscape requires comprehensive statistical analysis of mutation patterns and burden.

Sample-Level Mutation Analysis

This analysis calculates key statistics for each sample, including total mutation burden, mutation rates, and the distribution of different mutation types across your cohort.

#-----------------------------------------------
# Analyze mutation burden and patterns across samples
#-----------------------------------------------

# Get detailed sample summary statistics
sample_summary <- getSampleSummary(maf)

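# Calculate additional mutation statistics
# Size of the sequenced target in megabases: ~30 Mb approximates an exome, which
# suits the coding mutations in the example data. For whole-genome data, use the
# callable genome size instead (about 3,000 Mb).
target_size_mb <- 30

mutation_stats <- sample_summary %>%
    mutate(
        # Calculate mutation rate per megabase of sequenced target
        mutation_rate_per_mb = total / target_size_mb,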

        # Calculate percentage of each mutation type
        pct_missense = Missense_Mutation / total * 100,
        pct_nonsense = Nonsense_Mutation / total * 100,
        pct_frameshift = (Frame_Shift_Del + Frame_Shift_Ins) / total * 100,
        pct_splice = Splice_Site / total * 100
    )

# Display basic statistics
head(mutation_stats)

# Create mutation burden visualization
ggplot(mutation_stats, aes(x = reorder(Tumor_Sample_Barcode, -mutation_rate_per_mb), y = mutation_rate_per_mb)) +
    geom_bar(stat = "identity", fill = "#1F78B4", alpha = 0.7) +
    labs(
        title = "Mutation Burden Across Samples",
        x = "Sample ID", 
        y = "Mutations per Megabase",
        subtitle = paste("Mean:", round(mean(mutation_stats$mutation_rate_per_mb), 1), "mutations/Mb")
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    geom_hline(yintercept = mean(mutation_stats$mutation_rate_per_mb), 
               linetype = "dashed", color = "red", alpha = 0.6)

# Save the mutation burden plot
ggsave("mutation_burden_analysis.pdf", width = 12, height = 6)

Gene-Level Mutation Analysis

Here we identify which genes are most frequently mutated across the cohort and calculate mutation frequencies as percentages of total samples.

#-----------------------------------------------
# Analyze mutation patterns at the gene level
#-----------------------------------------------

# Get gene-level summary statistics
gene_summary <- getGeneSummary(maf)

# Analyze the most frequently mutated genes
n_samples <- nrow(getSampleSummary(maf))   # Number of samples in the sample summary

top_mutated_genes <- gene_summary %>%
    head(20) %>%
    mutate(
        # Calculate mutation frequency as percentage of samples
        mutation_frequency = MutatedSamples / n_samples * 100
    )

# Visualize top mutated genes
ggplot(top_mutated_genes, aes(x = reorder(Hugo_Symbol, MutatedSamples), 
                              y = MutatedSamples)) +
    geom_bar(stat = "identity", fill = "#33A02C", alpha = 0.7) +
    coord_flip() +
    labs(
        title = "Most Frequently Mutated Genes",
        x = "Gene Symbol",
        y = "Number of Mutated Samples",
        subtitle = "Top 20 genes ranked by mutation frequency"
    ) +
    theme_minimal()

# Save the gene frequency plot
ggsave("top_mutated_genes.pdf", width = 10, height = 8)

Transition/Transversion Analysis

This quality control analysis examines the chemical nature of base substitutions, providing both validation of data quality and insights into underlying mutational processes.

#-----------------------------------------------
# Analyze transition/transversion (Ti/Tv) ratios
#-----------------------------------------------

# Calculate Ti/Tv ratios for quality assessment
# Somatic Ti/Tv ratios vary with cancer type and mutational process, so compare
# samples against each other and against published values for your tumor type
titv_analysis <- titv(maf = maf, plot = FALSE, useSyn = TRUE)

# Save Ti/Tv analysis
pdf("titv_analysis.pdf", width = 10, height = 6)
plotTiTv(res = titv_analysis)
dev.off()

# Samples with very low (<1.5) or very high (>4) Ti/Tv ratios may indicate
# technical issues or specific mutational processes

Mutation Signature Analysis

Mutation signatures reveal the underlying mutational processes that shaped the tumor genome, such as DNA repair deficiencies, environmental exposures, or treatment effects.

Preparing Data for Signature Analysis

#-----------------------------------------------
# Extract and analyze mutation signatures
#-----------------------------------------------

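# Load required packages for signature analysis
library(MutationalPatterns)
# The example TCGA LAML data is aligned to hg19, so it needs the hg19 reference
# (install it with BiocManager::install() if it is missing). For your own GRCh38
# data, load BSgenome.Hsapiens.UCSC.hg38 and pass it as ref_genome instead.
library(BSgenome.Hsapiens.UCSC.hg19)

# Extract trinucleotide context for signature analysis
# This creates a 96-dimensional mutational profile for each sample
trinuc_matrix <- trinucleotideMatrix(
    maf = maf, 
    prefix = 'chr',                    # Prefix to add to chromosome names (use NULL if they already start with 'chr')
    add = TRUE,                        # Add the prefix rather than remove it
    ref_genome = "BSgenome.Hsapiens.UCSC.hg19"  # Reference genome matching the MAF
)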

# The trinucleotide matrix shows the count of each possible 
# base substitution in its trinucleotide context (e.g., A[C>T]G)

Visualizing Mutation Signatures

With the trinucleotide matrix created, we can now analyze APOBEC signature enrichment, which reveals whether specific mutational processes have been active in these samples.

#-----------------------------------------------
# Create mutation signature visualizations
#-----------------------------------------------

# Compare APOBEC-enriched and non-enriched samples
# This shows differences in mutation load and in the genes that are mutated
plotApobecDiff(
    tnm = trinuc_matrix,               # Trinucleotide matrix
    maf = maf,                         # MAF object
    pVal = 0.2                         # P-value threshold for significance
)

# Save signature analysis plot
pdf("mutation_signatures.pdf", width = 12, height = 8)
plotApobecDiff(tnm = trinuc_matrix, maf = maf, pVal = 0.2)
dev.off()

Mutation Hotspot Analysis

Identifying recurrent mutations and hotspots helps prioritize functionally important variants and potential therapeutic targets.

Analyzing Mutation Hotspots

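This statistical analysis uses the OncodriveCLUST approach, which identifies genes whose mutations cluster at particular protein positions more often than expected by chance. Such clustering is typical of activating hotspot mutations and helps distinguish potential driver genes from passengers.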

#-----------------------------------------------
# Detect and visualize mutation hotspots
#-----------------------------------------------

# Find genes whose mutations cluster at specific protein positions (OncodriveCLUST)
# This helps identify potential driver genes vs passenger mutations
# For MAF files produced by vcf2maf, the amino acid change column is 'HGVSp_Short'
oncotable <- oncodrive(maf = maf, AACol = 'Protein_Change', minMut = 5)

# Display top candidate driver genes
head(oncotable)

# Create visualization of candidate driver genes
ggplot(head(oncotable, 15), aes(x = reorder(Hugo_Symbol, -zscore), y = zscore)) +
    geom_bar(stat = "identity", fill = "#E31A1C", alpha = 0.7) +
    geom_hline(yintercept = 2, linetype = "dashed", color = "blue") +
    coord_flip() +
    labs(
        title = "Potential Driver Genes",
        x = "Gene Symbol",
        y = "Z-score (mutation clustering)",
        subtitle = "Genes with strongly clustered mutations",
        caption = "Dashed line marks z-score = 2 (illustrative cutoff; check the fdr column for significance)"
    ) +
    theme_minimal()

# Save oncogene analysis
ggsave("driver_genes_analysis.pdf", width = 10, height = 8)

Protein Domain Analysis

This analysis examines whether mutations cluster within specific functional protein domains, indicating regions under positive selection or structural constraints.

#-----------------------------------------------
# Analyze mutations within protein domains
#-----------------------------------------------

# Summarize which protein (Pfam) domains carry the most mutations and plot the top 10
pdf("protein_domain_analysis.pdf", width = 12, height = 8)
pfamDomains(maf = maf, top = 10)
dev.off()

# Protein domains with high mutation density may indicate:
# 1. Functionally important regions under positive selection
# 2. Structural constraints that make mutations impactful
# 3. Therapeutic targets for precision medicine approaches

Sample Comparison and Clustering

This analysis identifies patterns of mutual exclusivity and co-occurrence among mutations, revealing potential pathway relationships and therapeutic targets.

#-----------------------------------------------
# Test gene pairs for co-occurrence and mutual exclusivity
#-----------------------------------------------

# Run pairwise Fisher's exact tests on the top mutated genes and plot the results
# This highlights gene pairs that tend to be mutated together or rarely together
somaticInteractions(
    maf = maf,                         # MAF object
    top = 25,                          # Top genes to include
    pvalue = c(0.05, 0.1)             # P-value thresholds for significance
)

# Save interaction analysis
pdf("somatic_interactions.pdf", width = 12, height = 10)
somaticInteractions(maf = maf, top = 25, pvalue = c(0.05, 0.1))
dev.off()

# Mutual exclusivity and co-occurrence patterns can reveal:
# 1. Pathways that are disrupted by alternative mechanisms
# 2. Gene pairs that work together in cancer development  
# 3. Potential synthetic lethal relationships for therapy

Best Practices for Mutation Visualization

Quality Control Considerations

Data Quality Checks:

Compare Ti/Tv ratios against the range expected for your cancer type and data (somatic ratios vary with the active mutational processes)
Check mutation burden against cancer type expectations
Ensure adequate sample size for statistical analyses (minimum 10-20 samples)
Validate key findings with independent datasets or literature

Visualization Standards:

Use consistent color schemes across all plots
Include appropriate statistical annotations and p-values
Provide clear legends and axis labels
Save plots in both vector (PDF) and raster (PNG) formats

Interpretation Guidelines:

Focus on recurrent mutations across multiple samples
Prioritize mutations in known cancer genes and pathways
Consider functional impact predictions from annotation tools
Integrate with clinical data when available

Common Pitfalls to Avoid

Technical Issues:

Not adjusting for multiple testing in statistical analyses
Ignoring batch effects between sequencing runs
Using inappropriate reference genomes or annotation versions
Insufficient filtering of low-quality variants
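
As an illustration of the first pitfall, the sketch below applies the Benjamini-Hochberg correction with base R's `p.adjust`. The gene names and p-values are made up; the point is that a gene can pass a raw p < 0.05 cutoff and still fail after correction.

```r
# Made-up raw p-values from testing five genes
raw_p <- c(geneA = 0.001, geneB = 0.008, geneC = 0.02, geneD = 0.045, geneE = 0.30)

# Benjamini-Hochberg adjustment controls the false discovery rate
fdr <- p.adjust(raw_p, method = "BH")
print(round(fdr, 4))

# Compare the two decisions at the same 0.05 cutoff
significant_raw <- names(raw_p)[raw_p < 0.05]
significant_fdr <- names(fdr)[fdr < 0.05]
print(significant_raw)
print(significant_fdr)
```

Here geneD looks significant before correction but not after it.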

Biological Interpretation:

Over-interpreting single-sample findings
Mistaking passenger mutations for drivers in highly mutated tumors
Not considering tumor heterogeneity and clonal evolution
Failing to validate computational predictions experimentally

Extending the Analysis

Advanced Visualizations:

Integrate copy number and structural variant data
Create interactive plots using plotly or shiny
Generate patient-specific mutation reports
Compare with public datasets (TCGA, COSMIC)

Clinical Applications:

Identify actionable mutations for targeted therapy
Assess mutation signatures for treatment selection
Monitor mutation evolution in longitudinal samples
Develop prognostic or predictive biomarkers

Conclusion

You now have a comprehensive toolkit for visualizing and interpreting somatic mutations from whole genome sequencing data. The combination of vcf2maf for data preparation and maftools for analysis provides a powerful framework for:

Professional Visualization: Publication-ready oncoplots and mutation landscapes
Statistical Analysis: Rigorous assessment of mutation patterns and significance
Biological Insight: Understanding mutational processes and driver genes
Clinical Translation: Identifying actionable findings for precision medicine

Next Steps

Validation: Confirm key mutations using orthogonal sequencing methods
Integration: Combine with RNA-seq, copy number, and clinical data
Functional Studies: Design experiments to test mutation impact
Clinical Application: Translate findings into diagnostic or therapeutic strategies

This tutorial provides the foundation for advanced mutation analysis. As you apply these methods to your own data, remember that the goal is not just to catalog mutations, but to understand their biological significance and clinical relevance.

References

Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C., & Koeffler, H. P. (2018). Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Research, 28(11), 1747-1756.
Alexandrov, L. B., Kim, J., Haradhvala, N. J., et al. (2020). The repertoire of mutational signatures in human cancer. Nature, 578(7793), 94-101.
Blokzijl, F., Janssen, R., van Boxtel, R., & Cuppen, E. (2018). MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Medicine, 10(1), 33.
Chakravarty, D., Gao, J., Phillips, S. M., et al. (2017). OncoKB: A Precision Oncology Knowledge Base. JCO Precision Oncology, 1, 1-16.
Tamborero, D., Gonzalez-Perez, A., & Lopez-Bigas, N. (2013). OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics, 29(18), 2238-2244.
Cancer Genome Atlas Research Network. (2013). Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. New England Journal of Medicine, 368(22), 2059-2074.
Martincorena, I., & Campbell, P. J. (2015). Somatic mutation in cancer and normal cells. Science, 349(6255), 1483-1489.
Rheinbay, E., Nielsen, M. M., Abascal, F., et al. (2020). Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature, 578(7793), 102-111.

This tutorial is part of the NGS101.com series on whole genome sequencing analysis.