How To Analyze Whole Genome Sequencing Data For Absolute Beginners Part 2B: Unmatched Sample Mutation Calling Strategies

Introduction: Real-World Mutation Calling Challenges

Welcome to Part 2B of our somatic mutation analysis series! In Part 2A, we learned the gold standard approach using matched tumor-normal pairs. However, in real-world scenarios, you often face situations where matched normal samples aren’t available.

Common Unmatched Sample Scenarios

Clinical Archives: Historical tumor samples without corresponding normal tissue
Population Studies: Large cohorts where matched normals are cost-prohibitive
Tissue Constraints: Limited biopsy material preventing normal collection
Research Collections: Existing datasets with unmatched sample combinations

Why This Tutorial Matters

Understanding unmatched sample analysis is crucial because:

Many archival samples lack matched controls
Population studies often use shared normal references
Cost considerations drive many research designs
Clinical applications sometimes require tumor-only analysis

Analysis Strategies We’ll Cover

Pooled Normal Approach – Combining multiple normals into a super-reference
Custom Panel of Normals – Creating study-specific artifact databases
Tumor-Only Analysis – Working without any normal controls

Each strategy has specific use cases, advantages, and limitations that we’ll explore in detail.

Strategy 1: Pooled Normal Approach

Best for: 5-20 tumor samples with 3-10 available normal samples
Example: 10 tumor samples with 5 normal samples from different patients

Understanding the Pooled Normal Concept

The pooled normal approach creates a “super normal” by combining multiple normal samples. This strategy provides several advantages over individual normal samples:

Higher coverage depth from combined reads
Better germline variant representation across the population
Reduced single-sample bias from any individual normal
Consistent reference for all tumor comparisons

Setting Up for Pooled Normal Analysis

We’ll create a new analysis directory specifically for the pooled normal approach while maintaining our organized project structure.

#-----------------------------------------------
# STEP 1: Prepare environment for pooled normal analysis
#-----------------------------------------------

# Activate the WGS data analysis environment from Part 1
# If you haven't completed Part 1, please follow that tutorial first
conda activate wgs_analysis

# Navigate to our analysis directory (continuing from Part 2A)
cd ~/somatic_analysis_matched

# Create directory for pooled normal analysis
mkdir -p pooled_normal_analysis
cd pooled_normal_analysis

# Create subdirectories
mkdir -p {pooled_bam,raw_calls,filtered_calls,contamination,converted_tables,maf_files}

echo "Pooled normal analysis environment ready!"

Creating the Pooled Normal BAM

This step combines multiple normal BAM files into a single, high-coverage normal reference. The merged BAM will have significantly higher depth and better representation of population-level variants.

#-----------------------------------------------
# STEP 2: Create pooled normal BAM file
#-----------------------------------------------

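# List available normal samples (adjust based on your available samples)
# For this example, we'll use normal1 and normal2
# Use $HOME rather than "~": a tilde inside quotes is not expanded by the shell
INPUT_DIR="$HOME/somatic_analysis_matched/input_data"

# Merge normal BAM files using samtools
# -@ 8 uses 8 threads; the first file name is the merged output
# (comments cannot follow a line-continuation backslash, so they are placed here)
samtools merge \
    -@ 8 \
    pooled_bam/pooled_normal.merged.bam \
    "${INPUT_DIR}/normal1_recalibrated.bam" \
    "${INPUT_DIR}/normal2_recalibrated.bam"

# The merged BAM keeps each input's SM tag (normal1, normal2), but Mutect2
# matches -normal against SM. Rewrite SM so the pool is a single sample.
# (\t in the sed pattern assumes GNU sed)
samtools view -H pooled_bam/pooled_normal.merged.bam \
    | sed '/^@RG/ s/\tSM:[^\t]*/\tSM:pooled_normal/' \
    | samtools reheader - pooled_bam/pooled_normal.merged.bam \
    > pooled_bam/pooled_normal.bam
rm pooled_bam/pooled_normal.merged.bam

# Index the pooled normal BAM
samtools index pooled_bam/pooled_normal.bam

echo "Pooled normal BAM created successfully!"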
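
Mutect2 matches the names passed to -tumor and -normal against the SM field of the BAM's @RG header lines, so it helps to check which sample names a pooled BAM actually carries. The sketch below is a minimal illustration: `list_sm_tags` is a hypothetical helper, and the header text is made up. In practice you would pipe in the output of `samtools view -H pooled_bam/pooled_normal.bam`.

```shell
#!/usr/bin/env bash
# list_sm_tags (hypothetical helper): print the unique SM values found in
# @RG header lines read from stdin.
list_sm_tags() {
    awk -F'\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Demo on a made-up SAM header with two read groups from two normals
demo_header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:n1\tSM:normal1\tPL:ILLUMINA\n@RG\tID:n2\tSM:normal2\tPL:ILLUMINA\n')
printf '%s\n' "$demo_header" | list_sm_tags
```

If more than one name is printed, a single `-normal pooled_normal` argument will not match the BAM.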

Running Mutect2 with Pooled Normal

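The analysis proceeds similarly to matched tumor-normal calling, but uses our pooled normal as the control sample. Because the pooled normal comes from other patients, it helps remove recurrent artifacts and germline variants shared with the pool, but it cannot remove germline variants private to the tumor’s patient; the germline resource and downstream filters still carry that burden.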

#-----------------------------------------------
# STEP 3: Run Mutect2 using pooled normal as control
#-----------------------------------------------

# Set up reference files (same as Part 2A)
# $HOME is used because "~" is not expanded inside quotes
REFERENCE="$HOME/references/somatic_resources/Homo_sapiens_assembly38.fasta"
GERMLINE_RESOURCE="$HOME/references/somatic_resources/af-only-gnomad.hg38.vcf.gz"
PON="$HOME/references/somatic_resources/1000g_pon.hg38.vcf.gz"

# Run Mutect2 comparing tumor1 against pooled normal
# (comments cannot follow a line-continuation backslash, so the options are described here)
#   -I (twice)                          tumor BAM, then pooled normal BAM
#   -tumor / -normal                    sample names (SM tags) of tumor and pooled normal
#   --germline-resource                 population allele frequencies
#   --panel-of-normals                  technical artifact filter
#   --f1r2-tar-gz                       read orientation data for LearnReadOrientationModel
#   --native-pair-hmm-threads 8         use 8 CPU threads
#   --max-reads-per-alignment-start 50  downsample very deep pileups
gatk Mutect2 \
    -R "$REFERENCE" \
    -I "${INPUT_DIR}/tumor1_recalibrated.bam" \
    -I pooled_bam/pooled_normal.bam \
    -tumor tumor1 \
    -normal pooled_normal \
    --germline-resource "$GERMLINE_RESOURCE" \
    --panel-of-normals "$PON" \
    --f1r2-tar-gz raw_calls/tumor1_pooled_f1r2.tar.gz \
    -O raw_calls/tumor1_pooled_raw.vcf.gz \
    --native-pair-hmm-threads 8 \
    --max-reads-per-alignment-start 50

echo "Mutect2 with pooled normal complete!"
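
In bash, a backslash continues a command only when it is the last character on the line, so a comment cannot follow it. One way to keep a comment next to each argument is to build the command as an array. The sketch below uses placeholder file names and prints the command instead of running it; it illustrates the idiom and is not a replacement for the call above.

```shell
#!/usr/bin/env bash
# Build the argument list as a bash array: comments are allowed between elements.
# All file names here are placeholders for illustration.
mutect2_args=(
    -R reference.fasta            # reference genome
    -I tumor1_recalibrated.bam    # tumor BAM
    -I pooled_normal.bam          # pooled normal BAM
    -normal pooled_normal         # must match the SM tag in the normal BAM
    -O tumor1_pooled_raw.vcf.gz   # output VCF
)

# Print the assembled command for inspection; drop the leading "echo" to run it.
echo gatk Mutect2 "${mutect2_args[@]}"
```

Quoting the expansion as `"${mutect2_args[@]}"` keeps each argument intact even if a path contains spaces.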

Filtering for Pooled Normal Analysis

Since the pooled normal isn’t perfectly matched to our tumor sample, we apply slightly more stringent filtering criteria compared to matched analysis to compensate for potential population-level differences.

#-----------------------------------------------
# STEP 4: Apply filtering optimized for pooled normal approach
#-----------------------------------------------

# Generate contamination estimates (pooled normal approach)
# $HOME is used because "~" is not expanded inside quotes
COMMON_VARIANTS="$HOME/references/somatic_resources/small_exac_common_3.hg38.vcf.gz"

# Pileup summary for tumor
gatk GetPileupSummaries \
    -I "${INPUT_DIR}/tumor1_recalibrated.bam" \
    -V "$COMMON_VARIANTS" \
    -L "$COMMON_VARIANTS" \
    -O contamination/tumor1_pileups.table

# Calculate contamination from the tumor pileups alone.
# The -matched option of CalculateContamination expects a normal from the same
# individual as the tumor. A pool of other patients' normals does not qualify,
# so no pooled-normal pileup is passed here.
gatk CalculateContamination \
    -I contamination/tumor1_pileups.table \
    -O contamination/tumor1_pooled_contamination.table

# Learn read orientation model
gatk LearnReadOrientationModel \
    -I raw_calls/tumor1_pooled_f1r2.tar.gz \
    -O raw_calls/tumor1_pooled_orientation_model.tar.gz

# Apply FilterMutectCalls
gatk FilterMutectCalls \
    -R $REFERENCE \
    -V raw_calls/tumor1_pooled_raw.vcf.gz \
    --contamination-table contamination/tumor1_pooled_contamination.table \
    --ob-priors raw_calls/tumor1_pooled_orientation_model.tar.gz \
    -O filtered_calls/tumor1_pooled_filtered.vcf.gz

# Extract PASS variants and fix the sample order (tumor first, normal second)
# so the [sample:value] subscripts in the next step are unambiguous
bcftools view -f PASS \
    -s tumor1,pooled_normal \
    filtered_calls/tumor1_pooled_filtered.vcf.gz \
    -O z \
    -o filtered_calls/tumor1_pooled_pass.vcf.gz

# Apply quality filters (slightly more stringent than matched pairs)
# Pooled normals provide good but not perfect germline filtering
# bcftools subscripts are [sample:value]: [0:0] = tumor1, [1:0] = pooled_normal
bcftools filter \
    -i 'FORMAT/AF[0:0] >= 0.06 && FORMAT/DP[0:0] >= 12 && INFO/TLOD >= 6.3 && FORMAT/AF[1:0] <= 0.03' \
    filtered_calls/tumor1_pooled_pass.vcf.gz \
    -O z \
    -o filtered_calls/tumor1_pooled_high_confidence.vcf.gz

# Index final VCF
bcftools index -t filtered_calls/tumor1_pooled_high_confidence.vcf.gz

echo "Quality filter criteria for pooled normal analysis:"
echo "  Tumor AF ≥ 6% (slightly higher than matched)"
echo "  Tumor depth ≥ 12 reads (higher confidence threshold)"
echo "  TLOD ≥ 6.3 (same statistical evidence)"
echo "  Normal AF ≤ 3% (strict germline filtering)"

# Generate final statistics
hc_variants=$(bcftools view -H filtered_calls/tumor1_pooled_high_confidence.vcf.gz | wc -l)
echo "High-confidence variants with pooled normal: $hc_variants"
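
To make the four thresholds echoed above concrete, here is a minimal sketch that applies them to a plain table instead of a VCF. The column layout (tumor AF, tumor depth, TLOD, normal AF) and the helper name `passes_pooled_filter` are assumptions for illustration, and the rows are made up.

```shell
#!/usr/bin/env bash
# passes_pooled_filter (hypothetical helper): read tab-separated rows of
#   tumor_AF  tumor_DP  TLOD  normal_AF
# and print only the rows that meet the pooled-normal thresholds.
passes_pooled_filter() {
    awk -F'\t' '$1 >= 0.06 && $2 >= 12 && $3 >= 6.3 && $4 <= 0.03'
}

# Made-up rows: only the first one satisfies all four conditions.
# Row 2 fails on tumor AF, row 3 on depth, row 4 on normal AF.
printf '0.25\t40\t35.1\t0.00\n0.04\t50\t12.0\t0.00\n0.30\t10\t20.0\t0.00\n0.20\t30\t15.0\t0.10\n' |
    passes_pooled_filter
```

A variant is kept only when every condition holds, which mirrors the `&&` chain in the bcftools expression.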

Strategy 2: Custom Panel of Normals Approach

Best for: Studies with 10+ normal samples available
Goal: Maximum technical artifact removal using study-specific patterns

Understanding Panel of Normals

A Panel of Normals (PON) is a database of technical artifacts observed across many normal samples. Creating a custom PON from your study samples helps remove study-specific artifacts including:

Systematic sequencing errors that appear across samples
Mapping artifacts in repetitive genomic regions
PCR amplification biases from library preparation
Platform-specific errors from sequencing technology

Creating Custom Panel of Normals

The process involves running Mutect2 in tumor-only mode on normal samples to catalog all variants (including artifacts), then combining these into a comprehensive artifact database.

#-----------------------------------------------
# STEP 5: Create custom Panel of Normals from available samples
#-----------------------------------------------

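# Create directory for custom PON analysis
# Return to the project root first so this directory sits next to
# pooled_normal_analysis rather than inside it
cd ~/somatic_analysis_matched
mkdir -p custom_pon_analysis
cd custom_pon_analysis
mkdir -p {pon_creation,raw_calls,filtered_calls,contamination,converted_tables}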

# Step 1: Run Mutect2 in tumor-only mode on each normal sample
# This identifies all variants (including artifacts) in each normal
for normal in normal1 normal2; do
    echo "Processing ${normal} for PON..."

    # Run Mutect2 in tumor-only mode on normal sample
    # --max-mnp-distance 0 reports adjacent SNVs separately instead of merging
    # them into MNPs, which keeps sites comparable across normals
    gatk Mutect2 \
        -R "$REFERENCE" \
        -I "${INPUT_DIR}/${normal}_recalibrated.bam" \
        --max-mnp-distance 0 \
        -O "pon_creation/${normal}_for_pon.vcf.gz"

    echo "${normal} processed for PON"
done

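# Step 2: Create the Panel of Normals database
# GATK 4.1.1 and later build the PON from a GenomicsDB workspace; the older
# "-vcfs" argument is no longer accepted in those versions.
# INTERVALS is an assumed path: point it at the regions you called over, for
# example wgs_calling_regions.hg38.interval_list from the GATK resource bundle.
INTERVALS="$HOME/references/somatic_resources/wgs_calling_regions.hg38.interval_list"

gatk GenomicsDBImport \
    -R "$REFERENCE" \
    -L "$INTERVALS" \
    --genomicsdb-workspace-path pon_creation/pon_db \
    -V pon_creation/normal1_for_pon.vcf.gz \
    -V pon_creation/normal2_for_pon.vcf.gz

gatk CreateSomaticPanelOfNormals \
    -R "$REFERENCE" \
    -V gendb://pon_creation/pon_db \
    -O pon_creation/custom_pon.vcf.gz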

echo "Custom Panel of Normals created!"

# Display PON statistics
pon_sites=$(bcftools view -H pon_creation/custom_pon.vcf.gz | wc -l)
echo "Custom PON contains $pon_sites artifact sites"

# Note: In production, use 40+ normals for effective PON
echo "Note: This PON uses only 2 normals (demo). Production PONs need 40+ samples."
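
The idea behind a PON is that a site recurring in several unrelated normals is more likely an artifact or a common germline variant than a somatic mutation. The sketch below illustrates that counting logic on plain text lists; it is a simplified stand-in, not how GATK builds its panel. The helper name `recurrent_sites`, the `CHROM:POS:REF:ALT` key format, and the site lists are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# recurrent_sites (hypothetical helper): given a minimum sample count and
# several files of "CHROM:POS:REF:ALT" keys (one file per normal), print the
# keys that appear in at least that many files.
recurrent_sites() {
    local min_samples=$1
    shift
    local f
    for f in "$@"; do
        sort -u "$f"      # count each site at most once per normal
    done | sort | uniq -c | awk -v min="$min_samples" '$1 >= min { print $2 }'
}

# Made-up site lists for three normals
tmpdir=$(mktemp -d)
printf 'chr1:100:A:T\nchr2:200:G:C\n' > "$tmpdir/n1.txt"
printf 'chr1:100:A:T\nchr3:300:C:A\n' > "$tmpdir/n2.txt"
printf 'chr1:100:A:T\nchr2:200:G:C\n' > "$tmpdir/n3.txt"

# Sites seen in at least two of the three normals
recurrent_sites 2 "$tmpdir/n1.txt" "$tmpdir/n2.txt" "$tmpdir/n3.txt"
```

With only two normals, as in this demo, very few sites can recur, which is one reason larger panels are recommended.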

Running Tumor-Only Analysis with Custom PON

With our custom PON created, we can now run tumor-only analysis with enhanced artifact filtering specific to our study’s technical characteristics.

#-----------------------------------------------
# STEP 6: Run tumor-only analysis with custom PON
#-----------------------------------------------

echo "Running tumor-only analysis with custom Panel of Normals..."

# Run Mutect2 in tumor-only mode using our custom PON
# Inputs: the tumor BAM only, population frequencies, and our custom PON
# (comments cannot follow a line-continuation backslash, so they are listed here)
gatk Mutect2 \
    -R "$REFERENCE" \
    -I "${INPUT_DIR}/tumor1_recalibrated.bam" \
    -tumor tumor1 \
    --germline-resource "$GERMLINE_RESOURCE" \
    --panel-of-normals pon_creation/custom_pon.vcf.gz \
    --f1r2-tar-gz raw_calls/tumor1_custom_pon_f1r2.tar.gz \
    -O raw_calls/tumor1_custom_pon_raw.vcf.gz \
    --native-pair-hmm-threads 8

echo "Tumor-only calling with custom PON complete!"

# Generate statistics
custom_pon_variants=$(bcftools view -H raw_calls/tumor1_custom_pon_raw.vcf.gz | wc -l)
echo "Raw variants with custom PON: $custom_pon_variants"

Filtering for Custom PON Analysis

The custom PON provides good technical artifact removal, allowing us to use standard filtering criteria while maintaining confidence in our results.

#-----------------------------------------------
# STEP 7: Apply filtering for custom PON analysis
#-----------------------------------------------

echo "Applying filtering for custom PON approach..."

# Contamination analysis (tumor-only mode)
gatk GetPileupSummaries \
    -I ${INPUT_DIR}/tumor1_recalibrated.bam \
    -V $COMMON_VARIANTS \
    -L $COMMON_VARIANTS \
    -O contamination/tumor1_custom_pon_pileups.table

gatk CalculateContamination \
    -I contamination/tumor1_custom_pon_pileups.table \
    -O contamination/tumor1_custom_pon_contamination.table

# Learn read orientation model
gatk LearnReadOrientationModel \
    -I raw_calls/tumor1_custom_pon_f1r2.tar.gz \
    -O raw_calls/tumor1_custom_pon_orientation_model.tar.gz

# Apply FilterMutectCalls
gatk FilterMutectCalls \
    -R $REFERENCE \
    -V raw_calls/tumor1_custom_pon_raw.vcf.gz \
    --contamination-table contamination/tumor1_custom_pon_contamination.table \
    --ob-priors raw_calls/tumor1_custom_pon_orientation_model.tar.gz \
    -O filtered_calls/tumor1_custom_pon_filtered.vcf.gz

# Extract PASS variants
bcftools view -f PASS \
    filtered_calls/tumor1_custom_pon_filtered.vcf.gz \
    -O z \
    -o filtered_calls/tumor1_custom_pon_pass.vcf.gz

# Apply standard quality filters (custom PON provides good artifact removal)
bcftools filter \
    -i 'FORMAT/AF[0:0] >= 0.05 && FORMAT/DP[0:0] >= 10 && INFO/TLOD >= 6.3' \
    filtered_calls/tumor1_custom_pon_pass.vcf.gz \
    -O z \
    -o filtered_calls/tumor1_custom_pon_high_confidence.vcf.gz

# Index final VCF
bcftools index -t filtered_calls/tumor1_custom_pon_high_confidence.vcf.gz

# Final statistics
custom_pon_hc=$(bcftools view -H filtered_calls/tumor1_custom_pon_high_confidence.vcf.gz | wc -l)
echo "High-confidence variants with custom PON: $custom_pon_hc"

# Return to main analysis directory
cd ~/somatic_analysis_matched

Strategy 3: Tumor-Only Analysis

Best for: Archival samples with no available normal controls
Limitation: Higher false positive rate, requires aggressive filtering

When to Use Tumor-Only Analysis

This approach should be used judiciously, as it has the highest risk of false positives due to the inability to distinguish somatic mutations from germline variants directly.

Appropriate for:

Historical/archival samples
Rapid screening studies
Samples with no available normal tissue
Cost-constrained large population studies

Use with caution for:

Clinical decision-making
Low-frequency mutation detection
Publication-quality research requiring high specificity

Running Tumor-Only Analysis

Without any normal control, we rely heavily on population databases and Panel of Normals to filter germline variants and technical artifacts.

#-----------------------------------------------
# STEP 8: Tumor-only analysis without normal controls
#-----------------------------------------------

# Create directory for tumor-only analysis
mkdir -p tumor_only_analysis
cd tumor_only_analysis
mkdir -p {raw_calls,filtered_calls,contamination,converted_tables}

# Run Mutect2 in tumor-only mode
# (comments cannot follow a line-continuation backslash, so the options are described here)
#   -I / -tumor             tumor BAM and sample name (no normal)
#   --germline-resource     critical for germline filtering
#   --panel-of-normals      public PON for artifact removal
#   --max-mnp-distance 0    report adjacent SNVs separately instead of as MNPs
gatk Mutect2 \
    -R "$REFERENCE" \
    -I "${INPUT_DIR}/tumor1_recalibrated.bam" \
    -tumor tumor1 \
    --germline-resource "$GERMLINE_RESOURCE" \
    --panel-of-normals "$PON" \
    --f1r2-tar-gz raw_calls/tumor1_only_f1r2.tar.gz \
    -O raw_calls/tumor1_only_raw.vcf.gz \
    --native-pair-hmm-threads 8 \
    --max-mnp-distance 0

echo "Tumor-only Mutect2 calling complete!"

# Generate statistics
tumor_only_variants=$(bcftools view -H raw_calls/tumor1_only_raw.vcf.gz | wc -l)
echo "Raw tumor-only variants: $tumor_only_variants"

Aggressive Filtering for Tumor-Only

To compensate for the lack of a normal control, we apply very stringent filtering criteria. This reduces sensitivity but maintains acceptable specificity for most applications.

#-----------------------------------------------
# STEP 9: Apply aggressive filtering for tumor-only analysis
#-----------------------------------------------

echo "Applying aggressive filtering for tumor-only analysis..."

# Contamination analysis (tumor-only - less reliable)
gatk GetPileupSummaries \
    -I ${INPUT_DIR}/tumor1_recalibrated.bam \
    -V $COMMON_VARIANTS \
    -L $COMMON_VARIANTS \
    -O contamination/tumor1_only_pileups.table

gatk CalculateContamination \
    -I contamination/tumor1_only_pileups.table \
    -O contamination/tumor1_only_contamination.table

# Learn read orientation model
gatk LearnReadOrientationModel \
    -I raw_calls/tumor1_only_f1r2.tar.gz \
    -O raw_calls/tumor1_only_orientation_model.tar.gz

# Apply FilterMutectCalls
gatk FilterMutectCalls \
    -R $REFERENCE \
    -V raw_calls/tumor1_only_raw.vcf.gz \
    --contamination-table contamination/tumor1_only_contamination.table \
    --ob-priors raw_calls/tumor1_only_orientation_model.tar.gz \
    -O filtered_calls/tumor1_only_filtered.vcf.gz

# Extract PASS variants
bcftools view -f PASS \
    filtered_calls/tumor1_only_filtered.vcf.gz \
    -O z \
    -o filtered_calls/tumor1_only_pass.vcf.gz

# Apply very stringent quality filters for tumor-only analysis
# Higher thresholds compensate for lack of normal sample
# INFO/POPAF is -log10(population allele frequency), so AF < 0.001 means POPAF > 3
bcftools filter \
    -i 'FORMAT/AF[0:0] >= 0.10 && FORMAT/DP[0:0] >= 20 && INFO/TLOD >= 10.0 && INFO/POPAF > 3.0' \
    filtered_calls/tumor1_only_pass.vcf.gz \
    -O z \
    -o filtered_calls/tumor1_only_high_confidence.vcf.gz

# Index final VCF
bcftools index -t filtered_calls/tumor1_only_high_confidence.vcf.gz

echo "Aggressive filter criteria for tumor-only analysis:"
echo "  Tumor AF ≥ 10% (high frequency threshold)"
echo "  Tumor depth ≥ 20 reads (high confidence requirement)"
echo "  TLOD ≥ 10.0 (very strong statistical evidence)"
echo "  Population AF < 0.1% (aggressive germline filtering)"

# Final statistics
tumor_only_hc=$(bcftools view -H filtered_calls/tumor1_only_high_confidence.vcf.gz | wc -l)
echo "High-confidence tumor-only variants: $tumor_only_hc"

# Return to main analysis directory
cd ~/somatic_analysis_matched

echo "Tumor-only analysis complete!"
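
Mutect2 writes the INFO/POPAF annotation as the negative log10 of the population allele frequency, so a frequency cutoff has to be converted before it is used in a filter expression. The sketch below shows the conversion; `af_to_popaf` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# af_to_popaf (hypothetical helper): convert a population allele frequency
# to the -log10 scale used by the INFO/POPAF annotation.
af_to_popaf() {
    awk -v af="$1" 'BEGIN { printf "%.2f\n", -log(af) / log(10) }'
}

af_to_popaf 0.001    # 0.1% population frequency
af_to_popaf 0.01     # 1% population frequency
```

On this scale rarer variants have larger values, so "rarer than 0.1%" becomes "POPAF greater than 3".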

Converting Results to Analysis-Ready Formats

Each analysis strategy produces VCF files that need conversion to human-readable formats. The conversion process is identical to Part 2A, using GATK’s VariantsToTable for comprehensive data extraction.

#-----------------------------------------------
# STEP 10: Convert all strategy results to tables
#-----------------------------------------------

# Function to convert a VCF to a tab-separated table
# Run from ~/somatic_analysis_matched so the relative paths below resolve
convert_results() {
    local strategy=$1
    local vcf_path=$2
    local output_prefix=$3

    echo "Converting $strategy results..."

    # Convert to human-readable table
    # NLOD is only populated when a normal sample was used; otherwise it is NA
    gatk VariantsToTable \
        -V "$vcf_path" \
        -F CHROM -F POS -F ID -F REF -F ALT -F QUAL -F FILTER \
        -F TLOD -F NLOD -F ECNT \
        -GF GT -GF AD -GF AF -GF DP \
        -O "${output_prefix}.tsv"

    # Count mutations (skip the header line)
    local mutation_count
    mutation_count=$(tail -n +2 "${output_prefix}.tsv" | wc -l)
    echo "$strategy: $mutation_count high-confidence mutations"
}

# Convert pooled normal results
convert_results "Pooled Normal" \
    "pooled_normal_analysis/filtered_calls/tumor1_pooled_high_confidence.vcf.gz" \
    "pooled_normal_analysis/converted_tables/tumor1_pooled_mutations"

# Convert custom PON results
convert_results "Custom PON" \
    "custom_pon_analysis/filtered_calls/tumor1_custom_pon_high_confidence.vcf.gz" \
    "custom_pon_analysis/converted_tables/tumor1_custom_pon_mutations"

# Convert tumor-only results
convert_results "Tumor-Only" \
    "tumor_only_analysis/filtered_calls/tumor1_only_high_confidence.vcf.gz" \
    "tumor_only_analysis/converted_tables/tumor1_only_mutations"

echo "All results converted to analysis-ready formats!"

Quality Assessment and Validation Guidelines

Expected Variant Counts (WGS, hg38)

Understanding typical mutation counts helps assess the quality of your analysis:

Matched analysis: 500-5,000 high-confidence mutations
Pooled normal: 400-4,000 high-confidence mutations
Custom PON: 800-8,000 high-confidence mutations
Tumor-only: 100-1,000 high-confidence mutations (after stringent filtering)

Quality Control Red Flags

Monitor these indicators that suggest potential issues:

Too few mutations: <100 in any strategy (possible over-filtering)
Too many mutations: >10,000 in matched analysis (possible artifacts)
High contamination: >5% cross-sample contamination
Low TLOD scores: Median TLOD <10 suggests poor quality
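
The contamination red flag can be checked directly from the tables written by CalculateContamination. The sketch below assumes the table has one header line and tab-separated columns named sample, contamination, and error; `check_contamination` is a hypothetical helper and the values are made up.

```shell
#!/usr/bin/env bash
# check_contamination (hypothetical helper): read a contamination table on
# stdin (assumed columns: sample, contamination, error; one header line) and
# flag samples whose contamination fraction exceeds the cutoff.
check_contamination() {
    local cutoff=${1:-0.05}
    awk -F'\t' -v cutoff="$cutoff" 'NR > 1 {
        status = ($2 > cutoff) ? "HIGH" : "ok"
        printf "%s\t%s\t%s\n", $1, $2, status
    }'
}

# Made-up table with one clean and one contaminated sample
printf 'sample\tcontamination\terror\ntumor1\t0.012\t0.001\ntumor2\t0.081\t0.004\n' |
    check_contamination 0.05
```

For a real run, redirect a file such as contamination/tumor1_only_contamination.table into the function.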

Key Quality Metrics

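Ti/Tv ratio: Germline WGS call sets sit near 2.0-2.1; somatic Ti/Tv varies with the tumor’s mutational processes, so compare against values reported for your cancer type rather than a fixed range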
VAF distribution: Should show expected clonal patterns
Chromosome distribution: Should be roughly proportional to chromosome size
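
As a rough illustration of the first metric, Ti/Tv can be computed from the REF and ALT columns of a variant table. The sketch below is a minimal version that assumes two tab-separated columns (REF, ALT) with no header and ignores anything that is not a single-base substitution; `titv_ratio` is a hypothetical helper and the input is made up.

```shell
#!/usr/bin/env bash
# titv_ratio (hypothetical helper): read tab-separated REF and ALT columns on
# stdin and print the transition count, transversion count, and their ratio.
# Rows that are not single-base substitutions are skipped.
titv_ratio() {
    awk -F'\t' '
        length($1) == 1 && length($2) == 1 {
            pair = $1 $2
            # Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
                ti++
            else
                tv++
        }
        END {
            if (tv > 0) printf "Ti=%d Tv=%d Ti/Tv=%.2f\n", ti, tv, ti / tv
            else        printf "Ti=%d Tv=0 Ti/Tv=NA\n", ti
        }'
}

# Made-up substitutions: four transitions and two transversions
printf 'A\tG\nC\tT\nG\tA\nA\tC\nG\tT\nT\tC\n' | titv_ratio
```

With the tables from Step 10, the REF and ALT columns could be extracted with `cut` before piping into the function.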

Validation Strategy by Analysis Type

Matched Analysis:

Validation rate: 5-10% of mutations
Focus: Clinical actionable variants, novel findings
Methods: Sanger sequencing, digital PCR

Pooled Normal Analysis:

Validation rate: 10-15% of mutations
Focus: Low-frequency variants, recurrent mutations
Methods: Sanger sequencing, amplicon sequencing

Custom PON Analysis:

Validation rate: 15-20% of mutations
Focus: All clinical variants, suspicious patterns
Methods: Multiple orthogonal methods

Tumor-Only Analysis:

Validation rate: 25-50% of mutations
Focus: All reported variants if used clinically
Methods: Comprehensive validation panel

Best Practices and Troubleshooting

Strategy Selection Guidelines

Available Samples	Recommended Strategy	Expected Specificity
Matched tumor-normal	Part 2A approach	Highest (>95%)
3-10 normals	Pooled normal	High (>90%)
10+ normals	Custom PON	High (>85%)
No normals	Tumor-only	Moderate (>70%)
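
The table above can be read as a simple decision rule. The sketch below encodes it under two assumptions that the table leaves open: exactly 10 normals is treated as the pooled normal case, and fewer than 3 normals falls back to tumor-only. `choose_strategy` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# choose_strategy (hypothetical helper): map sample availability onto the
# strategy table. Arguments: has_matched_normal (yes/no) and the number of
# unmatched normals available.
# Assumptions: exactly 10 normals -> pooled normal; fewer than 3 -> tumor-only.
choose_strategy() {
    local has_matched=$1 n_normals=$2
    if [ "$has_matched" = "yes" ]; then
        echo "Matched tumor-normal (Part 2A)"
    elif [ "$n_normals" -gt 10 ]; then
        echo "Custom PON"
    elif [ "$n_normals" -ge 3 ]; then
        echo "Pooled normal"
    else
        echo "Tumor-only"
    fi
}

choose_strategy no 5
choose_strategy no 25
choose_strategy no 0
```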

Common Issues and Solutions

Low Mutation Counts:

Check filtering parameters are appropriate for your strategy
Verify tumor purity and sample quality
Consider tumor type-specific mutation rates

High False Positive Rate:

Increase filtering stringency
Validate suspicious patterns with orthogonal methods
Consider creating study-specific PON

Inconsistent Results Across Strategies:

Expected – different strategies have different sensitivity/specificity trade-offs
Focus on mutations consistently called across multiple strategies
Use matched analysis as gold standard when available
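
One way to find mutations called consistently across strategies is to intersect the tables produced in Step 10. The sketch below assumes each TSV has a header line and CHROM, POS, ID, REF, ALT as its first five columns, as in the VariantsToTable call; `variant_keys` and `shared_variants` are hypothetical helpers and the two tables are made up.

```shell
#!/usr/bin/env bash
# variant_keys (hypothetical helper): turn a TSV with a header line and
# CHROM, POS, ID, REF, ALT as the first five columns into unique
# "CHROM:POS:REF:ALT" keys.
variant_keys() {
    tail -n +2 "$1" | awk -F'\t' '{ print $1 ":" $2 ":" $4 ":" $5 }' | sort -u
}

# shared_variants (hypothetical helper): keys present in both tables.
# Each key list is already unique, so a key seen twice is in both files.
shared_variants() {
    { variant_keys "$1"; variant_keys "$2"; } | sort | uniq -d
}

# Made-up tables for two strategies
tmpdir=$(mktemp -d)
printf 'CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tT\nchr2\t200\t.\tG\tC\n' > "$tmpdir/pooled.tsv"
printf 'CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tT\nchr3\t300\t.\tC\tA\n' > "$tmpdir/tumor_only.tsv"

shared_variants "$tmpdir/pooled.tsv" "$tmpdir/tumor_only.tsv"
```

Variants that appear in only one table are not necessarily wrong, but they are reasonable candidates for closer inspection or validation.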

Computational Considerations

Memory requirements: Tumor-only < Custom PON < Pooled normal < Matched
Processing time: Similar across all strategies
Storage needs: Plan for intermediate files and multiple strategy outputs

Conclusion

You’ve now mastered the complete spectrum of somatic mutation calling strategies, from the gold standard matched tumor-normal approach in Part 2A to the practical alternatives when matched normals aren’t available. Each strategy serves specific research scenarios:

Matched tumor-normal remains the gold standard for clinical applications and high-impact research. Pooled normal approaches provide an excellent balance of specificity and practicality for medium-scale studies. Custom Panel of Normals strategies excel when you have sufficient normal samples to create study-specific artifact databases. Tumor-only analysis serves as a last resort for archival samples, requiring careful validation.

Key Takeaways

Strategy selection should be based on available samples and required specificity
Quality control is critical for all approaches, with increased importance for unmatched strategies
Validation rates should increase as you move away from matched analysis
Filtering stringency must be adjusted based on the analysis strategy

Your Somatic Mutation Analysis Journey

With Parts 2A and 2B complete, you now have professional-level competency in somatic mutation detection. You can confidently:

Choose appropriate analysis strategies based on available samples
Execute multiple mutation calling approaches
Apply strategy-specific quality control measures
Generate publication-ready mutation datasets

This tutorial is part of the NGS101.com series on whole genome sequencing analysis.