How to Analyze RNA-seq Data for Absolute Beginners Part 16-2: Fusion Gene Detection with FusionCatcher

Introduction: Advanced Fusion Detection for Cancer Research

Building on our previous exploration of fusion gene detection with STAR-Fusion (Part 16), we now turn to FusionCatcher, a specialized and widely used tool for detecting somatic fusion genes in cancer samples. Both tools detect fusions well, but FusionCatcher offers capabilities that make it particularly valuable for oncology research.

What Makes FusionCatcher Essential for Cancer Research?

FusionCatcher isn’t just another fusion detection tool – it’s a comprehensive system specifically designed for identifying cancer-relevant gene fusions. Here’s what sets it apart:

High Validation Rate: FusionCatcher's authors reported a high RT-PCR validation rate for its predictions, meaning the fusions it identifies are more likely to be real and biologically relevant.

Somatic Focus: Unlike general-purpose fusion detectors, FusionCatcher specifically targets novel and known somatic fusion genes, translocations, and chimeras found in diseased samples.

Challenging Fusion Detection: The tool excels at detecting difficult-to-find fusions including IGH, CIC, DUX4, CRLF2, and TCF3 fusions that other tools might miss.

Understanding the Multi-Aligner Strategy

FusionCatcher’s power comes from its sophisticated multi-aligner approach, which combines three different alignment strategies:

BOWTIE Alignment: Relies on precise annotation matching, finding fusions when junction points align perfectly with known exon borders
BLAT Alignment: Detects fusions where junction points fall within exons or introns, even with incomplete annotations
STAR Alignment: Adds splice-aware alignment, giving a second route to junction points that do not fall on known exon borders

Think of it this way: If you’re trying to solve a complex puzzle, using one approach is like trying to complete it with only the corner pieces. FusionCatcher’s multi-aligner strategy is like having corner pieces, edge pieces, and interior pieces all working together – you get a much clearer picture of the complete fusion landscape.
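To see how the aligners contribute in practice, you can tally the Fusion_finding_method column in a finished run's final-list_candidate_fusion_genes.txt. The sketch below assumes that column holds semicolon-separated method names such as BOWTIE or BOWTIE+BLAT (check your run's caption file to confirm); count_by_method is a hypothetical helper, not a FusionCatcher command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how many candidate fusions each alignment method supports.
# Assumes a tab-separated final list whose header has a "Fusion_finding_method" column
# holding semicolon-separated method names (e.g. "BOWTIE;BOWTIE+BLAT").
count_by_method() {
    local final_list="$1"
    awk -F'\t' '
        NR == 1 {
            # Locate the method column by its header name
            for (i = 1; i <= NF; i++) if ($i == "Fusion_finding_method") fm = i
            if (!fm) { print "Fusion_finding_method column not found" > "/dev/stderr"; exit 1 }
            next
        }
        {
            # One candidate can be supported by several methods
            m = split($fm, methods, ";")
            for (j = 1; j <= m; j++) count[methods[j]]++
        }
        END { for (k in count) print k "\t" count[k] }
    ' "$final_list" | LC_ALL=C sort
}

# Example:
# count_by_method ~/FusionCatcher_Analysis/results/final-list_candidate_fusion_genes.txt
```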

Biological Intelligence Built-In

What truly distinguishes FusionCatcher is its integration of extensive biological knowledge:

False Positive Filtering: Uses databases of known fusions found in healthy samples to eliminate likely artifacts
Pseudogene Recognition: Automatically filters out gene-pseudogene fusions that often represent technical artifacts
Read-through Detection: Can optionally filter out adjacent gene read-throughs that aren’t true fusions
Oncogene Prioritization: Leverages known oncogene databases to highlight clinically relevant findings
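After a run, the labels behind these filters stay visible in the Fusion_description column of final-list_candidate_fusion_genes.txt, so you can apply a stricter triage of your own. This is a sketch: drop_labeled_fusions is a hypothetical helper, and its default label list is an assumption to check against the caption file of your own run. Adjacent-gene events are occasionally real, so treat this as triage rather than a verdict.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop candidates whose Fusion_description contains any of the
# given comma-separated labels. The default label list is an assumption, not a
# FusionCatcher recommendation.
drop_labeled_fusions() {
    local final_list="$1" labels="${2:-readthrough,adjacent,pseudogene,healthy,banned}"
    awk -F'\t' -v labels="$labels" '
        BEGIN { n = split(labels, bad, ",") }
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "Fusion_description") fd = i
            if (!fd) { print "Fusion_description column not found" > "/dev/stderr"; exit 1 }
            print; next                      # keep the header row
        }
        {
            # Labels are comma-separated; skip the row on the first unwanted label
            m = split($fd, tags, ",")
            for (j = 1; j <= m; j++)
                for (k = 1; k <= n; k++)
                    if (tags[j] == bad[k]) next
            print
        }' "$final_list"
}

# Example:
# drop_labeled_fusions results/final-list_candidate_fusion_genes.txt > triaged_fusions.txt
```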

Setting Up Your FusionCatcher Environment

Installing FusionCatcher

Let’s create a dedicated environment for FusionCatcher analysis:

#-----------------------------------------------
# STEP 1: Create FusionCatcher environment
#-----------------------------------------------

# Create a new conda environment for FusionCatcher (the package is distributed via Bioconda)
conda create -p ~/Env_FusionCatcher -c bioconda -c conda-forge -y fusioncatcher

# Activate the environment
conda activate ~/Env_FusionCatcher

# Verify installation
fusioncatcher --help

Installation Note: If the conda installation fails with compatibility or missing-package errors, use one of the alternative installation methods described by the authors in their GitHub repository.

Building the Reference Database

FusionCatcher requires species-specific reference databases. Here’s how to set them up:

#-----------------------------------------------
# STEP 2: Build FusionCatcher reference databases
#-----------------------------------------------

# Option A: build databases from the latest annotations.
# Run only the command for the species you need; each build can take many hours.

# For human samples (most common in cancer research)
echo "Building human reference database..."
fusioncatcher-build -g homo_sapiens -o ~/fusioncatcher_Index_human

# For mouse samples (common in research models)
echo "Building mouse reference database..."
fusioncatcher-build -g mus_musculus -o ~/fusioncatcher_Index_mouse

# For rat samples (less common but available)
echo "Building rat reference database..."
fusioncatcher-build -g rattus_norvegicus -o ~/fusioncatcher_Index_rat

# Option B (faster): download the pre-built human database instead
echo "Downloading pre-built human database..."
mkdir -p ~/fusioncatcher_Index_human_prebuilt
cd ~/fusioncatcher_Index_human_prebuilt

# Download the multi-part compressed database
wget http://sourceforge.net/projects/fusioncatcher/files/data/human_v102.tar.gz.aa
wget http://sourceforge.net/projects/fusioncatcher/files/data/human_v102.tar.gz.ab
wget http://sourceforge.net/projects/fusioncatcher/files/data/human_v102.tar.gz.ac
wget http://sourceforge.net/projects/fusioncatcher/files/data/human_v102.tar.gz.ad

# Extract the complete database
cat human_v102.tar.gz.* | tar xz

# Create a symbolic link for easy access
ln -s human_v102 current

echo "Database setup complete!"

Database Choice: Building from source gives you the latest annotations, while downloading pre-built databases is much faster. For most applications, the pre-built option is sufficient.
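Because the pre-built database arrives in several parts, a missing or truncated part only shows up as a failed extraction. A minimal sketch for checking the parts before extracting, assuming the .aa/.ab/... naming shown above; verify_split_archive is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that the parts of a split .tar.gz download exist and
# together form a readable archive, without extracting anything.
verify_split_archive() {
    local prefix="$1"                      # e.g. human_v102.tar.gz
    local -a parts=( "${prefix}".?? )      # matches <prefix>.aa, .ab, ... in sorted order
    # If nothing matched, bash leaves the literal pattern in place
    if [ ! -e "${parts[0]}" ]; then
        echo "No parts found for ${prefix}" >&2
        return 1
    fi
    echo "Found ${#parts[@]} part(s): ${parts[*]}"
    # List (not extract) the concatenated stream; a truncated part makes gzip/tar fail
    if cat "${parts[@]}" | tar tzf - > /dev/null; then
        echo "Archive OK"
    else
        echo "Archive is incomplete or corrupted" >&2
        return 1
    fi
}

# Example, run before "cat human_v102.tar.gz.* | tar xz":
# cd ~/fusioncatcher_Index_human_prebuilt && verify_split_archive human_v102.tar.gz
```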

Preparing Your Dataset

Using Consistent Test Data

For direct comparison with our STAR-Fusion results from Part 16, we’ll analyze the same rhabdomyosarcoma dataset known to contain the PAX3-FOXO1 fusion:

#-----------------------------------------------
# STEP 3: Prepare analysis dataset
#-----------------------------------------------

# Create organized project directory
mkdir -p ~/FusionCatcher_Analysis/{raw_data,results}
cd ~/FusionCatcher_Analysis

# Navigate to data directory
cd raw_data

# Download the same fusion-positive dataset used in Part 16
echo "Downloading rhabdomyosarcoma dataset with known PAX3-FOXO1 fusion..."
fasterq-dump SRR30961741

# Give the mates explicit _R1/_R2 suffixes (the naming convention used in this tutorial)
echo "Renaming files to the R1/R2 convention..."
mv SRR30961741_1.fastq SRR30961741_R1.fastq
mv SRR30961741_2.fastq SRR30961741_R2.fastq

# Compress files to save storage space
echo "Compressing FASTQ files..."
gzip SRR30961741_R1.fastq SRR30961741_R2.fastq

echo "Dataset preparation complete!"

File Organization: When -i points to a directory, FusionCatcher treats all FASTQ files in that directory as belonging to one sample. Keep each sample's files in their own directory so that reads from different samples are never pooled.
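When you process several samples, the one-directory-per-sample rule is easy to automate. A minimal sketch, assuming the mates have already been compressed under fasterq-dump's default _1/_2 names (for example SRR30961741_1.fastq.gz); organize_samples is a hypothetical helper, and the _R1/_R2 renaming simply follows this tutorial's convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move each <ACC>_1.fastq.gz / <ACC>_2.fastq.gz pair into its own
# directory <dest_root>/<ACC>/ as <ACC>_R1.fastq.gz and <ACC>_R2.fastq.gz.
# Prints each accession it organized; samples missing mate 2 are skipped.
organize_samples() {
    local src_dir="$1" dest_root="$2"
    local r1 acc
    for r1 in "$src_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                  # glob matched nothing
        acc="$(basename "$r1" _1.fastq.gz)"       # strip the suffix to get the accession
        if [ ! -e "$src_dir/${acc}_2.fastq.gz" ]; then
            echo "Missing mate 2 for ${acc}, skipping" >&2
            continue
        fi
        mkdir -p "$dest_root/$acc"
        mv "$r1" "$dest_root/$acc/${acc}_R1.fastq.gz"
        mv "$src_dir/${acc}_2.fastq.gz" "$dest_root/$acc/${acc}_R2.fastq.gz"
        echo "$acc"
    done
}

# Example: then run FusionCatcher once per directory, e.g. -i ~/FusionCatcher_Analysis/samples/<ACC>
# organize_samples ~/FusionCatcher_Analysis/raw_data ~/FusionCatcher_Analysis/samples
```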

Running FusionCatcher Analysis

Executing the Fusion Detection Pipeline

Now let’s run FusionCatcher on our prepared dataset:

#-----------------------------------------------
# STEP 4: Execute FusionCatcher analysis
#-----------------------------------------------

# Navigate to project root
cd ~/FusionCatcher_Analysis

# Run FusionCatcher with its default parameters
echo "Starting FusionCatcher analysis..."
echo "This typically takes 2-4 hours depending on your system..."

# -d: reference database (the pre-built database from Step 2;
#     use ~/fusioncatcher_Index_human instead if you built it with fusioncatcher-build)
# -i: input directory containing this sample's FASTQ files
# -o: output directory for results
fusioncatcher \
    -d ~/fusioncatcher_Index_human_prebuilt/current/ \
    -i ~/FusionCatcher_Analysis/raw_data \
    -o ~/FusionCatcher_Analysis/results

# Alternative command if you have matched normal RNA-seq samples
# (-I: directory containing the matched normal sample's FASTQ files):
# fusioncatcher \
#     -d ~/fusioncatcher_Index_human_prebuilt/current/ \
#     -i ~/FusionCatcher_Analysis/raw_data \
#     -I /path/to/matched_normal_sample/ \
#     -o ~/FusionCatcher_Analysis/results

echo "FusionCatcher analysis finished."
echo "Check fusioncatcher.log in the results directory for details."

Normal Sample Filtering: By default, FusionCatcher removes likely false positives using a curated background list of fusion genes previously identified in healthy samples. If you have matched normal RNA-seq data from the same patient, you can supply it with the -I (--normal) option. This adds a patient-specific background filter that can further improve specificity by removing fusions that are also present in the patient's normal tissue, such as germline events.
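If some samples have a matched normal and others do not, building the command as a bash array avoids keeping two copies of it. A sketch under that assumption: run_fusioncatcher and its DRY_RUN switch are hypothetical conventions of this example, while -d, -i, -o, and -I are the FusionCatcher options used above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: build the FusionCatcher command as an array and add the
# matched-normal option (-I) only when a normal directory is given.
# With DRY_RUN=1 it prints the command instead of running it.
run_fusioncatcher() {
    local db_dir="$1" tumor_dir="$2" out_dir="$3" normal_dir="${4:-}"
    local -a cmd=( fusioncatcher -d "$db_dir" -i "$tumor_dir" -o "$out_dir" )
    if [ -n "$normal_dir" ]; then
        cmd+=( -I "$normal_dir" )
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Example (tumor only, then with a matched normal):
# run_fusioncatcher ~/fusioncatcher_Index_human_prebuilt/current ~/FusionCatcher_Analysis/raw_data ~/FusionCatcher_Analysis/results
# run_fusioncatcher ~/fusioncatcher_Index_human_prebuilt/current ~/FusionCatcher_Analysis/raw_data ~/FusionCatcher_Analysis/results /path/to/matched_normal_sample
```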

Important Note: Do not pre-trim your FASTQ files before running FusionCatcher. The tool performs its own quality filtering and trimming, tuned specifically for fusion detection. Pre-trimming can reduce sensitivity because it shortens the reads and the effective RNA fragment sizes, both of which matter for accurate fusion gene detection.
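One quick way to check that a FASTQ file has not already been trimmed is to look at the spread of read lengths: untrimmed Illumina reads usually share a single length, while trimmed reads vary. A minimal sketch, assuming standard 4-line FASTQ records (gzipped or plain); read_length_range is a hypothetical helper that scans only the first reads of the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the minimum and maximum read length among the first
# n_reads records (default 100000). A minimum well below the maximum suggests the
# file was already adapter- or quality-trimmed.
read_length_range() {
    local fastq="$1" n_reads="${2:-100000}"
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    gzip -cdf "$fastq" | head -n $(( n_reads * 4 )) | awk '
        NR % 4 == 2 {                      # line 2 of each record is the sequence
            len = length($0)
            if (min == "" || len < min) min = len
            if (len > max) max = len
        }
        END { print min, max }'
}

# Example:
# read_length_range ~/FusionCatcher_Analysis/raw_data/SRR30961741_R1.fastq.gz
```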

Parameter Tip: FusionCatcher's detection accuracy can drop sharply when non-default parameters are used. The default settings are tuned for the best balance of sensitivity and specificity, so change them only for a specific reason.

Understanding the Analysis Process

While FusionCatcher runs, here’s what’s happening behind the scenes:

Automatic Quality Assessment: Identifies and handles adapter sequences
Smart Trimming: Preserves RNA fragment length for optimal fusion detection
Multi-Step Alignment: Uses BOWTIE, BLAT, and STAR in sequence
Junction Validation: Verifies fusion breakpoints with multiple evidence types
Biological Filtering: Applies extensive databases to remove false positives
Protein Analysis: Predicts functional consequences of detected fusions

Interpreting FusionCatcher Results

Key Output Files Explained

When analysis completes, you’ll find these essential files in your results directory:

#-----------------------------------------------
# Primary Results Files
#-----------------------------------------------

# Main fusion gene list (hg38 coordinates)
final-list_candidate_fusion_genes.txt

# Same results with hg19 coordinates for compatibility
final-list_candidate_fusion_genes.hg19.txt

# Executive summary of findings
summary_candidate_fusions.txt

# Detailed interpretation guide
final-list_candidate_fusion_genes.caption.md.txt

#-----------------------------------------------
# Supporting Evidence Files
#-----------------------------------------------

# Supporting reads from BOWTIE alignment
supporting-reads_gene-fusions_BOWTIE.zip

# Supporting reads from BLAT alignment
supporting-reads_gene-fusions_BLAT.zip

# Supporting reads from STAR alignment
supporting-reads_gene-fusions_STAR.zip

#-----------------------------------------------
# Quality Control and Metadata
#-----------------------------------------------

# Pathogen screening results
viruses_bacteria_phages.txt

# Analysis metadata and statistics
info.txt

# Complete analysis log
fusioncatcher.log
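Before interpreting anything, it helps to confirm that the run actually finished and wrote its main outputs. A minimal sketch; check_fusioncatcher_outputs is a hypothetical helper, and its file list simply mirrors the files named above, so adjust it if your FusionCatcher version differs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which key FusionCatcher output files are present and
# non-empty. Returns non-zero if any are missing.
check_fusioncatcher_outputs() {
    local results_dir="$1" missing=0 f
    for f in final-list_candidate_fusion_genes.txt \
             summary_candidate_fusions.txt \
             info.txt \
             fusioncatcher.log; do
        if [ -s "$results_dir/$f" ]; then
            echo "OK       $f"
        else
            echo "MISSING  $f" >&2
            missing=$(( missing + 1 ))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example:
# check_fusioncatcher_outputs ~/FusionCatcher_Analysis/results
```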

Understanding the Results Format

The main results file contains comprehensive information for each detected fusion:

Column	Description	Interpretation
Gene_1_symbol(5end_fusion_partner)	5′ partner gene	Usually supplies the promoter and regulatory elements
Gene_2_symbol(3end_fusion_partner)	3′ partner gene	Usually contributes C-terminal functional domains
Fusion_description	Comma-separated labels (e.g., known, oncogene, adjacent, readthrough)	Flags known cancer fusions and likely artifacts
Counts_of_common_mapping_reads	Reads mapping to both partner genes	Measures sequence similarity between the partners; expected to be 0
Spanning_pairs	Read pairs with one mate mapping to each partner gene	Paired-end evidence for the fusion transcript
Spanning_unique_reads	Unique reads crossing the fusion junction (PCR duplicates removed)	Direct junction-level evidence
Longest_anchor_found	Longest overhang of a junction-crossing read	Longer anchors indicate higher confidence
Fusion_sequence	Sequence around the fusion junction	Shows the precise breakpoint (marked with *)
Predicted_effect	Predicted functional consequence (e.g., in-frame, out-of-frame)	In-frame fusions are more likely to yield a functional chimeric protein
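Because the final list is tab-separated with a header row, a short awk filter is enough to pull out the columns above. The sketch assumes the column names shown in the table and locates the gene-symbol columns by prefix; filter_fusions and its default threshold of 3 spanning unique reads are illustrative choices, not FusionCatcher recommendations. It joins partners with a double hyphen (PAX3--FOXO1), the separator recommended by HGNC (Bruford et al., 2021).

```bash
#!/usr/bin/env bash
# Hypothetical helper: list fusions with at least min_unique spanning unique reads.
# Columns are found by header name (gene-symbol columns by prefix), so their order
# does not matter. The default threshold of 3 is illustrative only.
filter_fusions() {
    local final_list="$1" min_unique="${2:-3}"
    awk -F'\t' -v OFS='\t' -v min="$min_unique" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if (index($i, "Gene_1_symbol") == 1)  g1 = i
                if (index($i, "Gene_2_symbol") == 1)  g2 = i
                if ($i == "Fusion_description")       fd = i
                if ($i == "Spanning_pairs")           sp = i
                if ($i == "Spanning_unique_reads")    su = i
            }
            if (!(g1 && g2 && fd && sp && su)) {
                print "Expected columns not found" > "/dev/stderr"
                exit 1
            }
            print "Fusion", "Spanning_pairs", "Spanning_unique_reads", "Fusion_description"
            next
        }
        # Keep rows meeting the threshold; name fusions as 5prime--3prime
        $su + 0 >= min { print $g1 "--" $g2, $sp, $su, $fd }
    ' "$final_list"
}

# Example: fusions supported by at least 5 spanning unique reads
# filter_fusions ~/FusionCatcher_Analysis/results/final-list_candidate_fusion_genes.txt 5
```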

Analyzing Expected Results

For our rhabdomyosarcoma dataset, you should observe:

PAX3–FOXO1: The hallmark fusion of alveolar rhabdomyosarcoma

High supporting read count
In-frame fusion joining the PAX3 DNA-binding domains to the FOXO1 transactivation domain
Established diagnostic and prognostic relevance

MARS–AVIL: Secondary fusion event

Lower read support than PAX3–FOXO1
May represent a passenger event rather than a driver

Additional Candidates: Other potential fusions that require independent validation
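To confirm programmatically that the expected driver was called, you can test for a specific 5′→3′ gene pair. A minimal sketch; has_fusion is a hypothetical helper that matches the Gene_1/Gene_2 symbol columns by header prefix. It checks orientation, so a reciprocal FOXO1–PAX3 call would not count as a match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the given 5'->3' gene pair appears in the final list.
has_fusion() {
    local final_list="$1" five_prime="$2" three_prime="$3"
    awk -F'\t' -v a="$five_prime" -v b="$three_prime" '
        NR == 1 {
            # Gene-symbol columns are matched by prefix, e.g. "Gene_1_symbol(5end_fusion_partner)"
            for (i = 1; i <= NF; i++) {
                if (index($i, "Gene_1_symbol") == 1) g1 = i
                if (index($i, "Gene_2_symbol") == 1) g2 = i
            }
            next
        }
        g1 && g2 && $g1 == a && $g2 == b { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$final_list"
}

# Example:
# has_fusion ~/FusionCatcher_Analysis/results/final-list_candidate_fusion_genes.txt PAX3 FOXO1 \
#     && echo "PAX3--FOXO1 detected"
```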

[Figure: example excerpts of final-list_candidate_fusion_genes.txt and summary_candidate_fusions.txt]

Comparing STAR-Fusion vs FusionCatcher

When to Use Each Tool

Understanding the strengths of each approach helps you choose the right tool for your research:

Aspect	STAR-Fusion	FusionCatcher
Typical Runtime	30-60 minutes	2-4 hours
Sensitivity	High for well-annotated fusions	Also designed for challenging fusions (e.g., IGH, DUX4)
False Positive Filtering	Moderate	Extensive, knowledge-based
Configuration	Runs well with defaults; optional tuning	Fully automated; defaults strongly recommended
Protein Analysis	Optional coding-effect annotation (--examine_coding_effect)	Built-in predictions
Best Applications	General RNA-seq studies	Cancer research focus
Memory Requirements	Moderate	Higher
Validation Evidence	Among the most accurate in an independent benchmark (Haas et al., 2019)	High RT-PCR validation rate reported by its authors (Nicorici et al., 2014)

Complementary Analysis Strategy

Rather than choosing one tool over the other, consider this integrated approach:

Primary Screening: Use FusionCatcher for comprehensive somatic fusion detection
Orthogonal Check: Use STAR-Fusion as a fast, independent second caller for key findings
Cross-Validation: Compare results between tools for high-confidence calls
Clinical Focus: Prioritize FusionCatcher results for cancer-relevant fusions

Advanced Analysis and Troubleshooting

Data Quality Requirements

Optimal Input Specifications:

Read Type: Paired-end RNA-seq preferred for maximum sensitivity
Sequencing Depth: Minimum 20-30 million read pairs for adequate coverage
Read Length: ≥75bp recommended for reliable junction spanning
Fragment Size: Ideally >300bp to maintain detection sensitivity
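Depth and read length are easy to check before committing hours of compute. A minimal sketch, assuming standard 4-line FASTQ records; fastq_stats is a hypothetical helper. For paired-end data, the read count of the R1 file equals the number of read pairs, which is the figure to compare against the 20-30 million guideline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count reads and report the mean read length of one FASTQ file.
fastq_stats() {
    local fastq="$1"
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    gzip -cdf "$fastq" | awk '
        NR % 4 == 2 { n++; total += length($0) }   # line 2 of each record is the sequence
        END {
            if (n == 0) { print "reads=0"; exit 1 }
            printf "reads=%d mean_length=%.1f\n", n, total / n
        }'
}

# Example (scans the whole file):
# fastq_stats ~/FusionCatcher_Analysis/raw_data/SRR30961741_R1.fastq.gz
```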

Quality Considerations:
FusionCatcher performs integrated quality control and trimming, which is crucial because it preserves RNA fragment length – a critical factor for fusion gene detection.

Validation Strategies

RT-PCR Confirmation Design:

Use fusion junction sequences from FusionCatcher output
Design primers spanning the breakpoint
Include positive and negative controls
Validate in multiple samples when possible
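The Fusion_sequence column marks the breakpoint with *, which makes it easy to cut out the sequence on either side of the junction as input for a primer design tool. A minimal sketch; junction_flanks is a hypothetical helper, and its 100-base default window is an assumption, not a primer-design rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a Fusion_sequence at its first "*" breakpoint marker and
# print up to n bases from each side of the junction (default n=100).
junction_flanks() {
    local fusion_seq="$1" n="${2:-100}"
    local five three
    five=${fusion_seq%%\**}      # everything before the first "*"
    three=${fusion_seq#*\*}      # everything after the first "*"
    if [ "$five" = "$fusion_seq" ]; then
        echo "No '*' breakpoint marker found" >&2
        return 1
    fi
    # Keep the last n bases of the 5' side and the first n bases of the 3' side
    if [ "${#five}" -gt "$n" ]; then five="${five: -$n}"; fi
    three="${three:0:$n}"
    printf '5prime\t%s\n3prime\t%s\n' "$five" "$three"
}

# Example: paste a Fusion_sequence value from the final list
# junction_flanks "<Fusion_sequence value>" 100
```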

Functional Analysis Approaches:

Examine protein domain retention in fusion products
Assess predicted functional consequences
Correlate with gene expression changes
Consider therapeutic targeting implications

Best Practices for Clinical Applications

Quality Assurance Workflow

Input Validation: Ensure high-quality RNA-seq data with adequate depth
Reference Standards: Include known positive and negative controls when possible
Cross-Platform Validation: Compare results across multiple detection methods
Clinical Correlation: Connect findings to patient phenotype and treatment response

Conclusion: Mastering Comprehensive Fusion Detection

The Power of Multi-Tool Expertise

By combining STAR-Fusion and FusionCatcher in your analytical toolkit, you’ve developed a comprehensive approach to fusion detection that addresses diverse research needs:

STAR-Fusion: Rapid screening and general-purpose fusion detection
FusionCatcher: Deep, cancer-focused analysis with a high reported validation rate
Integrated Approach: Cross-validation and confidence assessment

Future Directions

Consider expanding your fusion analysis capabilities with:

Functional Validation: Learning experimental techniques for fusion confirmation
Protein Structure Analysis: Understanding fusion protein domains and interactions
Clinical Database Integration: Connecting findings to treatment response databases
Therapeutic Targeting: Identifying druggable fusion proteins and pathways

Final Recommendations

For Cancer Research: FusionCatcher's high reported RT-PCR validation rate and somatic focus make it especially valuable for oncology applications.

For General RNA-seq: STAR-Fusion provides efficient screening, while FusionCatcher offers a deeper, independent second pass.

For Clinical Applications: Agreement between both tools increases confidence in a call, but fusions that inform patient care still require orthogonal confirmation (e.g., RT-PCR) in a validated clinical setting.

Remember that fusion gene detection is both an art and a science. The biological knowledge integrated into FusionCatcher, combined with your growing expertise in result interpretation, positions you to make meaningful contributions to cancer research and precision medicine.

References

Nicorici, D., et al. “FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data.” bioRxiv (2014). DOI:10.1101/011650
Haas, B.J., et al. “Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods.” Genome Biology 20, 213 (2019).
Kumar, S., et al. “Passenger mutations in more than 2,500 cancer genomes: overall molecular functional impact and consequences.” Cell 180, 915–927 (2020).
Bruford, E.A., et al. “HUGO Gene Nomenclature Committee (HGNC) recommendations for the designation of gene fusions.” Leukemia 35, 3040–3043 (2021).

Essential Resources

FusionCatcher GitHub Repository: Complete documentation and updates
FusionCatcher Manual: Detailed usage instructions