How to Analyze RNAseq Data for Absolute Beginners Part 8: Alternative Splicing Analysis

Introduction

Alternative splicing (AS) is one of the most fascinating mechanisms in molecular biology: it allows a single gene to produce multiple mRNA isoforms and, in turn, multiple protein variants. This process dramatically expands the complexity of the proteome, enabling cells to fine-tune their protein repertoire in response to changing conditions and developmental stages. With RNA sequencing (RNA-seq), we can now observe these splicing patterns in unprecedented detail.

In this tutorial, we’ll detect alternative splicing events in RNA-seq data and compare them between two groups of samples. Whether you’re studying cancer biology, neuroscience, or basic cell biology, understanding alternative splicing patterns can provide crucial insights into your research questions.

Understanding the Biology of Alternative Splicing

Before we start the technical analysis, let’s understand what we’re actually measuring. Alternative splicing takes several distinct forms:

  • Exon Skipping: The most common form in mammals, where entire exons can be included or excluded
  • Intron Retention: Where introns that are normally removed are kept in the mature mRNA
  • Alternative Splice Sites: Where different 5′ or 3′ splice sites create variations in exon length
  • Mutually Exclusive Exons: Where only one of two possible exons is included

These patterns let a limited number of genes encode a far larger and more diverse set of proteins, which helps explain how organisms build complex proteomes.

Applications in Medical Research

The impact of alternative splicing analysis extends far beyond basic research. Here’s how different fields are leveraging these insights:

Cancer Research

Alternative splicing has emerged as a critical player in cancer biology. Researchers have discovered that:

  • Cancer cells often exhibit unique splicing signatures
  • Specific splice variants can drive tumor progression
  • Splicing patterns can serve as diagnostic or prognostic markers
  • Some splicing events represent promising therapeutic targets

Neurological Disorders

The brain shows particularly complex splicing patterns:

  • Neural development relies heavily on precisely controlled splicing
  • Many neurological diseases involve splicing defects
  • Brain-specific splice variants often have unique functions
  • Therapeutic strategies increasingly target splicing mechanisms

Cardiovascular Research

Heart tissue shows distinctive splicing patterns:

  • Cardiac development requires specific splicing programs
  • Heart disease often involves altered splicing
  • Therapeutic approaches may target splice variants
  • Splicing biomarkers aid in disease monitoring

Tools for Alternative Splicing Analysis

While many tools exist for splicing analysis, we’ll focus on SplAdder, a robust and well-documented command-line tool that excels at:

  • Building comprehensive splicing graphs
  • Detecting novel splicing events
  • Performing statistical comparisons
  • Generating publication-ready visualizations

SplAdder takes a gene annotation (GTF) and RNA-seq read alignments (BAM) as input. It transforms the annotation into a splicing graph, augments that graph with new splice junctions and exons supported by the read data, extracts alternative splicing events from the graph, and quantifies the events from the alignments. The quantified events can then be used for differential analysis and visualization.

Installation and Setup

First, let’s set up our analysis environment on a Linux system with ample memory and CPU cores, as discussed in our first tutorial:

# Create a clean conda environment - this isolates our work
conda create -n spladder_env python=3.9

# Activate the new environment
conda activate spladder_env

# Install SplAdder with all dependencies
pip install spladder

Analysis Workflow

Step 1: Building Splicing Graphs

The first step constructs splicing graphs from your RNA-seq data. Specify the paths to your BAM and GTF files if they are not in the current working directory. Here we use the read alignments (BAM files) from our previous tutorial as input, together with the GENCODE mouse annotation (release M25).

Note: BAM files are separated by commas with no spaces between them.
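With many samples, typing this list by hand is error-prone. The helper below is a minimal sketch (the function name join_by_comma is our own, not part of SplAdder) that joins any number of file names, for example the expansion of a shell glob, into the comma-separated form that the -b option expects.

```shell
#!/usr/bin/env bash
# Join all arguments into one comma-separated string with no spaces,
# the format used for the -b option of "spladder build".
# join_by_comma is a hypothetical helper name, not a SplAdder command.
join_by_comma() {
    local IFS=','        # "$*" joins the arguments with the first character of IFS
    printf '%s\n' "$*"
}

# Example (run in the directory that holds the STAR alignments):
#   BAMS=$(join_by_comma *_trimmedAligned.sortedByCoord.out.bam)
#   spladder build -o spladder_output -b "$BAMS" -a gencode.vM25.annotation.gtf --set-mm-tag nM
```

This assumes the file names themselves contain no commas. A shell glob expands in alphabetical order, so the sample order follows the directory listing.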

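# Construct splicing graphs from the STAR-aligned BAM files.
# STAR stores the number of mismatches in the nM tag rather than NM
# (the tag SplAdder reads by default), hence --set-mm-tag nM.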
spladder build -o spladder_output \
               -b SRR28119110_trimmedAligned.sortedByCoord.out.bam,SRR28119111_trimmedAligned.sortedByCoord.out.bam,SRR28119112_trimmedAligned.sortedByCoord.out.bam,SRR28119113_trimmedAligned.sortedByCoord.out.bam \
               -a gencode.vM25.annotation.gtf \
               --set-mm-tag nM

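This command performs several crucial steps:

  1. Builds an initial splicing graph for each gene from the annotation
  2. Reads your aligned RNA-seq data
  3. Augments the graphs with new splice junctions and exons supported by the reads
  4. Extracts alternative splicing events from the graphs and quantifies them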
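SplAdder works on coordinate-sorted BAM files that have an index next to them. STAR’s sortedByCoord output is already sorted, but the index has to be created with samtools index. The sketch below is a hypothetical helper, check_bam_index, that lists every BAM lacking an index; it assumes samtools’ default index name, <file>.bam.bai, and does not look for .csi indexes.

```shell
#!/usr/bin/env bash
# Report BAM files that have no "<file>.bam.bai" index next to them.
# Returns 0 if every BAM is indexed, 1 otherwise.
# check_bam_index is a hypothetical helper name, not a SplAdder command.
check_bam_index() {
    local missing=0 bam
    for bam in "$@"; do
        if [ ! -f "${bam}.bai" ]; then
            echo "missing index: ${bam}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: index any BAM without an index, then re-check before "spladder build".
#   for bam in *_trimmedAligned.sortedByCoord.out.bam; do
#       [ -f "${bam}.bai" ] || samtools index "$bam"
#   done
#   check_bam_index *_trimmedAligned.sortedByCoord.out.bam && echo "all BAMs indexed"
```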

SplAdder writes the splicing graphs and the quantified events to the output directory “spladder_output”.

Step 2: Differential Splicing Analysis

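Once we have our splicing graphs, we can compare conditions. Here, the first two samples (SRR28119110 and SRR28119111) form the KRAS_SPIB group and the last two (SRR28119112 and SRR28119113) form the KRAS group: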

# Perform differential analysis
spladder test --conditionA SRR28119110_trimmedAligned.sortedByCoord.out.bam,SRR28119111_trimmedAligned.sortedByCoord.out.bam \
              --conditionB SRR28119112_trimmedAligned.sortedByCoord.out.bam,SRR28119113_trimmedAligned.sortedByCoord.out.bam \
              --outdir spladder_output \
              --labelA KRAS_SPIB \
              --labelB KRAS \
              --plot-format pdf \
              --parallel 8

This analysis:

  • Compares the usage of each splicing event between the two conditions
  • Assigns each event a p-value
  • Writes the results as tables, one set per event type
  • Produces plots in the chosen format (PDF here)
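Instead of pasting the file lists for --conditionA and --conditionB, you can keep the group assignments in a small sample sheet. The sketch below assumes a hypothetical tab-separated file, samples.tsv, with the BAM path in the first column and the group label in the second; bams_for_condition is likewise our own helper name, not a SplAdder command.

```shell
#!/usr/bin/env bash
# Print a comma-separated list of the BAM files assigned to one condition.
# Input: a tab-separated sample sheet with lines "<bam_file><TAB><condition>".
# The sample sheet format is an assumption of this sketch, not a SplAdder input.
# Lines starting with '#' and lines with fewer than two columns are skipped.
bams_for_condition() {
    local sheet=$1 condition=$2
    awk -F'\t' -v cond="$condition" '
        /^#/ || NF < 2 { next }
        $2 == cond { out = out (out == "" ? "" : ",") $1 }
        END { print out }
    ' "$sheet"
}

# Example:
#   spladder test --conditionA "$(bams_for_condition samples.tsv KRAS_SPIB)" \
#                 --conditionB "$(bams_for_condition samples.tsv KRAS)" \
#                 --outdir spladder_output --labelA KRAS_SPIB --labelB KRAS
```

Keeping the grouping in one file makes it harder to swap samples between conditions by accident when the command is edited later.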

The results of this step are stored in the “testing_KRAS_SPIB_vs_KRAS” folder in the “spladder_output” directory.

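Each TSV file holds the results for one event type: exon_skip, intron_retention, alt_3prime, alt_5prime, mutex_exons, or mult_exon_skip. “C3” in the file names refers to the confidence level used when building the graphs (3, SplAdder’s default).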

test_results_C3_<event_type>.tsv: These files contain the basic statistical testing results for the given event type. They list every event tested, including duplicates (if an event is associated with multiple genes).

test_results_C3_<event_type>.gene_unique.tsv: These files contain filtered statistical results, ensuring that each splicing event is linked to a unique gene. They remove duplicates to provide a clearer interpretation of splicing events associated with each gene.

test_results_extended_C3_<event_type>.tsv: Extended versions of the statistical results files, often including additional information or less stringent filtering criteria. These files may have more splicing events listed compared to the non-extended versions.

test_setup_C3_<event_type>.pickle: Pickle files that store the experimental setup and parameters used for testing the given alternative splicing event. These files can be used to reload the setup and rerun or extend the analysis.
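To pull the significant events out of one of these tables, you can filter on its adjusted p-value column. Column names can differ between SplAdder versions, so this sketch (filter_by_column, a hypothetical helper) looks the column up by name in the header instead of hard-coding its position. The column name p_val_adj and the 0.05 cutoff in the example are assumptions; check the header of your own file with head -n 1 first.

```shell
#!/usr/bin/env bash
# Print the header plus every row whose value in the named column is below the cutoff.
# Rows whose value is empty, "nan" or "NA" (any case) are dropped.
# Returns 1, with a message on stderr, if the column is not in the header.
# filter_by_column is a hypothetical helper name, not a SplAdder command.
filter_by_column() {
    local tsv=$1 column=$2 cutoff=$3
    awk -F'\t' -v col="$column" -v cut="$cutoff" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        {
            v = tolower($idx)
            if (v != "" && v != "nan" && v != "na" && $idx + 0 < cut + 0) print
        }
    ' "$tsv"
}

# Example (column name p_val_adj is assumed; verify it with head -n 1):
#   filter_by_column spladder_output/testing_KRAS_SPIB_vs_KRAS/test_results_C3_exon_skip.gene_unique.tsv p_val_adj 0.05
```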

Step 3: Visualization and Interpretation

Visualizing differential splicing events helps place them in their biological context. If your dataset has no groups to compare, you can also visualize the splicing events in individual samples.

# Plot read coverage, the splicing graph and all events of any type for Myod1.
# Each coverage group is given as <label>:<bam1>,<bam2> (commas, no spaces).
spladder viz --range gene ENSMUSG00000009471.4 \
             --track coverage,segments \
             KRAS_SPIB:SRR28119110_trimmedAligned.sortedByCoord.out.bam,SRR28119111_trimmedAligned.sortedByCoord.out.bam \
             KRAS:SRR28119112_trimmedAligned.sortedByCoord.out.bam,SRR28119113_trimmedAligned.sortedByCoord.out.bam \
             --track event any \
             --track splicegraph \
             -O Myod1_Splicing \
             -o spladder_output

This command generates a plot of the Myod1 gene showing the read coverage of group “KRAS_SPIB” and group “KRAS”, the splicing graph, and the splicing events of any type (selected with the “--track event any” option). The gene ID for Myod1 (ENSMUSG00000009471.4) can be found in the GTF file used in Step 1.
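The GTF lookup can be scripted. This sketch assumes the GENCODE GTF layout, where column 3 holds the feature type and column 9 holds attributes such as gene_id "ENSMUSG00000009471.4"; gene_name "Myod1";. The function name gene_id_for_name is hypothetical, and it accepts both plain and gzip-compressed GTF files.

```shell
#!/usr/bin/env bash
# Print the gene_id of every "gene" feature whose gene_name matches exactly.
# Assumes GENCODE-style attributes in column 9: gene_id "..."; gene_name "...";
# gene_id_for_name is a hypothetical helper name.
gene_id_for_name() {
    local gtf=$1 name=$2
    if [[ $gtf == *.gz ]]; then gzip -dc -- "$gtf"; else cat -- "$gtf"; fi |
        awk -F'\t' -v name="$name" '
            $3 != "gene" { next }                     # skip comments, transcripts, exons
            index($9, "gene_name \"" name "\";") {    # exact name match, quotes included
                match($9, /gene_id "[^"]+"/)
                print substr($9, RSTART + 9, RLENGTH - 10)   # drop gene_id " and closing "
            }
        '
}

# Example:
#   gene_id_for_name gencode.vM25.annotation.gtf Myod1
```

Run against the annotation from Step 1, this should print the versioned gene ID passed to --range gene above. Matching the quoted name exactly keeps a query such as Myod from also returning Myod1.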

Best Practices and Quality Control

Success in alternative splicing analysis requires careful attention to several key factors:

Sample Preparation and Sequencing

  • Use sufficient sequencing depth (≥50M reads per sample)
  • Ensure high RNA quality (RIN > 8)
  • Include adequate biological replicates

Analysis Parameters

  • Set appropriate coverage thresholds
  • Use proper statistical controls
  • Validate key findings with RT-PCR
  • Consider tissue-specific effects

Troubleshooting Guide

Bugs & Issues

  • Report bugs and other issues to the developers through the issue tracker of the SplAdder GitHub repository (ratschlab/spladder).

Advanced Tips

For more complex analyses:

Integrating with Other Data

  • Combine with protein structure information
  • Correlate with expression data
  • Include evolutionary conservation

Custom Analyses

  • Modify event detection parameters
  • Create custom visualization scripts
  • Export data for downstream analysis
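As one example of exporting data for downstream analysis, the sketch below merges the per-event-type gene_unique tables of one comparison into a single TSV with an added leading event_type column, which is convenient to load into R or Python. It assumes all tables share the same header and follow the test_results_C<level>_<event_type>.gene_unique.tsv naming described in Step 2; merge_event_tables is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Concatenate all test_results_C*_<event_type>.gene_unique.tsv files in a
# testing directory into one table, prefixing each row with its event type.
# Assumes every file has the same header line; only the first header is kept.
# merge_event_tables is a hypothetical helper name, not a SplAdder command.
merge_event_tables() {
    local dir=$1 header_done=0 f base event
    for f in "$dir"/test_results_C*_*.gene_unique.tsv; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        base=$(basename "$f" .gene_unique.tsv)
        event=${base#test_results_C*_}          # e.g. test_results_C3_exon_skip -> exon_skip
        if [ "$header_done" -eq 0 ]; then
            printf 'event_type\t%s\n' "$(head -n 1 "$f")"
            header_done=1
        fi
        tail -n +2 "$f" | awk -v t="$event" -v OFS='\t' '{ print t, $0 }'
    done
}

# Example:
#   merge_event_tables spladder_output/testing_KRAS_SPIB_vs_KRAS > all_events.tsv
```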

Conclusion

Alternative splicing analysis provides crucial insights into gene regulation and disease mechanisms. By following this guide and being mindful of best practices, you can generate reliable and biologically meaningful results from your RNA-seq data.

