How to Analyze RNAseq Data for Absolute Beginners Part 13: Circular RNAseq Analysis

By Lei

Understanding the Biology of Circular RNAs

The Nature of Circular RNAs

Circular RNAs (circRNAs) represent one of molecular biology’s most fascinating discoveries. Unlike the linear RNA molecules that dominated our understanding of gene expression for decades, circRNAs form continuous loops through a unique process called back-splicing. In this process, a downstream 5′ splice site connects to an upstream 3′ splice site, creating a covalently closed circle that defies our traditional understanding of RNA processing.

This unusual structure gives circRNAs remarkable properties. Without free 5′ and 3′ ends, they resist the exonucleases that typically degrade linear RNAs. This stability is more than a curiosity: it allows circRNAs to persist far longer than most linear transcripts, which may suit them to sustained roles in gene regulation.

The Biological Significance of CircRNAs

The discovery of widespread circRNA expression has revolutionized our understanding of gene regulation. These molecules serve multiple functions that we’re only beginning to understand:

  1. MicroRNA Regulation: Many circRNAs act as molecular sponges, binding and sequestering microRNAs to fine-tune gene expression. This mechanism allows cells to create complex regulatory networks where circRNAs compete with messenger RNAs for microRNA binding.
  2. Protein Interactions: Some circRNAs serve as scaffolds, bringing proteins together into functional complexes. This role is particularly important in cellular signaling and transcriptional regulation.
  3. Protein Coding: Breaking with traditional views of non-coding RNAs, some circRNAs can actually be translated into proteins. These peptides often have unique functions distinct from those produced by canonical linear mRNAs.

CircRNAs in Disease

The stability and tissue-specific expression of circRNAs make them particularly relevant to disease processes:

Cancer Biology:
In cancer, circRNAs often show dramatic changes in expression. Some act as oncogenes by sponging tumor-suppressor microRNAs, while others function as tumor suppressors. Their stable presence in blood makes them promising biomarkers for cancer diagnosis and monitoring.

Neurological Disorders:
The brain expresses an exceptionally diverse array of circRNAs. In conditions like Alzheimer’s disease, specific circRNAs show altered expression patterns that may contribute to neurodegeneration. Understanding these changes could lead to new therapeutic strategies.

Cardiovascular Disease:
CircRNAs play crucial roles in heart development and function. During cardiac stress or injury, certain circRNAs change their expression patterns, suggesting potential therapeutic targets for heart disease.

Setting Up Your Analysis Environment

Building upon our experience from previous RNA-seq analysis tutorials in this series, we’ll expand our bioinformatics environment to include specialized tools for circular RNA analysis. If you haven’t yet set up a basic RNA-seq environment, you may want to review our earlier tutorial on RNA-seq basics before proceeding.

# Activate our RNA-seq environment
conda activate rnaseq_env

# Install the core analysis toolkit
conda install -c bioconda circexplorer2 -y
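
Before moving on, it can help to confirm that every command used later in this tutorial is actually on your PATH. The helper below is a minimal sketch, not part of CIRCexplorer2: `check_tools` is a hypothetical function name, and it assumes Trim Galore and STAR were installed in the earlier tutorials.

```shell
# Report which of the given command-line tools are missing from PATH.
# Prints one line per missing tool and returns non-zero if any are missing.
# (check_tools is a hypothetical helper, not part of any package.)
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run inside the activated rnaseq_env environment):
# check_tools trim_galore STAR fast_circ.py fetch_ucsc.py
```

If anything is reported as missing, install it into the same conda environment before continuing.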

Preparing Annotation Files and Genome Index

We’ll need gene annotation files, a reference genome, and a STAR genome index for our analysis.

# Download human reference files into ~/ref/hg38 (for mouse, replace hg38 with mm10)
mkdir -p ~/ref/hg38 && cd ~/ref/hg38
fetch_ucsc.py hg38 ref hg38_ref.txt    # RefSeq annotations
fetch_ucsc.py hg38 kg hg38_kg.txt      # KnownGenes annotations
fetch_ucsc.py hg38 fa hg38.fa          # Reference genome

# Download the STAR index components for the human genome from refgenie
# into the directory that the alignment step reads with --genomeDir
base=http://awspds.refgenie.databio.org/refgenomes.databio.org/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/star_index__default
mkdir -p ~/Genome_Index/STAR_GRCH38
for f in chrLength.txt chrName.txt chrNameLength.txt chrStart.txt \
         Genome genomeParameters.txt SA SAindex; do
    wget -P ~/Genome_Index/STAR_GRCH38 "$base/$f" || break
done
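
The index files are large, and an interrupted download can leave one of them missing or empty, which typically only shows up later as a STAR error. As a rough sanity check, the sketch below lists any of the eight files that are absent or zero-length. `check_star_index` is a hypothetical helper name, and the directory in the example is an assumption taken from the alignment step; point it at wherever you saved the files.

```shell
# List any of the eight downloaded STAR index files that are absent or empty.
# Usage: check_star_index <index_dir>
# (check_star_index is a hypothetical helper; it only checks that the files
# exist and are non-empty, not that their contents are valid.)
check_star_index() {
    local dir="$1" status=0 f
    for f in chrLength.txt chrName.txt chrNameLength.txt chrStart.txt \
             Genome genomeParameters.txt SA SAindex; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return "$status"
}

# Example: check_star_index ~/Genome_Index/STAR_GRCH38
```

No output means all eight files are present. This does not verify that the downloads are complete, so compare the sizes of the largest files (Genome and SA) against the server if you suspect a truncated transfer.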

Circular RNA Analysis with CIRCexplorer2

Step-by-Step Analysis Protocol

  1. Quality Control and Adapter Trimming
trim_galore --fastqc --paired --cores 20 \
    ~/raw/Sample1_L001_R1_001.fastq.gz \
    ~/raw/Sample1_L001_R2_001.fastq.gz \
    -o ~/Trimmed/Sample1/
  2. Genome Alignment with STAR
# Make sure the output directory exists before running STAR
mkdir -p ~/aligned/Sample1
STAR --genomeDir ~/Genome_Index/STAR_GRCH38/ \
    --runThreadN 20 \
    --readFilesIn ~/Trimmed/Sample1/Sample1_L001_R1_001_val_1.fq.gz \
                  ~/Trimmed/Sample1/Sample1_L001_R2_001_val_2.fq.gz \
    --chimSegmentMin 10 \
    --readFilesCommand zcat \
    --outFileNamePrefix ~/aligned/Sample1/Sample1_L001_R1_001_trimmed
  3. Circular RNA Detection and Annotation
fast_circ.py parse \
    -r ~/ref/hg38/hg38_kg.txt \
    -g ~/ref/hg38/hg38.fa \
    -t STAR \
    -o ~/aligned/Sample1/fast_circ_parse \
    ~/aligned/Sample1/Sample1_L001_R1_001_trimmedChimeric.out.junction

Make sure to repeat the process for all your samples.
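
With more than a handful of samples, retyping the three commands gets error-prone, so one option is to wrap them in a function and loop over the raw FASTQ files. The sketch below rests on several assumptions: files follow the `<sample>_L00N_R1_001.fastq.gz` naming used above, the directory layout matches the single-sample commands, and the helper names (`sample_prefix`, `sample_name`, `process_sample`) are hypothetical. Trim Galore names the validated read 2 file after the R2 input, so the second STAR input is `..._R2_001_val_2.fq.gz`.

```shell
# Derive the file prefix (e.g. Sample1_L001) from an R1 FASTQ path.
# Assumes Illumina-style names ending in _R1_001.fastq.gz.
sample_prefix() {
    basename "$1" _R1_001.fastq.gz
}

# Derive the sample name (e.g. Sample1) by dropping a trailing lane tag (_L001).
sample_name() {
    local prefix
    prefix=$(sample_prefix "$1")
    echo "${prefix%_L[0-9][0-9][0-9]}"
}

# Run trimming, alignment and circRNA detection for one sample.
# Paths and thread counts mirror the single-sample commands in this tutorial.
process_sample() {
    local r1="$1" r2 prefix sample
    prefix=$(sample_prefix "$r1")
    sample=$(sample_name "$r1")
    r2="$(dirname "$r1")/${prefix}_R2_001.fastq.gz"

    mkdir -p ~/Trimmed/"$sample" ~/aligned/"$sample"
    trim_galore --fastqc --paired --cores 20 "$r1" "$r2" -o ~/Trimmed/"$sample"/ &&
    STAR --genomeDir ~/Genome_Index/STAR_GRCH38/ \
        --runThreadN 20 \
        --readFilesIn ~/Trimmed/"$sample"/"${prefix}_R1_001_val_1.fq.gz" \
                      ~/Trimmed/"$sample"/"${prefix}_R2_001_val_2.fq.gz" \
        --chimSegmentMin 10 \
        --readFilesCommand zcat \
        --outFileNamePrefix ~/aligned/"$sample"/"${prefix}_R1_001_trimmed" &&
    fast_circ.py parse \
        -r ~/ref/hg38/hg38_kg.txt \
        -g ~/ref/hg38/hg38.fa \
        -t STAR \
        -o ~/aligned/"$sample"/fast_circ_parse \
        ~/aligned/"$sample"/"${prefix}_R1_001_trimmedChimeric.out.junction"
}

# Example: process every R1 file found in ~/raw, stopping at the first failure
# for r1 in ~/raw/*_R1_001.fastq.gz; do process_sample "$r1" || break; done
```

Each step is chained with `&&`, so a failed trim or alignment stops that sample instead of feeding incomplete files to the next step. If your samples were sequenced across several lanes, each lane is processed separately here and written to the same sample directory, so adjust the output paths if you merge lanes first.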

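The resulting circularRNA_known.txt file is a tab-delimited table with one circRNA per row. According to the CIRCexplorer2 documentation, its columns are:

  • chrom: chromosome
  • start: start of the circular RNA
  • end: end of the circular RNA
  • name: circular RNA/junction reads
  • score: flag of fusion junction realignment
  • strand: + or - for strand
  • thickStart: no meaning
  • thickEnd: no meaning
  • itemRgb: 0,0,0
  • exonCount: number of exons
  • exonSizes: exon sizes
  • exonOffsets: exon offsets
  • readNumber: number of junction reads
  • circType: type of circular RNA
  • geneName: name of the gene
  • isoformName: name of the isoform
  • index: index of the exon or intron
  • flankIntron: left intron/right intron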
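
A quick way to get a feel for the results is to list the best-supported circRNAs. The sketch below assumes the column order given in the CIRCexplorer2 documentation, where column 6 is the strand, column 13 is readNumber (junction read count), column 14 is circType and column 15 is geneName; check this against your own file before relying on it. `top_circrnas` is a hypothetical helper name, and the default cutoff of 2 reads is only an illustrative threshold, not a recommendation from the tool's authors.

```shell
# Print circRNAs supported by at least MIN_READS back-splice junction reads,
# sorted from most to least supported.
# Output columns: chrom, start, end, strand, readNumber, circType, geneName
# Usage: top_circrnas <circularRNA_known.txt> [min_reads]
# (Column positions are an assumption based on the CIRCexplorer2 documentation.)
top_circrnas() {
    local file="$1" min_reads="${2:-2}"
    awk -F'\t' -v min="$min_reads" 'BEGIN { OFS = "\t" }
        $13 >= min { print $1, $2, $3, $6, $13, $14, $15 }' "$file" |
        sort -t "$(printf '\t')" -k5,5nr
}

# Example: show the 20 best-supported circRNAs for Sample1
# top_circrnas ~/aligned/Sample1/fast_circ_parse/circularRNA_known.txt 2 | head -n 20
```

Low-count junctions are more likely to be alignment artifacts, so a read threshold is a common first filter, but the right cutoff depends on sequencing depth and library type.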

Understanding Your Analysis Options: CIRCexplorer2 and CIRCexplorer3

While both CIRCexplorer2 and CIRCexplorer3 are powerful tools for circular RNA analysis, each has distinct advantages and drawbacks that are worth understanding before you begin.

CIRCexplorer2’s Flexible Approach

CIRCexplorer2 provides two paths for analysis: a streamlined one-command process and our recommended three-step protocol. While the one-command option might seem appealing at first glance, it comes with several important considerations. This approach requires additional computational resources and setup time, as it needs both HISAT2 and Bowtie aligners along with their corresponding genome indices. More significantly, the processing time for this method can be substantially longer than the three-step approach in this tutorial.

Our recommended three-step protocol offers a more efficient and manageable workflow. By breaking the analysis into distinct stages – quality control, alignment, and circular RNA detection – you gain better control over each step and can more easily troubleshoot if issues arise. This approach also builds upon the STAR aligner that many researchers already use for standard RNA-seq analysis, making it a natural extension of existing workflows.

CIRCexplorer3’s Specialized Features

CIRCexplorer3, through its CLEAR pipeline, introduces an innovative approach by enabling direct comparisons between circular and linear RNA expression. This capability makes it particularly valuable for studies focusing on the relationship between circular RNAs and their linear counterparts. However, it’s important to note that setting up CIRCexplorer3 can be challenging due to its dependence on legacy bioinformatics tools.

When should you choose CIRCexplorer3? If your research specifically requires comprehensive analysis of the interplay between circular and linear RNA expression, the additional setup complexity may be worthwhile. The tool excels in situations where you need to:

  • Compare expression patterns between circular and linear RNA forms
  • Investigate the regulation of back-splicing versus canonical splicing
  • Study the competition between these two RNA processing paths

For most standard circRNA studies, CIRCexplorer2 provides all the necessary functionality with a more straightforward setup process. Its robust detection algorithms and well-documented workflow make it an excellent choice for researchers beginning their journey into circular RNA analysis.

Conclusion

The analysis of circular RNAs represents a perfect example of how biological discovery drives technological innovation, which in turn enables deeper biological insights. The methods and tools we’ve covered in this tutorial provide a solid foundation for your own explorations into this fascinating aspect of RNA biology. Whether you’re studying disease mechanisms, developing biomarkers, or exploring fundamental biology, understanding circRNA analysis is increasingly essential for modern molecular biology research.

Remember that while the technical aspects of circRNA analysis are important, the ultimate goal is to contribute to our understanding of biological systems and human health. As you apply these methods to your own research questions, stay curious and be ready to adapt as new tools and insights emerge in this rapidly evolving field.

References

Misir, S., Wu, N. & Yang, B.B. (2022). Specific expression and functions of circular RNAs. Cell Death Differ 29, 481–491.
Dawoud, A., Zakaria, Z.I., Rashwan, H.H., Braoudaki, M. & Youness, R.A. (2023). Circular RNAs: New layer of complexity evading breast cancer heterogeneity. Non-coding RNA Research 8(1), 60–74. https://doi.org/10.1016/j.ncrna.2022.09.011
Rai, A.K., Lee, B., Hebbard, C., Uchida, S. & Garikipati, V.N.S. (2021). Decoding the complexity of circular RNAs in cardiovascular disease. Pharmacological Research 171, 105766. https://doi.org/10.1016/j.phrs.2021.105766
Zhang, X.O., et al. (2016). Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res 26, 1277–1287.
Ma, X.K., et al. (2019). A CLEAR pipeline for direct comparison of circular and linear RNA expression. bioRxiv. https://doi.org/10.1101/668657

Comments

4 responses to “How to Analyze RNAseq Data for Absolute Beginners Part 13: Circular RNAseq Analysis”

  1. Reshma RD

    how to extract circular RNA from multiple SRA files and do differential gene expression

    1. Lei

      Hi Reshma,

      The process for downloading public datasets and performing differential expression (DE) analysis for circRNA-seq data is essentially the same as the workflow I demonstrated in my RNA-seq tutorials.

  2. Reshma RD

    Hi Lei,
    Thanks for your reply
    I am a beginner in bioinformatics, so I have a couple of doubts. I tried CIRI3 for circular RNA extraction, using a script to prefetch nearly 100 to 200 SRA files. When I ran the circular RNA extraction, it gave only head lines as output. If I run it on a single sample, the circular RNAs and their scores come out in a different order. Therefore differential gene expression analysis will be difficult, because even a single sample has more than 2000 circular RNAs. I’m not sure what to do next. Kindly help me understand it better. What’s your opinion?

    1. Lei

      Hi Reshma,

      Thanks for reaching out! I want to make sure I understand your situation correctly so I can help you effectively. Let me break down what I’m seeing:

      First, a key clarification: You mentioned using CIRI3, but this tutorial focuses on CIRCexplorer2 – these are completely different tools with different workflows and output formats. This might be causing some of the confusion you’re experiencing.

      Now, to help me understand your specific issues, could you clarify a few things?

      1. About “extracting circular RNA from multiple SRA files”:
      When you say “extract circular RNA from multiple SRA files,” do you mean:
      – You’re having trouble downloading FASTQ files from SRA?
      – You don’t know how to batch process multiple samples (running the same analysis on 100-200 samples)?
      – You’re unclear about the complete workflow from raw SRA data → FASTQ → alignment → circRNA detection?

      Just to clarify: circRNAs aren’t literally “in” the SRA files waiting to be extracted – we need to download the sequencing reads, align them to the genome, and then use specialized tools (like CIRI3 or CIRCexplorer2) to identify back-splice junctions that indicate circRNAs. Is this workflow clear to you?

      2. About “getting only headlines as output”:
      Are you saying that CIRI3 is producing output files that only contain the column headers (like chrom, start, end, etc.) but no actual circRNA detection results below? If so, this suggests CIRI3 isn’t detecting any circRNAs, which could be due to:
      – Alignment parameter issues
      – Reference file problems
      – Data quality issues
      – Or incorrect CIRI3 configuration

      3. About “different order” across samples:
      Are you finding that:
      – Sample A detects circRNA_1, circRNA_5, circRNA_7…
      – Sample B detects circRNA_2, circRNA_5, circRNA_9…
      – And you’re not sure how to combine these into a single count matrix for differential expression?

      If this is the issue, this is actually a normal challenge (not covered in my current tutorial) – each sample will detect different circRNAs, and you need to create a unified matrix where missing circRNAs get zero counts.

      4. About detecting 2000+ circRNAs per sample:
      This is actually quite normal for circRNA analysis! Are you concerned this is too many, or are you asking how to handle this large number in downstream analysis?

      Moving forward:
      Since you’re using CIRI3 and my tutorial uses CIRCexplorer2, which would you prefer:
      – Switching to CIRCexplorer2 so you can follow my existing tutorial, or
      – Having me create a new CIRI3-specific tutorial?

      Please let me know which issue is causing you the most trouble, and we can work through it step by step!
