How to Analyze RNAseq Data for Absolute Beginners Part 15-2: Mastering UMI-Based miRNA-Seq Analysis

Understanding UMI-Based miRNA Sequencing

MicroRNAs (miRNAs) serve as crucial regulators in gene expression, making their accurate quantification essential for understanding disease mechanisms and biological processes. While traditional miRNA sequencing has proven valuable, the integration of Unique Molecular Identifiers (UMIs) represents a significant advancement in achieving precise miRNA measurements. This tutorial will guide you through the complete workflow of UMI-based miRNA-seq analysis, suitable for both beginners and experienced researchers.

Why UMIs Matter in miRNA-Seq Analysis

Think of UMIs as unique barcodes attached to individual RNA molecules during library preparation. These molecular tags serve a critical purpose: they allow us to distinguish between genuine biological signals and technical artifacts introduced during PCR amplification. Just as a barcode helps track individual items in a store, UMIs help us track individual RNA molecules through the sequencing process.

The key advantages of using UMIs include:

  • Elimination of PCR duplicates for more accurate quantification
  • Improved detection of low-abundance miRNAs
  • Enhanced reliability in differential expression analysis
  • Better reproducibility across experiments
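To make the idea concrete, here is a minimal sketch on toy data (the miRNA names are real, but the UMIs and read counts are made up for illustration). Each line stands for one sequenced read, and counting unique miRNA/UMI pairs instead of reads removes the PCR duplicates:

```shell
# Toy data: one line per read as "miRNA<TAB>UMI" (hypothetical values)
reads=$(printf '%s\t%s\n' \
    hsa-miR-21-5p AAAACCCCGGGG \
    hsa-miR-21-5p AAAACCCCGGGG \
    hsa-miR-21-5p TTTTGGGGCCCC \
    hsa-let-7a-5p ACGTACGTACGT \
    hsa-let-7a-5p ACGTACGTACGT)

# Plain read counting: every read counts, including PCR duplicates
raw_counts=$(printf '%s\n' "$reads" | cut -f1 | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }')

# UMI-aware counting: identical miRNA/UMI pairs are collapsed to one molecule first
umi_counts=$(printf '%s\n' "$reads" | LC_ALL=C sort -u | cut -f1 | uniq -c | awk '{ print $2 "\t" $1 }')

echo "Read counts:"
printf '%s\n' "$raw_counts"
echo "UMI counts:"
printf '%s\n' "$umi_counts"
```

In this toy case hsa-miR-21-5p drops from 3 reads to 2 molecules and hsa-let-7a-5p from 2 reads to 1. Real tools such as UMI-tools can additionally account for sequencing errors in the UMI, which this sketch ignores.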

Choosing Between Regular and UMI-Based miRNA-Seq

Your choice between these approaches depends on your research goals:

Regular miRNA-seq works well for:

  • Initial exploratory studies
  • Projects with limited budgets
  • Situations requiring broad miRNA detection
  • Basic differential expression analysis

UMI-based miRNA-seq excels in:

  • Biomarker discovery projects
  • Clinical sample analysis
  • Studies with degraded RNA (e.g., FFPE samples)
  • Research requiring molecule-level quantification (counts of unique molecules rather than amplified reads)
  • Low-input RNA experiments

Setting Up Your Analysis Environment

Required Software Installation

We’ll build upon the RNA-seq environment from our previous miRNA-seq tutorial.

Let’s begin by setting up our analysis environment with the necessary tools (UMI-tools):

# First, activate our RNA-seq environment
conda activate rnaseq_env

# Install UMI-tools and its dependencies
conda install -c bioconda -c conda-forge umi_tools
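Before moving on, it can help to confirm that the tools used in this tutorial are actually on your PATH. The following is a small sketch; the helper name missing_tools is our own and not part of any package:

```shell
# Print the name of every tool that is NOT found on PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools called later in this tutorial
missing=$(missing_tools umi_tools cutadapt bowtie samtools)

if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All tools found"
fi
```

If anything is reported as missing, install it into the active conda environment before continuing.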

Reference File Preparation

For this tutorial, we’ll use the same reference files as our previous miRNA-seq analysis. If you haven’t prepared these files yet, please refer to our basic miRNA-seq tutorial for detailed instructions.

Analysis Approaches: Two Complementary Methods

We’ll explore two robust approaches for analyzing UMI-based miRNA-seq data, each offering unique advantages. Our example uses data prepared with the QIAseq miRNA Library Kit.

Method 1: Streamlined Analysis with UMI-tools

This approach aligns reads directly to mature miRNA sequences and counts unique UMIs per miRNA, which makes it quick to run.

Step 1: Quality Control and UMI Extraction

First, let’s create our directory structure and process the raw data:

# Create directory structure
mkdir -p ~/miRNAseq_UMI/{raw,trimmed,aligned}/Sample1_R1_001

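# Extract UMIs and remove QIAseq adapters
# Pattern: insert (kept), 3' adapter with up to 2 substitutions (discarded),
# 12-nt UMI (moved to the read name), remaining sequence (discarded)
# $HOME is used instead of ~ because the shell does not expand ~ after "=" in an option
umi_tools extract \
    --extract-method=regex \
    --bc-pattern='.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.+)' \
    --stdin="$HOME/miRNAseq_UMI/raw/Sample1_R1_001/Sample1_R1_001.fastq.gz" \
    --stdout="$HOME/miRNAseq_UMI/trimmed/Sample1_R1_001/Sample1_R1_001-directUMIextracted.fastq" \
    --log="$HOME/miRNAseq_UMI/trimmed/Sample1_R1_001/Sample1_R1_001_umi-UMIextraction-fromrawreads.log"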

# Filter reads by length (18-30 nucleotides)
cutadapt \
    --minimum-length=18 \
    --maximum-length=30 \
    -o ~/miRNAseq_UMI/trimmed/Sample1_R1_001/Sample1_R1_001-directUMIextracted-min18max30L.fastq \
    ~/miRNAseq_UMI/trimmed/Sample1_R1_001/Sample1_R1_001-directUMIextracted.fastq
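To see what the extraction pattern does to a single read, the sketch below applies a simplified version of it with sed. It is an illustration only: the read is hypothetical, and the sed expression requires an exact adapter match, whereas the umi_tools pattern tolerates up to two substitutions.

```shell
# Hypothetical read: 22-nt insert + QIAseq 3' adapter + 12-nt UMI + trailing sequence
insert_seq=TAGCTTATCAGACTGATGTTGA
adapter=AACTGTAGGCACCATCAAT
umi_seq=ACGTTGCAACGT
read="${insert_seq}${adapter}${umi_seq}AACGCAGAGTAC"

# Exact-match simplification of the --bc-pattern regex (no {s<=2} fuzzy matching):
# group 1 = everything before the adapter, group 2 = the 12 bases right after it
parsed=$(printf '%s\n' "$read" | sed -E "s/^(.+)${adapter}(.{12}).*$/\1 \2/")

kept=${parsed% *}   # sequence that stays in the FASTQ record
umi=${parsed#* }    # UMI that umi_tools appends to the read name

echo "kept sequence: $kept"
echo "UMI: $umi"
```

The kept sequence is the miRNA insert, which is why the subsequent length filter of 18-30 nucleotides works directly on the extracted reads.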

Step 2: Reference Alignment

Align the processed reads to your reference miRNA sequences:

# Align to human miRNA reference using Bowtie
bowtie \
    -n 0 -l 32 --norc --best --strata -m 1 --threads 16 \
    -x ~/Genome_Index/mirbase_bowtie_index/mirbase_hsa \
    ~/miRNAseq_UMI/trimmed/Sample1_R1_001/Sample1_R1_001-directUMIextracted-min18max30L.fastq \
    -S ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-maturemiRNA-aligned-bowtie1-beststratam1.sam

Step 3: UMI-Based Quantification

Process aligned reads and perform UMI-based counting:

# Convert and sort alignment file
samtools sort ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-maturemiRNA-aligned-bowtie1-beststratam1.sam > \
    ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-maturemiRNA-aligned-bowtie1-beststratam1.bam

# Index the BAM file
samtools index ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-maturemiRNA-aligned-bowtie1-beststratam1.bam

# Remove PCR duplicates using UMIs
umi_tools dedup \
    --method=unique \
    -I ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-maturemiRNA-aligned-bowtie1-beststratam1.bam \
    -L ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-deduplicate-matureMirna-uniquemethod-beststratam1.log \
    -S ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-deduplicated-matureMirna-uniquemethod-beststratam1.bam

# Generate final counts
umi_tools count \
    --method=unique \
    --per-contig \
    -I ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-deduplicated-matureMirna-uniquemethod-beststratam1.bam \
    -L ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001-counts-uniquemethod-maturemiRNA.log \
    -S ~/miRNAseq_UMI/aligned/Sample1_R1_001/Sample1_R1_001_counts-finaloutput-uniquemethod-maturemiRNA.txt

The final count table is a tab-separated text file with one row per miRNA (reference sequence) and the number of unique UMIs counted for it.
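As a rough illustration of what this counting step computes, the sketch below reproduces the idea with awk on a toy SAM file. The read names, UMIs, and alignments are hypothetical, and the sketch assumes the UMI is the last "_"-separated part of the read name, as written by umi_tools extract. It ignores everything else UMI-tools does (such as BAM handling or error-aware methods).

```shell
# Toy SAM: one header line, five mapped reads, one unmapped read (all hypothetical)
sam=$(
    printf '@SQ\tSN:hsa-miR-21-5p\tLN:22\n'
    printf '%s\t0\t%s\t1\t255\t22M\t*\t0\t0\t*\t*\n' \
        read1_ACGTTGCAACGT hsa-miR-21-5p \
        read2_ACGTTGCAACGT hsa-miR-21-5p \
        read3_TTGCAACGTACG hsa-miR-21-5p \
        read4_GGGGCCCCAAAA hsa-let-7a-5p \
        read5_GGGGCCCCAAAA hsa-let-7a-5p
    printf 'read6_AAAACCCCGGGG\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
)

# Count distinct UMIs per reference sequence (contig)
umi_counts=$(printf '%s\n' "$sam" | awk -F'\t' '
    /^@/ || $3 == "*" { next }            # skip header lines and unmapped reads
    {
        n = split($1, part, "_")          # UMI = last "_"-separated part of the read name
        key = $3 SUBSEP part[n]
        if (!(key in seen)) { seen[key] = 1; count[$3]++ }
    }
    END { for (mir in count) print mir "\t" count[mir] }
' | LC_ALL=C sort)

printf '%s\n' "$umi_counts"
```

Here hsa-miR-21-5p has three reads but only two distinct UMIs, and hsa-let-7a-5p has two reads sharing one UMI, so the counts are 2 and 1.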

Method 2: Comprehensive Analysis with miRDeep2

This approach aligns reads to the genome and uses miRDeep2 to quantify known miRNAs and to predict potential novel miRNAs.

Step 1: Genome Alignment

# Create output directory
mkdir -p ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/
cd ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/

# Align to human genome
# The processed FASTQ file is from Method 1
# --norc is omitted here: miRNAs are encoded on both genomic strands,
# so alignments to the reverse strand must be allowed
bowtie \
    -n 0 -l 32 --best --strata -m 1 --threads 16 \
    -x ~/Genome_Index/bowtie_index_hg38 \
    ~/miRNAseq_UMI/trimmed/Sample1_R1_001/Sample1_R1_001-directUMIextracted-min18max30L.fastq \
    -S ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1.sam

Step 2: miRDeep2 Input Preparation

# Process alignment files
samtools sort ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1.sam > \
    ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1.bam

samtools index ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1.bam

# Remove duplicates
umi_tools dedup \
    --method=unique \
    -I ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1.bam \
    -L ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.log \
    -S ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.bam

# Convert to miRDeep2 format
samtools view -h -o ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.sam \
    ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.bam

bwa_sam_converter.pl \
    -i Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.sam \
    -c \
    -o Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.collapsed \
    -a Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.arf
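miRDeep2 works with collapsed reads, where identical sequences are merged into one FASTA record whose ID ends in _x followed by the copy number (for example seq_0_x3). The sketch below imitates that collapsing on toy sequences with awk. It is an illustration of the format, not a replacement for bwa_sam_converter.pl, and the sequences and counts are hypothetical.

```shell
# Toy input: one deduplicated read sequence per line (hypothetical)
seqs=$(printf '%s\n' \
    TAGCTTATCAGACTGATGTTGA \
    TGAGGTAGTAGGTTGTATAGTT \
    TAGCTTATCAGACTGATGTTGA \
    TAGCTTATCAGACTGATGTTGA)

# Count identical sequences, order by copy number, and write collapsed FASTA records
collapsed=$(printf '%s\n' "$seqs" \
    | awk '{ n[$0]++ } END { for (s in n) print n[s] "\t" s }' \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk -F'\t' '{ printf ">seq_%d_x%d\n%s\n", NR - 1, $1, $2 }')

printf '%s\n' "$collapsed"
```

Because duplicates were already removed with UMIs, the copy numbers in the collapsed file reflect unique molecules rather than amplified reads.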

Step 3: miRDeep2 Analysis

miRDeep2.pl \
    ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.collapsed \
    ~/Genome_Index/Genome/hg38/genome_hg38_renamed.fa \
    ~/miRNAseq_UMI/mirdeep2/output/Sample1_R1_001/Sample1_R1_001-hg38-aligned-bowtie1-beststratam1_dedup.arf \
    ~/Genome_Index/mirbase_bowtie_index/mature_hsa_renamed.fa \
    none \
    ~/Genome_Index/mirbase_bowtie_index/hairpin_hsa_renamed.fa \
    -t hsa

After running miRDeep2, you’ll find the output files organized in the same directory structure covered in our previous tutorial on regular miRNA-seq analysis. This includes the result.html file containing detailed miRNA predictions, expression data, and secondary structure information, along with tab-separated files of expression counts and statistics. For a detailed breakdown of these output files and their interpretation, please refer to our previous miRNA-seq tutorial. The only difference is that these results now reflect UMI-based quantification, providing more accurate measurements of miRNA expression levels.

Troubleshooting Guide

When working with miRDeep2, you’ll need to pay special attention to your Bowtie installation. miRDeep2 comes bundled with an older version of Bowtie, but the commands in this tutorial assume a recent release. The older version lacks support for some of the command-line parameters we’re using, particularly the threading options and alignment strategy parameters.

To ensure compatibility, make sure the Bowtie found on your PATH is your system (conda) installation rather than the copy packaged with miRDeep2. You can verify your Bowtie version by running bowtie --version; you’ll want version 1.3.0 or higher. If you need to install or update Bowtie, you can do so using conda:

conda install -c bioconda bowtie=1.3.1
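If you want to automate the version check, a small helper like the one below can compare version numbers. This is a sketch: version_ge is our own helper name, and the version line is a hypothetical example that assumes the output contains the word "version" followed by the number.

```shell
# Return success (0) when version $1 is greater than or equal to version $2
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical version line; in practice use: version_line=$(bowtie --version | head -n 1)
version_line="bowtie-align-s version 1.3.1"

# Take the token that follows the word "version"
installed=$(printf '%s\n' "$version_line" \
    | awk '{ for (i = 1; i < NF; i++) if ($i == "version") print $(i + 1) }')

if version_ge "$installed" 1.3.0; then
    echo "Bowtie $installed is recent enough"
else
    echo "Bowtie $installed is older than 1.3.0, please update"
fi
```

The comparison is done field by field as numbers, so 1.10.0 is correctly treated as newer than 1.9.0.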

Conclusion

UMI-based miRNA-seq analysis can quantify miRNAs more accurately than plain read counting because PCR duplicates are removed before counting. By following this guide, you’ll be well-equipped to generate high-quality miRNA expression data for your research.

For more information on downstream analysis, including differential expression analysis, please refer to our RNA-seq differential expression tutorial.

References

  • Potla P, Ali SA, Kapoor M. A bioinformatics approach to microRNA-sequencing analysis. Osteoarthr Cartil Open. 2020 Dec 19;3(1):100131. doi: 10.1016/j.ocarto.2020.100131. PMID: 36475076; PMCID: PMC9718162.
  • Cowan, E., Karagiannopoulos, A., Eliasson, L. (2023). MicroRNAs in Type 2 Diabetes: Focus on MicroRNA Profiling in Islets of Langerhans. In: Moore, A., Wang, P. (eds) Type-1 Diabetes. Methods in Molecular Biology, vol 2592. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2807-2_8
