Glossary

RNA-seq

  • A-to-I Editing: The conversion of adenosine (A) to inosine (I) in RNA. Inosine is read as guanosine (G) by ribosomes and by reverse transcriptase, so edited sites appear as A-to-G mismatches in RNA-seq reads. This is mediated by the ADAR (Adenosine Deaminase Acting on RNA) family of enzymes.
  • ADAR Enzymes: Enzymes that catalyze A-to-I editing in double-stranded RNA (dsRNA) regions, often in non-coding regions like introns and untranslated regions (UTRs), but also in coding regions to produce protein variants.
  • Alu Elements: Short (about 300 bp) repetitive sequences found abundantly in primate genomes. Nearby Alu elements in opposite orientations often pair to form dsRNA structures, making them hotspots for A-to-I editing.
  • Alignment: The process of mapping RNA-seq reads to a reference genome or transcriptome to determine the origin and structure of transcripts.
  • Alternative Splicing: A process by which a single gene can produce multiple RNA isoforms through the inclusion or exclusion of specific exons, leading to the generation of multiple proteins.
  • Annotation: Information about the genomic features (e.g., gene locations, exons, introns) that helps interpret the RNA-seq data.
  • APOBEC Enzymes: Enzymes responsible for C-to-U editing, primarily seen in the editing of mRNA for specific proteins such as apolipoprotein B.
  • Back-Splicing: A unique process in circular RNA biogenesis where a downstream splice donor is joined to an upstream splice acceptor, forming a circular structure.
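  • Base Quality Score: A measure of the accuracy of each nucleotide call in a sequencing read, usually reported on the Phred scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong (e.g., Q30 corresponds to an error probability of 1 in 1,000).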
  • Batch Effect: Unwanted variation in data due to technical rather than biological factors, often arising from differences between experimental batches.
  • Counts: The number of reads aligned to a specific feature (e.g., gene or transcript), used as a measure of expression level.
  • Coverage: The number of reads that overlap a particular region of the genome or transcriptome, indicating how well that region is represented in the sequencing data.
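  • CPM (Counts Per Million): A normalization metric that accounts for sequencing depth differences between samples: the read count for a feature is divided by the total number of counts in the sample and multiplied by one million. It does not correct for feature length.
  • C-to-U Editing: The conversion of cytosine (C) to uracil (U) in RNA. This is mediated by APOBEC (Apolipoprotein B mRNA Editing Enzyme, Catalytic Polypeptide-like) enzymes and is less prevalent than A-to-I editing.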
  • DESeq2: A widely used R package for analyzing RNA-seq data to detect differential gene expression between different experimental conditions.
  • Differential Expression (DE): The process of identifying genes or transcripts whose expression levels significantly differ between conditions (e.g., treated vs. untreated samples).
  • Differential Splicing Analysis: A method to compare splicing patterns between different conditions or groups (e.g., treated vs. untreated samples) to identify changes in alternative splicing.
  • Downstream Analysis: Analysis steps that follow the initial processing of RNA-seq data, such as differential expression, pathway analysis, and functional enrichment.
  • edgeR: An R/Bioconductor package used for differential expression analysis of RNA-seq count data.
  • Editing Frequency (Editing Ratio): The proportion of RNA molecules at a specific site that are edited.
  • Editing Sites: Specific nucleotide positions in an RNA molecule where editing occurs. These sites can be identified through comparison of RNA-seq data to the corresponding DNA sequence.
  • Exon: A segment of a gene that codes for a portion of the final RNA transcript. Exons are retained in the mature mRNA after splicing.
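  • Exon Skipping: An alternative splicing event in which an exon is included in some transcripts of a gene and skipped (spliced out together with its flanking introns) in others.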
  • Expression Level: The abundance of a transcript in the sample, typically measured in counts or TPM (Transcripts Per Million).
  • False Discovery Rate (FDR): The expected proportion of false positives among the results called significant. Controlling the FDR (e.g., with the Benjamini-Hochberg procedure) is a common way to correct for multiple hypothesis testing.
  • Feature: Any element in the genome that is analyzed in RNA-seq, such as a gene, transcript, exon, or intron.
  • FPKM (Fragments Per Kilobase of transcript per Million mapped reads): A normalization metric used in RNA-seq analysis to measure gene expression levels, taking into account both the sequencing depth and the length of the transcript. It is the paired-end counterpart of RPKM: each fragment (read pair) is counted once.
  • Gene Ontology (GO): A framework used for annotating genes and gene products based on their molecular function, biological process, and cellular component.
  • Gene Set Enrichment Analysis (GSEA): A method for determining whether a predefined set of genes (e.g., a pathway) shows statistically significant, concordant differences in expression between two biological states.
  • GTF/GFF File: File formats that describe gene and transcript annotations, providing information about exon-intron structures.
  • Heatmap: A graphical representation of expression data, often used to visualize the expression levels of many genes across multiple samples.
  • Homopolymer: A sequence of identical nucleotides repeated consecutively in a stretch of RNA.
  • Hyper-Editing: A phenomenon where multiple editing events occur in close proximity, typically in regions of dsRNA. Hyper-editing is often observed in repetitive sequences such as Alu elements in primates.
  • Intron: A non-coding segment of a gene that is removed during RNA splicing and is not present in the mature mRNA.
  • Intron Retention: An intron is retained in the mRNA instead of being spliced out.
  • Isoform: One of several different mRNA molecules produced from the same gene, for example through alternative splicing, which can lead to variations in the protein produced.
  • Junction Reads: Sequencing reads that span across splice junctions, providing evidence of splicing events.
  • Library Preparation: The process of converting RNA into a form that is compatible with sequencing, often involving reverse transcription into cDNA and adapter ligation.
  • limma: A popular R package used for the analysis of gene expression data, including both microarray and RNA-seq experiments.
  • Log Fold Change (logFC): A measure of how much gene expression changes between conditions, usually expressed on a log2 scale, so that a logFC of 1 corresponds to a two-fold increase and -1 to a two-fold decrease.
  • Mapped Reads: Reads that have been successfully aligned to the reference genome.
  • Mapping: The process of aligning RNA-seq reads to a reference genome or transcriptome.
  • Multimapping Reads: Reads that align to more than one location in the genome, making it difficult to determine their origin.
  • Mutually Exclusive Exons: An alternative splicing event in which exactly one of two (or more) exons is included in the mRNA, never both together.
  • Normalization: The process of adjusting RNA-seq data to account for differences in sequencing depth or other technical biases, making comparisons between samples meaningful.
  • Normalization Factor: A scaling factor applied to RNA-seq data to correct for differences in sequencing depth or RNA composition between samples.
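  • P-Value: The probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis (e.g., no difference between conditions) is true. It is not the probability that the observed difference arose by chance.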
  • PCA (Principal Component Analysis): A dimensionality reduction technique used to visualize the variability in high-dimensional RNA-seq data, often used for sample clustering.
  • Poly-A Tail: A stretch of adenine nucleotides added to the 3′ end of eukaryotic mRNA, often used to enrich for mature mRNAs during RNA-seq library preparation.
  • PSI (Percent Spliced In): A metric used to quantify exon inclusion levels, typically calculated as the percentage of transcripts including a specific exon.
  • Quantification: The process of determining the abundance of transcripts in an RNA-seq experiment, usually expressed as counts, FPKM (Fragments Per Kilobase of transcript per Million mapped reads), or TPM (Transcripts Per Million).
  • Quality Control (QC): The process of assessing the quality of RNA-seq data to detect potential problems (e.g., low sequencing depth, contamination, or batch effects).
  • Read: A sequence of nucleotides produced by the sequencing process, representing a fragment of RNA.
  • Read Count: The number of sequencing reads mapping to a gene, exon, or splice junction, used as a measure of gene expression or splicing.
  • RNA Editing: A post-transcriptional process where specific nucleotides in RNA are altered, so that the RNA sequence differs from the DNA template. The most common types are A-to-I (Adenosine-to-Inosine) and C-to-U (Cytosine-to-Uracil) editing.
  • RNA Isoforms: Different versions of RNA transcripts produced from the same gene through mechanisms like alternative splicing, alternative promoter usage, or alternative polyadenylation. Each isoform can have unique biological functions.
  • RNA-seq: RNA sequencing, a technique used to study gene expression by high-throughput sequencing of RNA molecules, typically after conversion to cDNA.
  • RPKM (Reads Per Kilobase of transcript per Million mapped reads): A normalization metric for RNA-seq data that accounts for gene length and sequencing depth.
  • Single-End Sequencing: RNA-seq where only one end of a fragment is sequenced, as opposed to paired-end sequencing, where both ends are sequenced.
  • Splice Junction: The boundary between an exon and an intron, where the splicing machinery cuts and joins RNA to remove introns and connect exons.
  • Splice Variants: Different versions of mRNA produced by a single gene through alternative splicing.
  • Spliceosome: A complex of proteins and small nuclear RNAs (snRNAs) responsible for removing introns and joining exons during RNA splicing.
  • Splicing: The process of removing introns and joining exons to produce a mature mRNA transcript.
  • Splicing Events: Specific types of alternative splicing patterns, such as:
    • Exon Skipping: An exon is included in some transcripts of a gene and skipped (spliced out together with its flanking introns) in others.
    • Intron Retention: An intron is retained in the mRNA instead of being spliced out.
    • Alternative 5′ Splice Site: Use of an alternative donor site at the 5′ end of an intron, which changes the 3′ boundary of the upstream exon.
    • Alternative 3′ Splice Site: Use of an alternative acceptor site at the 3′ end of an intron, which changes the 5′ boundary of the downstream exon.
    • Mutually Exclusive Exons: Exactly one of two (or more) exons is included in the mRNA, never both together.
  • STAR: A popular RNA-seq read aligner known for its speed and accuracy.
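  • Subread Software Suite: A set of tools for read alignment (Subread, Subjunc) and for counting reads per gene or feature (featureCounts).
  • TPM (Transcripts Per Million): A normalization metric for RNA-seq data, similar to RPKM. Counts are first divided by transcript length and the resulting rates are then scaled so that they sum to one million in every sample, which makes values more comparable between samples.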
  • Transcript: The RNA product of a gene, which may be spliced and processed into different isoforms.
  • Transcriptome: The complete set of RNA transcripts produced by the genome under specific conditions or in specific cells.
  • Trim Galore: A popular wrapper tool that combines the functionalities of Cutadapt and FastQC to automate quality control and adapter trimming of high-throughput sequencing data.
  • Trimmomatic: A flexible, widely used tool for adapter trimming and quality filtering of sequencing reads.
  • UMI (Unique Molecular Identifier): A short random barcode attached to individual RNA or cDNA molecules before amplification. Reads that share a UMI and a mapping position can be treated as PCR (technical) duplicates, while reads with different UMIs come from distinct original molecules.
  • Volcano Plot: A scatter plot that displays the significance (typically -log10 of the p-value, on the y-axis) against the magnitude of change (typically log2 fold change, on the x-axis) of genes between two conditions, often used in differential expression analysis.
  • WGCNA (Weighted Gene Co-Expression Network Analysis): A method used to identify clusters (modules) of co-expressed genes and correlate them with external traits.
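
The CPM, RPKM, and TPM entries above can be made concrete with a small worked sketch. The Python below is illustrative only: the gene names, counts, and lengths are made-up toy values, and the helper functions (cpm, rpkm, tpm) are hypothetical names, not the API of edgeR, DESeq2, or any other package. It assumes the commonly used definitions of each metric and treats the sum of counts over the listed genes as the total number of mapped reads.

```python
# Toy illustration of CPM, RPKM, and TPM.
# All names and numbers here are hypothetical and for illustration only.

def cpm(counts):
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: correct for depth and feature length."""
    total = sum(counts.values())
    # count / (length in kb) / (total in millions) == count * 1e9 / (length * total)
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    rate_sum = sum(rates.values())
    return {gene: r * 1e6 / rate_sum for gene, r in rates.items()}

# Made-up example: three genes in one sample.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}  # in base pairs

for name, values in [
    ("CPM", cpm(toy_counts)),
    ("RPKM", rpkm(toy_counts, toy_lengths)),
    ("TPM", tpm(toy_counts, toy_lengths)),
]:
    print(name, {gene: round(v, 1) for gene, v in values.items()})
```

In this sketch the TPM values sum to one million by construction, whereas the RPKM total depends on the length distribution of the expressed genes. That difference is the usual argument for preferring TPM when comparing the relative abundance of a gene across samples.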