Glossary

RNA-seq

  • Alignment: The process of matching RNA-seq reads to a reference genome or transcriptome to determine their origin.
  • Annotation: Information about the genomic features (e.g., gene locations, exons, introns) that helps interpret the RNA-seq data.
  • Batch Effect: Unwanted variation in data due to technical rather than biological factors, often arising from differences between experimental batches.
  • Base Quality Score: A measure of the accuracy of each nucleotide call in a sequencing read, usually reported as a Phred score (Q = -10 × log10 of the estimated error probability).
  • Counts: The number of reads aligned to a specific feature (e.g., gene or transcript), used as a measure of expression level.
  • Coverage: The number of reads that overlap a particular region of the genome or transcriptome, indicating how well that region is represented in the sequencing data.
  • CPM (Counts Per Million): A normalization metric that divides each feature's read count by the sample's total read count in millions, accounting for sequencing depth differences between samples.
  • DESeq2: A widely used R package for analyzing RNA-seq data to detect differential gene expression between different experimental conditions.
  • Differential Expression (DE): The process of identifying genes or transcripts whose expression levels significantly differ between conditions (e.g., treated vs. untreated samples).
  • Downstream Analysis: Analysis steps that follow the initial processing of RNA-seq data, such as differential expression, pathway analysis, and functional enrichment.
  • edgeR: An R/Bioconductor package used for differential expression analysis of RNA-seq count data.
  • Exon: A region of a gene that remains in the mature RNA transcript after splicing. Exons include protein-coding sequence as well as untranslated regions (UTRs).
  • Expression Level: The abundance of a transcript in the sample, typically measured in counts or TPM (Transcripts Per Million).
  • False Discovery Rate (FDR): The expected proportion of false positives among the results called significant. Controlling the FDR (e.g., with the Benjamini-Hochberg procedure) corrects for multiple hypothesis testing.
  • Feature: Any element in the genome that is analyzed in RNA-seq, such as a gene, transcript, exon, or intron.
  • FPKM (Fragments Per Kilobase of transcript per Million mapped reads): A normalization metric for gene expression that accounts for both sequencing depth and transcript length. It is the paired-end counterpart of RPKM, counting fragments rather than individual reads.
  • Gene Ontology (GO): A framework used for annotating genes and gene products based on their molecular function, biological process, and cellular component.
  • Gene Set Enrichment Analysis (GSEA): A method for determining whether a predefined set of genes shows statistically significant, concordant differences in expression between two biological states.
  • Heatmap: A graphical representation of expression data, often used to visualize the expression levels of many genes across multiple samples.
  • Homopolymer: A sequence of identical nucleotides repeated consecutively in a stretch of RNA.
  • Intron: Non-coding regions of a gene that are spliced out during RNA processing, and do not remain in the mature transcript.
  • Isoform: Different versions of a transcript produced from the same gene due to alternative splicing.
  • Library Preparation: The process of converting RNA into a form that is compatible with sequencing, often involving reverse transcription into cDNA and adapter ligation.
  • Log Fold Change (logFC): The logarithm (usually base 2) of the ratio of expression levels between two conditions. A log2 fold change of 1 corresponds to a doubling of expression, and -1 to a halving.
  • limma: A popular R package used for the analysis of gene expression data, including both microarray and RNA-seq experiments.
  • Mapping: The process of aligning RNA-seq reads to a reference genome or transcriptome.
  • Mapped Reads: Reads that have been successfully aligned to the reference genome.
  • Multimapping Reads: Reads that align to more than one location in the genome, making it difficult to determine their origin.
  • Normalization: The process of adjusting RNA-seq data to account for differences in sequencing depth or other technical biases, making comparisons between samples meaningful.
  • Normalization Factor: A scaling factor applied to RNA-seq data to correct for differences in sequencing depth or RNA composition between samples.
  • P-Value: The probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis (no true difference) is correct.
  • Principal Component Analysis (PCA): A dimensionality reduction technique used to visualize the variability in high-dimensional RNA-seq data, often used for sample clustering.
  • Poly-A Tail: A stretch of adenine nucleotides added to the 3′ end of eukaryotic mRNA, often used to enrich for mature mRNAs during RNA-seq library preparation.
  • Quantification: The process of determining the abundance of transcripts in an RNA-seq experiment, usually expressed as counts, FPKM (Fragments Per Kilobase of transcript per Million mapped reads), or TPM (Transcripts Per Million).
  • Quality Control (QC): The process of assessing the quality of RNA-seq data to detect potential problems (e.g., low sequencing depth, contamination, or batch effects).
  • Read: A sequence of nucleotides produced by the sequencing process, representing a fragment of RNA.
  • RPKM (Reads Per Kilobase of transcript per Million mapped reads): A normalization metric for RNA-seq data that accounts for gene length and sequencing depth.
  • RNA-seq: RNA sequencing, a technique used to study gene expression by sequencing RNA molecules.
  • Splicing: The process of removing introns and joining exons to produce a mature mRNA transcript.
  • Single-End Sequencing: RNA-seq where only one end of a fragment is sequenced, as opposed to paired-end sequencing, where both ends are sequenced.
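  • STAR (Spliced Transcripts Alignment to a Reference): A widely used splice-aware RNA-seq read aligner known for its speed and accuracy.
  • Subread Software Suite: A software suite for read alignment and read counting; it includes the featureCounts program for assigning reads to genes or other features.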
  • Transcript: The RNA product of a gene, which may be spliced and processed into different isoforms.
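  • TPM (Transcripts Per Million): A normalization metric for RNA-seq data, similar to RPKM, but computed by normalizing for transcript length first and then scaling so that the values in every sample sum to one million, which makes proportions easier to compare between samples.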
  • Transcriptome: The complete set of RNA transcripts produced by the genome, under specific conditions or in specific cells.
  • Trim Galore: A wrapper tool around Cutadapt and FastQC that automates adapter trimming and quality control of high-throughput sequencing data.
  • Trimmomatic: A flexible tool for adapter removal and quality trimming of sequencing reads.
  • UMI (Unique Molecular Identifier): A short barcode added to individual RNA molecules before amplification and sequencing, used to distinguish PCR (technical) duplicates from reads that originate from distinct molecules.
  • Volcano Plot: A scatter plot that displays the significance (typically -log10 of the p-value) against the magnitude of change (typically log2 fold change) of genes between two conditions, often used in differential expression analysis.
  • WGCNA (Weighted Gene Co-Expression Network Analysis): A method used to identify clusters (modules) of co-expressed genes and correlate them with external traits.
  • Alternative Splicing: A process by which a single gene can produce multiple RNA isoforms through the inclusion or exclusion of specific exons, which can lead to the generation of multiple proteins.
  • Exon: A segment of a gene that is represented in the final RNA transcript. Exons are retained in the mature mRNA after splicing.
  • Intron: A non-coding segment of a gene that is removed during RNA splicing and is not present in the mature mRNA.
  • Splice Junction: The boundary between an exon and an intron, where the splicing machinery cuts and joins RNA to remove introns and connect exons.
  • Isoform: Different mRNA molecules produced from the same gene through alternative splicing, which can lead to variations in the protein produced.
  • Differential Splicing Analysis: A method to compare splicing patterns between different conditions or groups (e.g., treated vs. untreated samples) to identify changes in alternative splicing.
  • Splicing Events: Specific types of alternative splicing patterns, such as:
    • Exon Skipping: An exon is skipped or included in the mRNA.
    • Intron Retention: An intron is retained in the mRNA instead of being spliced out.
    • Alternative 5′ Splice Site: Variation in the donor site at the 5′ end of an intron, which changes the 3′ boundary of the upstream exon.
    • Alternative 3′ Splice Site: Variation in the acceptor site at the 3′ end of an intron, which changes the 5′ boundary of the downstream exon.
    • Mutually Exclusive Exons: Only one of two exons is included in the mRNA.
  • Spliceosome: A complex of proteins and small nuclear RNAs (snRNAs) responsible for removing introns and joining exons during RNA splicing.
  • PSI (Percent Spliced In): A metric used to quantify exon inclusion levels, typically calculated as the percentage of transcripts including a specific exon.
  • Read Count: The number of sequencing reads mapping to a gene, exon, or splice junction, used as a measure of gene expression or splicing.
  • Splicing Factor: A protein involved in the regulation of splicing events, controlling the inclusion or exclusion of exons.
  • GTF/GFF File: Related tab-delimited file formats that describe gene and transcript annotations, providing information about exon-intron structures.
  • Alignment: The process of mapping RNA-seq reads to a reference genome or transcriptome to determine the origin and structure of transcripts.
  • Splice Variants: Different versions of mRNA produced by a single gene through alternative splicing.
  • Junction Reads: Sequencing reads that span across splice junctions, providing evidence of splicing events.
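
Worked example: As a rough illustration of how several metrics defined in this glossary (CPM, RPKM, TPM, log fold change, PSI, and FDR-adjusted p-values) can be computed, the Python sketch below works on small made-up numbers. The function names, the pseudocount of 1, the use of raw inclusion and exclusion read counts for PSI, and the choice of the Benjamini-Hochberg procedure for FDR are assumptions made for this example, not something the glossary prescribes. Packages such as DESeq2 and edgeR use more elaborate normalization and statistics.

```python
import math


def cpm(counts):
    """Counts Per Million: scale each count by the library size in millions."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """RPKM: normalize for sequencing depth, then for feature length in kilobases."""
    per_million = sum(counts) / 1e6
    return [c / per_million / (l / 1e3) for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM: normalize for length first, then rescale so the values sum to one million."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def psi(inclusion_reads, exclusion_reads):
    """Percent Spliced In, simplified to raw read counts supporting inclusion vs. exclusion."""
    return 100.0 * inclusion_reads / (inclusion_reads + exclusion_reads)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (made up for illustration): three genes in one sample.
toy_counts = [10, 20, 70]
toy_lengths_bp = [1000, 2000, 1000]

print("CPM: ", cpm(toy_counts))
print("RPKM:", rpkm(toy_counts, toy_lengths_bp))
print("TPM: ", tpm(toy_counts, toy_lengths_bp))
print("log2FC:", log2_fold_change(7, 1))
print("PSI:", psi(30, 10))
print("BH-adjusted:", benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With these toy inputs, the first two genes receive the same TPM because the second gene has twice the reads but is also twice as long, and the TPM values sum to one million by construction.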