RNAseq
- Alignment: The process of matching RNA-seq reads to a reference genome or transcriptome to determine their origin.
- Annotation: Information about the genomic features (e.g., gene locations, exons, introns) that helps interpret the RNA-seq data.
- Batch Effect: Unwanted variation in data due to technical rather than biological factors, often arising from differences between experimental batches.
- Base Quality Score: A measure of the accuracy of each nucleotide call in a sequencing read.
- Counts: The number of reads aligned to a specific feature (e.g., gene or transcript), used as a measure of expression level.
- Coverage: The number of reads that overlap a particular region of the genome or transcriptome, indicating how well that region is represented in the sequencing data.
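- CPM (Counts Per Million): A normalization unit in which the read count for a feature is divided by the total number of reads in the sample and multiplied by one million, accounting for sequencing depth differences between samples.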
- DESeq2: A widely used R package for analyzing RNA-seq data to detect differential gene expression between different experimental conditions.
- Differential Expression (DE): The process of identifying genes or transcripts whose expression levels significantly differ between conditions (e.g., treated vs. untreated samples).
- Downstream Analysis: Analysis steps that follow the initial processing of RNA-seq data, such as differential expression, pathway analysis, and functional enrichment.
- edgeR: An R/Bioconductor package for differential expression analysis of RNA-seq count data.
- Exon: A region of a gene that remains in the mature RNA transcript after splicing. Exons include protein-coding sequence as well as untranslated regions (UTRs).
- Expression Level: The abundance of a transcript in the sample, typically measured in counts or TPM (Transcripts Per Million).
- False Discovery Rate (FDR): The expected proportion of false positives among the results called significant. Controlling the FDR (e.g., with the Benjamini-Hochberg procedure) is a common way to correct for multiple hypothesis testing.
- Feature: Any element in the genome that is analyzed in RNA-seq, such as a gene, transcript, exon, or intron.
- FPKM (Fragments Per Kilobase of transcript per Million mapped reads): A normalization metric for RNA-seq data that accounts for both sequencing depth and transcript length. It is the fragment-based counterpart of RPKM, used for paired-end data where two reads can come from one fragment.
- Gene Ontology (GO): A framework used for annotating genes and gene products based on their molecular function, biological process, and cellular component.
- Gene Set Enrichment Analysis (GSEA): A method for determining whether a predefined set of genes shows statistically significant, concordant differences in expression between two biological states.
- Heatmap: A graphical representation of expression data, often used to visualize the expression levels of many genes across multiple samples.
- Homopolymer: A sequence of identical nucleotides repeated consecutively in a stretch of RNA.
- Intron: Non-coding regions of a gene that are spliced out during RNA processing, and do not remain in the mature transcript.
- Isoform: Different versions of a transcript produced from the same gene due to alternative splicing.
- Library Preparation: The process of converting RNA into a form that is compatible with sequencing, often involving reverse transcription into cDNA and adapter ligation.
- Log Fold Change (logFC): The ratio of expression between two conditions on a log scale, usually log2, so that a logFC of 1 corresponds to a twofold increase and -1 to a twofold decrease.
- limma: A popular R package used for the analysis of gene expression data, including both microarray and RNA-seq experiments.
- Mapping: The process of aligning RNA-seq reads to a reference genome or transcriptome.
- Mapped Reads: Reads that have been successfully aligned to the reference genome or transcriptome.
- Multimapping Reads: Reads that align to more than one location in the genome, making it difficult to determine their origin.
- Normalization: The process of adjusting RNA-seq data to account for differences in sequencing depth or other technical biases, making comparisons between samples meaningful.
- Normalization Factor: A scaling factor applied to RNA-seq data to correct for differences in sequencing depth or RNA composition between samples.
- P-Value: The probability of observing a result at least as extreme as the one measured, assuming the null hypothesis (e.g., no difference in expression) is true.
- Principal Component Analysis (PCA): A dimensionality reduction technique used to visualize the variability in high-dimensional RNA-seq data, often used for sample clustering.
- Poly-A Tail: A stretch of adenine nucleotides added to the 3′ end of eukaryotic mRNA, often used to enrich for mature mRNAs during RNA-seq library preparation.
- Quantification: The process of determining the abundance of transcripts in an RNA-seq experiment, usually expressed as counts, FPKM (Fragments Per Kilobase of transcript per Million mapped reads), or TPM (Transcripts Per Million).
- Quality Control (QC): The process of assessing the quality of RNA-seq data to detect potential problems (e.g., low sequencing depth, contamination, or batch effects).
- Read: A sequence of nucleotides produced by the sequencing process, representing a fragment of RNA.
- RPKM (Reads Per Kilobase of transcript per Million mapped reads): A normalization metric for RNA-seq data that accounts for gene length and sequencing depth.
- RNA-Seq: RNA sequencing, a technique used to study gene expression by sequencing RNA molecules.
- Splicing: The process of removing introns and joining exons to produce a mature mRNA transcript.
- Single-End Sequencing: RNA-seq where only one end of a fragment is sequenced, as opposed to paired-end sequencing, where both ends are sequenced.
- STAR: A popular RNA-seq read aligner known for its speed and accuracy.
- Subread Software Suite: A collection of tools for read alignment (Subread, Subjunc) and read counting (featureCounts).
- Transcript: The RNA product of a gene, which may be spliced and processed into different isoforms.
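- TPM (Transcripts Per Million): A normalization metric for RNA-seq data, similar to RPKM, but counts are divided by feature length first and then scaled so that the values in each sample sum to one million, which makes proportions easier to compare between samples.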
- Transcriptome: The complete set of RNA transcripts produced by the genome, under specific conditions or in specific cells.
- Trim Galore: A wrapper around Cutadapt and FastQC that automates adapter trimming and quality control of high-throughput sequencing data.
- Trimmomatic: A flexible tool for adapter removal and quality trimming of sequencing reads.
- UMI (Unique Molecular Identifier): A short barcode attached to individual RNA or cDNA molecules before PCR amplification, so that PCR duplicates can be distinguished from reads derived from distinct original molecules.
- Volcano Plot: A scatter plot that displays the significance (p-value) and magnitude of change (fold change) of genes between two conditions, often used in differential expression analysis.
- WGCNA (Weighted Gene Co-Expression Network Analysis): A method used to identify clusters (modules) of co-expressed genes and correlate them with external traits.
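
The CPM, RPKM/FPKM, and TPM entries above differ mainly in the order in which depth and length are accounted for. The following is a minimal sketch in plain Python, assuming a toy count table; the gene names, counts, lengths, and function names are all hypothetical and chosen only for illustration. It is not how edgeR or DESeq2 compute their normalization factors.

```python
# Toy data (made up): read counts per gene for two samples, and gene lengths in kb.
counts = {
    "geneA": [10, 20],
    "geneB": [20, 20],
    "geneC": [70, 60],
}
lengths_kb = {"geneA": 1.0, "geneB": 2.0, "geneC": 7.0}


def n_samples(table):
    """Number of sample columns in a gene -> list-of-values table."""
    return len(next(iter(table.values())))


def cpm(counts):
    """Counts per million: scale each sample by its total read count."""
    n = n_samples(counts)
    totals = [sum(row[j] for row in counts.values()) for j in range(n)]
    return {g: [row[j] / totals[j] * 1e6 for j in range(n)]
            for g, row in counts.items()}


def rpkm(counts, lengths_kb):
    """RPKM: depth-normalize first (CPM), then divide by length in kb."""
    return {g: [v / lengths_kb[g] for v in row]
            for g, row in cpm(counts).items()}


def tpm(counts, lengths_kb):
    """TPM: divide by length first, then scale each sample to sum to one million."""
    n = n_samples(counts)
    rates = {g: [row[j] / lengths_kb[g] for j in range(n)]
             for g, row in counts.items()}
    rate_totals = [sum(r[j] for r in rates.values()) for j in range(n)]
    return {g: [r[j] / rate_totals[j] * 1e6 for j in range(n)]
            for g, r in rates.items()}


if __name__ == "__main__":
    for name, table in [("CPM", cpm(counts)),
                        ("RPKM", rpkm(counts, lengths_kb)),
                        ("TPM", tpm(counts, lengths_kb))]:
        print(name, {g: [round(v, 1) for v in row] for g, row in table.items()})
```

Because TPM rescales after the length correction, the TPM values of every sample sum to one million, which is not generally true of RPKM or FPKM.
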
- Alternative Splicing: A process by which a single gene can produce multiple RNA isoforms through the inclusion or exclusion of specific exons, which can give rise to multiple protein variants.
- Exon: A segment of a gene that codes for a portion of the final RNA transcript. Exons are retained in the mature mRNA after splicing.
- Intron: A non-coding segment of a gene that is removed during RNA splicing and is not present in the mature mRNA.
- Splice Junction: The boundary between an exon and an intron, where the splicing machinery cuts and joins RNA to remove introns and connect exons.
- Isoform: Different mRNA molecules produced from the same gene through alternative splicing, leading to variations in the protein produced.
- Differential Splicing Analysis: A method to compare splicing patterns between different conditions or groups (e.g., treated vs. untreated samples) to identify changes in alternative splicing.
- Splicing Events: Specific types of alternative splicing patterns, such as:
  - Exon Skipping: An exon is skipped or included in the mRNA.
  - Intron Retention: An intron is retained in the mRNA instead of being spliced out.
  - Alternative 5′ Splice Site: Variation in the position of the 5′ (donor) splice site, which changes the 3′ boundary of the upstream exon.
  - Alternative 3′ Splice Site: Variation in the position of the 3′ (acceptor) splice site, which changes the 5′ boundary of the downstream exon.
  - Mutually Exclusive Exons: Only one of two exons is included in the mRNA.
- Spliceosome: A complex of proteins and small nuclear RNAs (snRNAs) responsible for removing introns and joining exons during RNA splicing.
- PSI (Percent Spliced In): A metric used to quantify exon inclusion levels, typically calculated as the percentage of transcripts including a specific exon.
- Read Count: The number of sequencing reads mapping to a gene, exon, or splice junction, used as a measure of gene expression or splicing.
- Splicing Factor: A protein involved in the regulation of splicing events, controlling the inclusion or exclusion of exons.
- GTF/GFF File: A file format that describes gene and transcript annotations, providing information about exon-intron structures.
- Alignment: The process of mapping RNA-seq reads to a reference genome or transcriptome to determine the origin and structure of transcripts.
- Splice Variants: Different versions of mRNA produced by a single gene through alternative splicing.
- Junction Reads: Sequencing reads that span across splice junctions, providing evidence of splicing events.
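
As an illustration of the PSI entry above, here is a minimal sketch that estimates PSI for one exon from junction read counts. It assumes a simplified estimator (inclusion-supporting reads divided by all informative reads); the function name and the read counts are hypothetical, and dedicated splicing tools typically also correct for the number of junctions or the effective length supporting each form.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Simplified PSI estimate (in percent) for a single exon.

    inclusion_reads: reads supporting inclusion of the exon
    exclusion_reads: reads supporting the junction that skips the exon
    Returns None when there are no informative reads.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # PSI is undefined without any supporting reads
    return 100.0 * inclusion_reads / total


if __name__ == "__main__":
    # Made-up counts: 30 reads support inclusion, 10 support skipping.
    print(percent_spliced_in(30, 10))
```

Comparing this value between conditions (for example, treated vs. untreated) is the basic idea behind differential splicing analysis, although real analyses also model replicate variability.
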