Your comprehensive reference for bioinformatics tools, databases, and resources used in next-generation sequencing analysis. Each resource links to official documentation and relevant tutorials on NGS101.
Analysis Software & Tools
RNA-seq Analysis
Alignment & Quantification Tools
STAR (Spliced Transcripts Alignment to a Reference)
- Ultra-fast RNA-seq aligner for splice-aware alignment
- Official: STAR GitHub
- Used in: RNA-seq Part 2: FASTQ to Counts
HISAT2 (Hierarchical Indexing for Spliced Alignment of Transcripts)
- Fast and sensitive alignment program for RNA-seq
- Official: HISAT2 Site
- Alternative to STAR with lower memory requirements
Salmon
- Alignment-free transcript quantification tool
- Official: Salmon Documentation
- Fast pseudo-alignment alternative
- Used in: RNA-seq Part 2: FASTQ to Counts
Kallisto
- Ultra-fast transcript quantification from RNA-seq reads
- Official: Kallisto Manual
- Pseudo-alignment based quantification
featureCounts (from Subread package)
- Highly efficient read summarization program
- Official: Subread Package
- Generates count matrices from BAM files
- Used in: RNA-seq Part 2: FASTQ to Counts
HTSeq
- Python framework for analyzing high-throughput sequencing data
- Official: HTSeq Documentation
- Alternative to featureCounts
Differential Expression Analysis
DESeq2
- R package for differential gene expression analysis based on negative binomial distribution
- Official: DESeq2 Bioconductor
- Industry standard for RNA-seq DE analysis
- Used in: RNA-seq Part 3: Count Table to DEGs
- Compare with: limma vs DESeq2 vs edgeR Tutorial
limma-voom
- Linear models for microarray and RNA-seq data with voom transformation
- Official: limma Bioconductor
- Excellent for complex experimental designs
- Used in: Comparing DE Methods Tutorial
edgeR
- Empirical analysis of digital gene expression data
- Official: edgeR Bioconductor
- Robust for low replicate counts
- Used in: Comparing DE Methods Tutorial
Normalization Methods
TPM (Transcripts Per Million)
- Within-sample normalization for gene length and library size
- Used for: Gene expression comparison within samples
- Learn more: RNA-seq Normalization Guide
RPKM/FPKM (Reads/Fragments Per Kilobase Million)
- Traditional normalization for single-end/paired-end RNA-seq
- Used for: Legacy comparisons only; avoid for DE analysis
- Learn more: RNA-seq Normalization Guide
TMM (Trimmed Mean of M-values)
- Between-sample normalization used by edgeR
- Used for: Correcting sequencing depth and RNA composition
- Learn more: RNA-seq Normalization Guide
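The within-sample methods above (TPM and RPKM/FPKM) come down to simple arithmetic on raw counts and gene lengths. The following is a minimal sketch, not code from any of the linked tutorials. The function names and the toy counts are made up for illustration. TMM is not shown because it needs a reference sample and trimmed log-ratios.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads, for one sample."""
    total_reads = sum(counts)
    return [
        c / (length / 1_000) / (total_reads / 1_000_000)
        for c, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Transcripts per million, for one sample."""
    # Step 1: length-normalize each gene (reads per kilobase).
    rpk = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so the values in this sample sum to one million.
    scale = sum(rpk) / 1_000_000
    return [value / scale for value in rpk]


# Toy data (hypothetical): three genes in one sample.
counts = [100, 200, 700]
lengths_bp = [1_000, 2_000, 7_000]

print(tpm(counts, lengths_bp))
print(rpkm(counts, lengths_bp))
```

Because TPM values always sum to one million within a sample, they are easier to compare across genes in that sample than RPKM/FPKM values, whose totals vary from sample to sample.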
Pathway & Functional Analysis
clusterProfiler
- R package for GO, KEGG, and pathway enrichment analysis
- Official: clusterProfiler Bioconductor
- Comprehensive pathway analysis suite
- Used in: RNA-seq Part 5: DEGs to Pathways
GSEA (Gene Set Enrichment Analysis)
- Pathway analysis based on ranked gene lists
- Official: GSEA Broad Institute
- Gold standard for pathway analysis
- Used in: RNA-seq Part 5: DEGs to Pathways
fgsea
- Fast Gene Set Enrichment Analysis in R
- Official: fgsea Bioconductor
- Faster alternative to GSEA Java implementation
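To make "pathway analysis based on ranked gene lists" concrete, here is a minimal sketch of a running-sum enrichment score. It is an unweighted simplification for illustration only: GSEA and fgsea weight hits by the ranking metric and add permutation-based significance testing, neither of which is shown. The function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up at genes in the set, step down at genes outside it.
        running += 1 / n_hits if is_hit else -1 / n_miss
        # Track the largest deviation from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["A", "B", "C", "D", "E", "F"]  # most up-regulated first
print(enrichment_score(ranked, {"A", "B"}))  # set concentrated at the top
print(enrichment_score(ranked, {"E", "F"}))  # set concentrated at the bottom
```

A positive score means the gene set clusters near the top of the ranking, and a negative score means it clusters near the bottom.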
DAVID (Database for Annotation, Visualization and Integrated Discovery)
- Web-based functional annotation tool
- Official: DAVID Bioinformatics
- User-friendly interface for beginners
Enrichr
- Web-based gene list enrichment analysis tool
- Official: Enrichr
- Quick enrichment analysis with multiple databases
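Gene list enrichment tools such as those above commonly build on an over-representation test: given a list of hits, is a pathway's gene set over-represented compared with chance? The sketch below uses a one-sided hypergeometric test as a generic illustration. Each tool's exact statistic, background definition, and multiple-testing correction differ, and the function name and numbers here are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, hits_size, overlap):
    """P(X >= overlap) when hits_size genes are drawn without replacement
    from a universe that contains set_size pathway genes."""
    max_overlap = min(set_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 20 genes in the background, 5 in the pathway,
# 4 differentially expressed genes, 2 of which are in the pathway.
p_value = hypergeom_enrichment_p(
    universe_size=20, set_size=5, hits_size=4, overlap=2
)
print(p_value)
```

In a real analysis this test is repeated for every gene set, so the raw p-values need multiple-testing correction before interpretation.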
Network Analysis & Co-expression
WGCNA (Weighted Gene Co-expression Network Analysis)
- R package for constructing gene co-expression networks
- Official: WGCNA CRAN
- Identifies gene modules and hub genes
- Used in: WGCNA Tutorial
GENIE3
- Gene regulatory network inference using random forests
- Official: GENIE3 Bioconductor
- Machine learning approach to network construction
- Used in: GENIE3 Tutorial
RegEnrich
- Master regulator analysis combining expression and regulatory networks
- Official: RegEnrich Bioconductor
- Identifies key transcription factors
- Used in: Master Regulator Analysis Tutorial
RTN (Reconstruction of Transcriptional regulatory Networks)
- R package for transcriptional network analysis
- Official: RTN Bioconductor
- Master regulator analysis
- Used in: Master Regulator Analysis Tutorial
Clustering & Classification
Hierarchical Clustering
- Standard clustering method in R (hclust, pheatmap)
- Used for: Grouping samples or genes by expression patterns
- Used in: Clustering Tutorial
K-means Clustering
- Partition-based clustering algorithm
- Used for: Identifying distinct gene expression clusters
- Used in: Clustering Tutorial
PAM50
- 50-gene signature for breast cancer subtype classification
- Official: genefu package
- Clinical breast cancer classifier
- Used in: Cancer Subtype Prediction
genefu
- R package for gene expression-based classifiers
- Official: genefu Bioconductor
- Multiple cancer classifiers
- Used in: Cancer Subtype Prediction
GSVA (Gene Set Variation Analysis)
- Non-parametric, unsupervised method for gene set enrichment
- Official: GSVA Bioconductor
- Sample-level pathway activity scores
- Used in: Cancer Subtype Prediction
Deconvolution & Cell Type Analysis
CIBERSORT
- Estimate cell type proportions from bulk RNA-seq
- Official: CIBERSORT Portal
- Gold standard for immune cell deconvolution
- Used in: Deconvolution Tutorial
EPIC (Estimating the Proportions of Immune and Cancer cells)
- Deconvolution specialized for cancer samples
- Official: EPIC GitHub
- Estimates immune and cancer cell fractions
- Used in: Deconvolution Tutorial
quanTIseq
- Quantification of immune cell types from RNA-seq
- Official: quanTIseq
- Immune cell type quantification
- Used in: Deconvolution Tutorial
Alternative Splicing Analysis
rMATS (Replicate Multivariate Analysis of Transcript Splicing)
- Statistical detection of differential alternative splicing
- Official: rMATS GitHub
- Event-based splicing analysis
- Used in: Alternative Splicing Tutorial
SUPPA2
- Fast, accurate alternative splicing analysis
- Official: SUPPA2 GitHub
- Transcript-based splicing quantification
- Used in: Alternative Splicing Tutorial
LeafCutter
- Quantifies intron splicing ratios
- Official: LeafCutter GitHub
- Novel splice junction discovery
- Used in: Alternative Splicing Tutorial
JunctionSeq
- Differential usage of exons and splice junctions
- Official: JunctionSeq Bioconductor
- Visualizes differential splicing events
DEXSeq
- Differential exon usage analysis
- Official: DEXSeq Bioconductor
- Identifies differential exon usage
- Used in: Transcript-Level Splicing Tutorial
Isoform Analysis
StringTie
- Transcript assembly and quantification
- Official: StringTie
- Novel transcript discovery via reference-guided assembly
- Used in: Isoform Analysis Tutorial
RSEM (RNA-Seq by Expectation-Maximization)
- Accurate transcript quantification from RNA-seq
- Official: RSEM GitHub
- Handles multi-mapping reads
- Used in: Isoform Analysis Tutorial
IsoformSwitchAnalyzeR
- Identify isoform switches with functional consequences
- Official: IsoformSwitchAnalyzeR Bioconductor
- Predicts functional impact of isoform changes
- Used in: Transcript-Level Splicing Tutorial
RNA Editing Analysis
REDItools
- RNA editing detection from RNA-seq
- Official: REDItools
- A-to-I editing identification
- Used in: RNA Editing Tutorial
JACUSA
- Java framework for RNA editing detection
- Official: JACUSA GitHub
- Identifies RNA-DNA differences
SPRINT
- SNP-free RNA editing identification
- Official: SPRINT GitHub
- Does not require DNA-seq data
Non-Coding RNA Analysis
CIRCexplorer2
- Circular RNA identification from RNA-seq
- Official: CIRCexplorer2 GitHub
- circRNA detection and annotation
- Used in: Circular RNA Tutorial
CIRI3
- CircRNA identification tool
- Official: CIRI GitHub
- High sensitivity circRNA caller
- Used in: Circular RNA Tutorial
CircRNA Databases
- circBase: http://www.circbase.org/ (curated circRNA database)
- circAtlas: http://circatlas.biols.ac.cn/ (tissue expression atlas)
- circRNADb: http://reprod.njmu.edu.cn/circrnadb (functional annotations)
- circInteractome: https://circinteractome.irp.nia.nih.gov/ (miRNA binding predictions)
miRDeep2
- Discover known and novel miRNAs from small RNA-seq
- Official: miRDeep2
- miRNA discovery and quantification
- Used in: miRNA-seq Tutorial
sRNAbench
- Small RNA-seq analysis pipeline
- Official: sRNAbench
- Comprehensive small RNA profiling
- Used in: Small RNA-seq Tutorial
UMI-tools
- Handle Unique Molecular Identifiers in sequencing data
- Official: UMI-tools GitHub
- Deduplication with UMIs
- Used in: UMI-Based miRNA-seq Tutorial
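The idea behind UMI-based deduplication can be sketched as follows: reads that share a mapping position and a UMI are treated as PCR copies of one molecule, and only one is kept. This is a simplified illustration with exact UMI matching, not the UMI-tools implementation, whose default method also collapses UMIs that differ by sequencing errors. The dictionary keys are hypothetical.

```python
from collections import defaultdict


def dedup_by_umi(reads):
    """Keep one read per (chromosome, position, strand, UMI) group.

    `reads` is an iterable of dicts with the hypothetical keys
    'name', 'chrom', 'pos', 'strand', 'umi', and 'mapq'.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        groups[key].append(read)
    # Keep the highest-MAPQ read in each group as its representative.
    return [max(group, key=lambda r: r["mapq"]) for group in groups.values()]


# Toy reads: r1 and r2 share position and UMI, r3 has a different UMI.
reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT", "mapq": 30},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT", "mapq": 60},
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTAG", "mapq": 60},
]
kept = dedup_by_umi(reads)
print([read["name"] for read in kept])
```

Without UMIs, r3 would be indistinguishable from a PCR duplicate of r2. The differing UMI is what marks it as a separate original molecule.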
Structural Variation & Fusion Detection
STAR-Fusion
- Fusion transcript detection from RNA-seq
- Official: STAR-Fusion GitHub
- Built on STAR aligner
- Used in: Fusion Gene Tutorial
Arriba
- Fast fusion detection from RNA-seq
- Official: Arriba GitHub
- Low false-positive rate
- Used in: Fusion Gene Tutorial
FusionCatcher
- Find somatic fusion genes, translocations and chimeras
- Official: FusionCatcher
- Specialized for cancer fusion detection
- Used in: FusionCatcher Tutorial
Viral Sequence Detection
Kraken2
- Taxonomic classification of metagenomic sequences
- Official: Kraken2 GitHub
- Fast viral detection
- Used in: Viral Sequence Detection Tutorial
STAR + Viral Genome
- Align to combined host and viral reference
- Method: Map RNA-seq to host+viral genomes
- Quantify viral gene expression
- Used in: Viral Gene Expression Tutorial
Batch Effect & Covariates Correction
ComBat (from sva package)
- Remove batch effects from expression data
- Official: sva Bioconductor
- Classic batch correction method
- Used in: Batch Effects Tutorial
limma removeBatchEffect
- Remove batch effects for visualization
- Official: limma Bioconductor
- Good for exploratory analysis
- Used in: Batch Effects Tutorial
RUVSeq
- Remove unwanted variation from RNA-seq
- Official: RUVSeq Bioconductor
- Uses negative control genes
- Used in: Batch Effects Tutorial
Epigenetics Tools
ChIP-seq Analysis
HOMER (Hypergeometric Optimization of Motif EnRichment)
- Complete ChIP-seq analysis suite
- Official: HOMER
- Peak calling, motif discovery, annotation
- Used in: ChIP-seq Part 1: FASTQ to Peaks with HOMER
MACS2 (Model-based Analysis of ChIP-Seq)
- Peak caller for ChIP-seq data
- Official: MACS GitHub
- Industry standard peak caller
- Used in: ChIP-seq Part 4: FASTQ to Peaks with MACS2
DiffBind
- Differential binding analysis for ChIP-seq/ATAC-seq
- Official: DiffBind Bioconductor
- Consensus peak analysis
- Used in: ChIP-seq Part 3: Differential Binding, ATAC-seq Part 2: DiffBind
IDR (Irreproducible Discovery Rate)
- Assess reproducibility between ChIP-seq replicates
- Official: IDR GitHub
- Quality control for replicates
- Used in: ChIP-seq Part 5: IDR Analysis
deepTools
- Suite for ChIP-seq data visualization and QC
- Official: deepTools
- Heatmaps, profile plots, correlation
- Used in: ChIP-seq Part 2: Visualization
ChIPseeker
- R package for ChIP peak annotation and visualization
- Official: ChIPseeker Bioconductor
- Annotate peaks to nearest genes
- Used in: ChIP-seq Part 3: Differential Binding
MEME Suite
- Motif discovery and analysis tools
- Official: MEME Suite
- Comprehensive motif analysis
- Used in: ChIP-seq Part 3: Differential Binding
ATAC-seq Analysis
MACS2 (for ATAC-seq)
- Peak calling optimized for ATAC-seq
- Official: MACS GitHub
- Use with the --shift and --extsize options for ATAC
- Used in: ATAC-seq Part 1: FASTQ to Peaks
Genrich
- Peak caller designed for ATAC-seq and ChIP-seq
- Official: Genrich GitHub
- Handles replicates effectively
- Used in: ATAC-seq Part 1: FASTQ to Peaks
TOBIAS (Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal)
- Transcription factor footprinting from ATAC-seq
- Official: TOBIAS
- Differential footprinting analysis
- Used in: ATAC-seq Part 3: Footprinting
HINT-ATAC
- Footprinting tool for ATAC-seq
- Official: RGT Suite
- TF binding site prediction
- Used in: ATAC-seq Part 3: Footprinting
ArchR
- R package for ATAC-seq and single-cell ATAC-seq
- Official: ArchR
- Comprehensive ATAC-seq analysis
- Used in: ATAC-seq and RNA-seq Integration
CUT&RUN/CUT&Tag Analysis
SEACR (Sparse Enrichment Analysis for CUT&RUN)
- Peak caller specialized for CUT&RUN data
- Official: SEACR GitHub
- Works without input controls
- Used in: CUT&RUN/Tag Tutorial
MACS2 (for CUT&RUN)
- Can be adapted for CUT&RUN analysis
- Official: MACS GitHub
- Use with adjusted parameters
- Alternative in: CUT&RUN/Tag Tutorial
Hi-C & 3D Genome Organization
Juicer
- Complete Hi-C analysis pipeline
- Official: Juicer
- From raw reads to contact maps
- Used in: Hi-C Tutorial
HiC-Pro
- Optimized and flexible Hi-C processing pipeline
- Official: HiC-Pro
- Fast Hi-C data processing
cooler
- Store and access sparse contact matrices
- Official: cooler GitHub
- Efficient Hi-C data storage
Juicebox
- Visualization software for Hi-C data
- Official: Juicebox
- Interactive contact map viewer
- Used in: Hi-C Tutorial
DNA Methylation Analysis
minfi
- Illumina methylation array analysis in R
- Official: minfi Bioconductor
- EPIC and 450k array analysis
- Used in: DNA Methylation Part 1: EPIC Arrays
ChAMP
- Chip Analysis Methylation Pipeline
- Official: ChAMP Bioconductor
- Comprehensive array analysis
- Used in: DNA Methylation Part 1: EPIC Arrays
Bismark
- Bisulfite read mapper and methylation caller
- Official: Bismark
- Gold standard for WGBS/RRBS
- Used in: DNA Methylation Part 2: WGBS/RRBS
methylKit
- R package for DNA methylation analysis
- Official: methylKit Bioconductor
- Differential methylation analysis
- Used in: DNA Methylation Part 3: Biological Insights
DSS (Dispersion Shrinkage for Sequencing data)
- Differential methylation for bisulfite sequencing
- Official: DSS Bioconductor
- Robust statistical testing
- Used in: DNA Methylation Part 3: Biological Insights
Genomics & Variant Analysis
Alignment Tools
BWA (Burrows-Wheeler Aligner)
- Fast and accurate DNA sequence aligner
- Official: BWA
- Standard for WGS/WES alignment
- Used in: WGS Part 1: Raw Reads to Variants
Bowtie2
- Fast and memory-efficient read aligner
- Official: Bowtie2
- Good for reads of about 50 bp and longer
- Alternative for WGS alignment
minimap2
- Versatile aligner for long reads and assemblies
- Official: minimap2
- Excellent for Oxford Nanopore and PacBio
Variant Calling – Germline
GATK (Genome Analysis Toolkit)
- Industry standard for germline variant calling
- Official: GATK Broad Institute
- HaplotypeCaller for germline variants
- Used in: WGS Part 1: Raw Reads to Variants, WES Tutorial
FreeBayes
- Bayesian genetic variant detector
- Official: FreeBayes
- Haplotype-based variant detection
DeepVariant
- Deep learning-based variant caller
- Official: DeepVariant
- High accuracy with deep neural networks
Variant Calling – Somatic
Mutect2 (GATK)
- Somatic mutation caller for tumor-normal pairs
- Official: GATK Mutect2
- Gold standard for somatic variant calling
- Used in: WGS Part 2A: Tumor-Normal Mutation Calling
VarScan2
- Variant detection for massively parallel sequencing
- Official: VarScan
- Somatic and germline calling
- Used in: WGS Part 2B: Unmatched Sample Strategies
MuSE
- Somatic mutation caller for tumor-normal pairs
- Official: MuSE
- Accounts for tumor heterogeneity with a sample-specific error model
- Used in: WGS Part 2B: Unmatched Sample Strategies
SomaticSniper
- Identify somatic mutations in tumor/normal pairs
- Official: SomaticSniper
- Fast SNV calling
Variant Annotation
ANNOVAR
- Functional annotation of genetic variants
- Official: ANNOVAR
- Comprehensive variant annotation
- Used in: WGS Part 3: Annotating Variants
VEP (Variant Effect Predictor)
- Ensembl’s variant annotation tool
- Official: VEP
- Rich functional annotations
- Used in: WGS Part 3: Annotating Variants
SnpEff
- Genomic variant annotation and effect prediction
- Official: SnpEff
- Fast functional annotation
- Used in: WGS Part 3: Annotating Variants
InterVar
- Clinical interpretation of variants
- Official: InterVar
- ACMG guideline-based classification
- Used in: WGS Part 5: Disease-Specific Variants
Copy Number Variation (CNV)
CNVkit
- Copy number variation detection from targeted sequencing
- Official: CNVkit
- Tumor CNV calling
- Used in: WGS Part 6-2: Tumor CNVs, WES Tutorial
GATK gCNV
- Germline CNV calling pipeline
- Official: GATK gCNV
- Reference-based CNV detection
- Used in: WGS Part 6: Germline CNVs
XHMM
- Exome-based CNV detection
- Official: XHMM
- Specialized for exome data
- Used in: WGS Part 6: Germline CNVs
GISTIC2
- Identify significant copy number alterations in cancer
- Official: GISTIC2
- Tumor CNV significance analysis
Mutation Visualization & Interpretation
maftools
- Summarize, visualize and analyze MAF files
- Official: maftools Bioconductor
- Comprehensive mutation analysis
- Used in: WGS Part 4: Visualizing Mutations
MutationalPatterns
- Extract and visualize mutational signatures
- Official: MutationalPatterns Bioconductor
- COSMIC signature analysis
- Used in: WGS Part 4: Visualizing Mutations
deconstructSigs
- Identify mutational signatures in tumor samples
- Official: deconstructSigs
- Signature decomposition
- Used in: WGS Part 4: Visualizing Mutations
GWAS & Population Genetics
PLINK
- Whole genome association analysis toolset
- Official: PLINK
- Standard for GWAS analysis
- Used in: GWAS Tutorial
GCTA
- Genome-wide complex trait analysis
- Official: GCTA
- Heritability estimation
LocusZoom
- Regional association plot visualization
- Official: LocusZoom
- GWAS results visualization
Single-Cell Analysis
scRNA-seq Analysis
CellRanger (10x Genomics)
- Process 10x Chromium single-cell data
- Official: CellRanger
- From FASTQ to count matrix
- Used in: scRNA-seq Part 1: FASTQ to Count Matrix
STARsolo
- Single-cell RNA-seq alignment and quantification
- Official: STAR GitHub
- Alternative to CellRanger
- Used in: scRNA-seq Part 1: FASTQ to Count Matrix
Seurat
- R toolkit for single-cell genomics
- Official: Seurat
- Industry standard for scRNA-seq analysis
- Used in: scRNA-seq Part 2: QC, Part 3: Integration, Part 4: Cell Type ID
Scanpy
- Python-based single-cell analysis
- Official: Scanpy
- Scalable single-cell analysis
SingleCellExperiment
- Bioconductor infrastructure for single-cell data
- Official: SingleCellExperiment
- Data structure for scRNA-seq
scRNA-seq Quality Control
DoubletFinder
- Detect doublets in single-cell RNA-seq
- Official: DoubletFinder
- Doublet detection
- Used in: scRNA-seq Part 2: QC
Scrublet
- Python package for doublet detection
- Official: Scrublet
- Simulation-based doublet detection
miQC
- Flexible quality control for scRNA-seq
- Official: miQC Bioconductor
- Model-based QC
scRNA-seq Integration & Batch Correction
Harmony
- Fast integration of single-cell data
- Official: Harmony
- Scalable batch correction
- Used in: scRNA-seq Part 3: Integration
Seurat Integration
- Canonical correlation analysis-based integration
- Official: Seurat Integration
- Built into Seurat workflow
- Used in: scRNA-seq Part 3: Integration
scVI (single-cell Variational Inference)
- Deep learning for scRNA-seq analysis
- Official: scVI
- Neural network-based integration
Cell Type Annotation
SingleR
- Automated cell type annotation
- Official: SingleR Bioconductor
- Reference-based annotation
- Used in: scRNA-seq Part 4: Cell Type ID
CellTypist
- Machine learning-based cell type classifier
- Official: CellTypist
- Pre-trained models for annotation
Azimuth
- Reference-based cell type mapping
- Official: Azimuth
- Web-based and R package
Quality Control & Utilities
Sequencing Quality Control
FastQC
- Quality control tool for high throughput sequence data
- Official: FastQC
- First-pass quality assessment
- Used across all NGS tutorials
MultiQC
- Aggregate results from multiple bioinformatics analyses
- Official: MultiQC
- Comprehensive QC reporting
- Used across all NGS tutorials
Trimmomatic
- Flexible read trimming tool for Illumina data
- Official: Trimmomatic
- Adapter trimming and quality filtering
Cutadapt
- Finds and removes adapter sequences
- Official: Cutadapt
- Flexible adapter removal
fastp
- Ultra-fast FASTQ preprocessing
- Official: fastp
- All-in-one FASTQ processor
File Manipulation & Utilities
SAMtools
- Suite for manipulating alignments in SAM/BAM format
- Official: SAMtools
- Essential for BAM file operations
- Used across all NGS tutorials
BEDtools
- Toolset for genome arithmetic
- Official: BEDtools
- Interval operations on genomic features
- Used across multiple tutorials
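As a small illustration of what "genome arithmetic" means, the sketch below intersects two intervals using BED-style coordinates (0-based start, exclusive end). It is a plain-Python toy, not BEDtools code. The function name and example intervals are hypothetical.

```python
def intersect(a, b):
    """Return the overlap of two half-open intervals, or None.

    Each interval is a (chrom, start, end) tuple with a 0-based start
    and an exclusive end, as in BED files.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    # With half-open coordinates, start == end means the intervals only touch.
    return (chrom_a, start, end) if start < end else None


peak = ("chr1", 100, 200)
gene = ("chr1", 150, 300)
print(intersect(peak, gene))
print(intersect(("chr1", 100, 200), ("chr1", 200, 300)))
```

The second call shows why the coordinate convention matters: two intervals that share only a boundary coordinate do not overlap under half-open coordinates.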
BCFtools
- Utilities for variant calling and manipulating VCF/BCF files
- Official: BCFtools
- VCF file manipulation
Picard
- Java-based tools for manipulating HTS data
- Official: Picard
- Mark duplicates, collect metrics
- Used in: WGS Part 1
IGV (Integrative Genomics Viewer)
- Visualization tool for genomics data
- Official: IGV
- Interactive genome browser
- Used for visualization across tutorials
UCSC Genome Browser
- Web-based genome browser
- Official: UCSC Browser
- Explore genomic data
R Visualization Packages
ggplot2
- Data visualization package for R
- Official: ggplot2
- Publication-quality plots
- Used in: RNA-seq Part 4: Publication Figures
pheatmap
- Pretty heatmaps in R
- Official: pheatmap
- Heatmap generation
- Used across RNA-seq tutorials
ComplexHeatmap
- Make complex heatmaps in R
- Official: ComplexHeatmap Bioconductor
- Advanced heatmap visualization
EnhancedVolcano
- Publication-ready volcano plots
- Official: EnhancedVolcano Bioconductor
- Volcano plot customization
- Used in: RNA-seq Part 4: Publication Figures
CRISPR Screen Analysis
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout)
- Identify essential genes from CRISPR screens
- Official: MAGeCK
- sgRNA library analysis
- Used in: CRISPR Screen Tutorial
CRISPResso2
- Analysis of CRISPR editing outcomes
- Official: CRISPResso2
- Editing efficiency quantification
Databases & Data Resources
Gene Expression Databases
NCBI GEO (Gene Expression Omnibus)
- Public repository for gene expression data
- Official: GEO
- Download published datasets
- Used in: Submitting to GEO Tutorial
TCGA (The Cancer Genome Atlas)
- Comprehensive cancer genomics data
- Official: TCGA via GDC
- 33 cancer types, multi-omics
- Used in: TCGA Data Download Tutorial
GTEx (Genotype-Tissue Expression)
- Gene expression across human tissues
- Official: GTEx Portal
- Normal tissue expression reference
- Used in: GTEx Data Download Tutorial
ArrayExpress
- Functional genomics data archive
- Official: ArrayExpress
- European alternative to GEO
recount3
- Unified access to RNA-seq datasets
- Official: recount3
- Pre-processed RNA-seq data
GEPIA3 (Gene Expression Profiling Interactive Analysis v3)
- Interactive web tool for analyzing RNA-seq data from TCGA and GTEx
- Official: https://gepia3.bioinfoliu.com/
- Focused on cancer genomics and differential expression analysis
UCSC Xena
- Multi-omic data visualization platform
- Official: https://xena.ucsc.edu/
- Hosts TCGA, GTEx, and other large cancer genomics datasets
Reference Genomes & Annotations
GENCODE
- High-quality gene annotations
- Official: GENCODE
- Human and mouse annotations
- Used across all RNA-seq tutorials
Ensembl
- Genome browser and annotation database
- Official: Ensembl
- Comprehensive genome annotations
- Used across tutorials
UCSC Genome Browser
- Reference genomes and annotations
- Official: UCSC Downloads
- Alternative genome references
RefSeq
- NCBI Reference Sequence Database
- Official: RefSeq
- Curated gene annotations
refgenie
- Reference genome manager
- Official: refgenie
- Pre-built genome references
Illumina iGenomes
- Ready-to-use reference sequences and annotations
- Official: iGenomes
- Pre-indexed genomes
Variant & Population Databases
gnomAD (Genome Aggregation Database)
- Population allele frequencies
- Official: gnomAD
- 140,000+ genomes/exomes
- Used in: WGS Part 5: Disease Variants
ClinVar
- Variation-disease relationships
- Official: ClinVar
- Clinical variant interpretation
- Used in: WGS Part 5: Disease Variants
dbSNP
- Short genetic variations
- Official: dbSNP
- Variant IDs and frequencies
1000 Genomes
- Human genetic variation catalog
- Official: 1000 Genomes
- Population genetics reference
COSMIC (Catalogue of Somatic Mutations in Cancer)
- Cancer somatic mutation database
- Official: COSMIC
- Cancer mutation catalog
Database of Genomic Variants (DGV)
- Structural variation in healthy individuals
- Official: DGV
- Benign CNV reference
Clinical & Disease Databases
OMIM (Online Mendelian Inheritance in Man)
- Human genes and genetic disorders
- Official: OMIM
- Gene-disease relationships
ClinGen
- Clinical genome resource
- Official: ClinGen
- Gene-disease validity
DECIPHER
- Database of genomic variation and phenotype
- Official: DECIPHER
- Rare disease genomics
GenCC (Gene Curation Coalition)
- Curated gene-disease relationships
- Official: GenCC
- Standardized gene-disease assertions
HGMD (Human Gene Mutation Database)
- Disease-causing mutations
- Official: HGMD
- Mutation catalog (subscription)
Orphanet
- Rare disease and orphan drug portal
- Official: Orphanet
- Rare disease information
cBioPortal
- Cancer genomics data visualization
- Official: cBioPortal
- Interactive cancer genomics
- Used in mutation visualization
Pathway & Functional Databases
MSigDB (Molecular Signatures Database)
- Gene sets for pathway analysis
- Official: MSigDB
- Hallmark, GO, KEGG gene sets
- Used in: RNA-seq Part 5: Pathways
KEGG (Kyoto Encyclopedia of Genes and Genomes)
- Pathway and disease databases
- Official: KEGG
- Metabolic and signaling pathways
Reactome
- Pathway knowledge base
- Official: Reactome
- Curated pathway database
Gene Ontology (GO)
- Gene function classification
- Official: GO
- Biological process, molecular function, cellular component
WikiPathways
- Community-curated pathway database
- Official: WikiPathways
- Open-source pathways
Regulatory & Epigenetics Databases
ENCODE
- Encyclopedia of DNA Elements
- Official: ENCODE
- Functional genomics data
- ChIP-seq, ATAC-seq, RNA-seq datasets
TRRUST (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining)
- Human/mouse transcription factor-target interactions
- Official: TRRUST
- TF regulatory networks
- Used in network analysis tutorials
RegNetwork
- Regulatory network repository
- Official: RegNetwork
- TF-target gene relationships
JASPAR
- Transcription factor binding profile database
- Official: JASPAR
- TF motifs
Cistrome DB
- ChIP-seq and chromatin accessibility database
- Official: Cistrome DB
- Curated ChIP-seq data
miRNA Databases
miRBase
- MicroRNA sequences and annotation
- Official: miRBase
- miRNA reference database
- Used in: miRNA-seq tutorials
TargetScan
- Predict miRNA target sites
- Official: TargetScan
- miRNA-mRNA interactions
miRDB
- MicroRNA target prediction database
- Official: miRDB
- miRNA target genes
Single-Cell Reference Databases
Human Cell Atlas
- Reference maps of human cells
- Official: Human Cell Atlas
- Single-cell reference data
PanglaoDB
- Single-cell sequencing database
- Official: PanglaoDB
- Cell type markers
CellMarker
- Cell marker database
- Official: CellMarker
- Manually curated cell markers
Computing & Environment
Package Management
Conda / Miniforge
- Package and environment manager
- Official: Miniforge
- Python and R package installation
- Used in: RNA-seq Part 1: Environment Setup
Bioconda
- Bioinformatics software distribution
- Official: Bioconda
- 9000+ bioinformatics packages
- Browse: Bioconda Package Index
Conda-forge
- Community-driven conda channel
- Official: Conda-forge
- General software packages
- Browse: Conda-forge Packages
Programming Languages & IDEs
R
- Statistical computing language
- Official: R Project
- Essential for bioinformatics analysis
- Used across all tutorials
RStudio
- Integrated development environment for R
- Official: RStudio
- User-friendly R interface
Python
- General-purpose programming language
- Official: Python
- Versatile for bioinformatics
Jupyter
- Interactive computing notebooks
- Official: Jupyter
- Python/R notebook interface
High Performance Computing (HPC)
Slurm
- Workload manager for HPC clusters
- Official: Slurm
- Job submission and management
- Used in: Slurm Tutorial
PBS/Torque
- Alternative HPC job scheduler
- Official: PBS Works
- Cluster job management
SGE (Sun Grid Engine)
- Distributed resource management
- Alternative HPC scheduler
Containerization
Docker
- Platform for developing and running applications in containers
- Official: Docker
- Reproducible analysis environments
- Used in: Docker Tutorial
Singularity/Apptainer
- Container platform for HPC
- Official: Apptainer
- HPC-friendly containerization
Workflow Management
Snakemake
- Python-based workflow management system
- Official: Snakemake
- Reproducible and scalable analysis
Nextflow
- Data-driven computational pipelines
- Official: Nextflow
- Portable workflow framework
WDL (Workflow Description Language)
- Workflow specification language
- Official: WDL
- Used by GATK pipelines
Learning Resources
Cheat Sheets
Conda Cheat Sheet
- Quick reference for conda commands
- Official: Conda Cheat Sheet
R Cheat Sheets (Posit)
- Comprehensive R package cheat sheets
- Official: Posit Cheat Sheets
- ggplot2, dplyr, tidyr, and more
R Cheat Sheets (Kaggle)
- Community R cheat sheet collection
- Community: Kaggle R Cheat Sheets
data.table Cheat Sheet
- Fast data manipulation in R
- Official: data.table Cheat Sheet
ggplot2 Cheat Sheet
- Data visualization with ggplot2
- Official: ggplot2 Cheat Sheet
Unix Command Line Cheat Sheet
- Essential Linux/Unix commands
- Various resources available online
Documentation Hubs
Bioconductor
- R packages for genomic data analysis
- Official: Bioconductor
- 2000+ bioinformatics packages
Galaxy Project
- Web-based platform for data analysis
- Official: Galaxy
- No-code bioinformatics
NGS 101 Complete Tutorial Library
- Your current site – 70+ comprehensive tutorials
- Home: NGS 101 Tutorials
- Beginner-friendly, step-by-step guides
Quick Reference
File Formats Guide
FASTQ Format
- Raw sequencing reads with quality scores
- Learn: NGS Data Types Guide
BAM/SAM Format
- Sequence Alignment/Map format
- Learn: NGS Data Types Guide
VCF/BCF Format
- Variant Call Format
- Learn: NGS Data Types Guide
BED Format
- Browser Extensible Data format
- Learn: NGS Data Types Guide
GTF/GFF Format
- Gene Transfer/General Feature Format
- Learn: NGS Data Types Guide
Complete File Format Reference
- Comprehensive guide to all NGS file types
- Tutorial: NGS Data Types and Formats
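As a quick illustration of the FASTQ format described above, the sketch below parses one four-line record and decodes its quality string. It assumes Phred+33 encoding, where the score is the character's ASCII code minus 33. Older Illumina files may use a different offset. The function name and the example record are made up for illustration.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (read_id, sequence, scores)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid four-line FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 (assumed): quality score = ASCII code of the character - 33.
    scores = [ord(ch) - 33 for ch in quality]
    # The read ID is the first whitespace-delimited token after '@'.
    return header[1:].split()[0], sequence, scores


record = ["@read1 sample=demo\n", "ACGT\n", "+\n", "II5!\n"]
read_id, seq, quals = parse_fastq_record(record)
print(read_id, seq, quals)  # read1 ACGT [40, 40, 20, 0]
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so the score of 40 for the first two bases means roughly a 1 in 10,000 chance that the base call is wrong.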
Data Management
HPC Data Management Guide
- Storage, transfer, and sharing best practices
- Tutorial: HPC Data Management
NCBI Database Guide
- Complete guide to NCBI resources
- Tutorial: NCBI Databases
This resource page is continuously updated as new tools and tutorials are added. Last update: January 2026
Need help with a specific analysis? Check out our complete tutorial library.
Looking for a specific tool? Use Ctrl+F (or Cmd+F on Mac) to search this page.
Want to suggest a resource? Contact us through our collaborations page.