Resources

A curated reference for every tool, database, and resource used across NGS101 tutorials. Each entry links to official documentation and the relevant NGS101 guide where applicable.

Analysis Software & Tools

RNA-seq: Alignment & Quantification
STAR
Ultra-fast splice-aware RNA-seq aligner. Industry standard for aligning reads to the reference genome.
HISAT2
Fast, memory-efficient RNA-seq aligner. Alternative to STAR with a lower memory footprint.
Salmon
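Transcript quantification that maps reads directly to the transcriptome (quasi-mapping in early releases, selective alignment in recent ones) instead of aligning to the genome. Extremely fast alternative to alignment-based methods.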
Kallisto
Ultra-fast transcript quantification using pseudoalignment. Great when speed is the priority.
featureCounts
Highly efficient read summarization program (from Subread). Generates count matrices from BAM files.
HTSeq
Python framework for counting reads per gene. A classic alternative to featureCounts for count matrix generation.
RNA-seq: Differential Expression
DESeq2
R package for differential expression (DE) analysis based on the negative binomial distribution. Industry standard for RNA-seq.
limma-voom
Linear models for RNA-seq with voom transformation. Excellent for complex experimental designs and small samples.
edgeR
Empirical analysis of digital gene expression. Robust for datasets with low replicate counts.
ComBat / sva
Remove batch effects from expression data. ComBat is the classic correction when the batch labels are known; sva estimates surrogate variables when the source of unwanted variation is unknown.
RNA-seq: Pathway & Functional Analysis
clusterProfiler
Comprehensive R package for GO, KEGG, and pathway enrichment analysis. The go-to for enrichment in R.
GSEA
Gene Set Enrichment Analysis for ranked gene lists. Gold standard for pathway analysis using all genes, not just DEGs.
fgsea
Fast preranked GSEA in R. Significantly faster than the Java GSEA implementation.
DAVID / Enrichr
Web-based enrichment tools for quick GO/KEGG analysis without R. Good for a fast first look.
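As a rough illustration of what over-representation tools in this group compute, the sketch below runs a one-sided hypergeometric test for a single gene set. It is a minimal standard-library example, not the implementation used by clusterProfiler, DAVID, or Enrichr; the function name `hypergeom_enrichment_p` and the example numbers are hypothetical, and real tools also apply multiple-testing correction, which is omitted here.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, hits_size, overlap):
    """Return P(X >= overlap) for a one-sided hypergeometric test.

    universe_size: number of background genes considered
    set_size:      number of background genes annotated to the pathway
    hits_size:     number of genes in the query list (e.g. DEGs)
    overlap:       number of query genes that fall in the pathway
    """
    max_overlap = min(set_size, hits_size)
    total = comb(universe_size, hits_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
    # 500 DEGs, 10 of which belong to the pathway.
    p = hypergeom_enrichment_p(20000, 100, 500, 10)
    print(f"enrichment p-value: {p:.3g}")
```

The choice of background matters as much as the test itself: passing all annotated genes rather than the genes actually detected in the experiment tends to inflate significance.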
RNA-seq: Network & Co-expression Analysis
WGCNA
Weighted Gene Co-expression Network Analysis. Identifies gene modules and hub genes from expression data.
GENIE3
Gene regulatory network inference using random forests. Machine learning approach to TF-target network construction.
RegEnrich & RTN
Master regulator analysis packages that combine expression data with regulatory networks to identify key TFs.
GSVA
Gene Set Variation Analysis. Computes sample-level pathway activity scores from expression matrices.
RNA-seq: Visualization
ggplot2
The definitive R data visualization package. Used for volcano plots, MA plots, boxplots, and publication figures.
pheatmap / ComplexHeatmap
Heatmap packages for R. pheatmap for quick results; ComplexHeatmap for publication-level customization.
EnhancedVolcano
Bioconductor package for highly customizable publication-ready volcano plots with labeled genes.
IGV / UCSC Browser
Genome browsers for visualizing alignments and genomic features. IGV is desktop; UCSC is web-based.
RNA-seq: Special Applications
CIBERSORT / EPIC / quanTIseq
Reference-based cell-type deconvolution tools that estimate immune and cell-type proportions from bulk RNA-seq.
rMATS / SUPPA2 / DEXSeq
Alternative splicing analysis tools. rMATS for event-based analysis; SUPPA2 for transcript-based; DEXSeq for exon usage.
STAR-Fusion / Arriba / FusionCatcher
Fusion gene detection tools. All three identify chimeric transcripts; FusionCatcher is specialized for cancer.
MAGeCK
Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout. Identifies essential genes from CRISPR screen data.
Epigenetics: ChIP-seq & ATAC-seq
HOMER
Complete ChIP-seq analysis suite covering peak calling, motif discovery, and annotation. Versatile and widely used.
MACS2 / MACS3
Model-based peak caller for ChIP-seq and ATAC-seq. The industry standard for identifying enriched regions.
DiffBind
Differential binding analysis for ChIP-seq and ATAC-seq. Identifies condition-specific accessible regions or TF binding.
deepTools
Suite for ChIP-seq/ATAC-seq visualization and QC. Generates heatmaps, profile plots, and correlation matrices.
TOBIAS / HINT-ATAC
Transcription factor footprinting from ATAC-seq. Predicts TF binding sites at single-nucleotide resolution.
Juicer / HiC-Pro
Complete Hi-C analysis pipelines for 3D genome organization. Juicer includes Juicebox for interactive visualization.
Epigenetics: DNA Methylation
minfi / ChAMP
R packages for Illumina methylation arrays (EPIC, 450k). minfi is the Bioconductor standard; ChAMP adds workflow automation.
Bismark
Gold-standard bisulfite read mapper and methylation caller for WGBS and RRBS data.
methylKit / DSS
Differential methylation analysis for bisulfite sequencing. DSS is preferred for its robust statistical testing.
SEACR
Sparse Enrichment Analysis for CUT&RUN/CUT&Tag. Peak caller that can run with an IgG control or, when none is available, with a numeric threshold.
Genomics: Variant Calling & Annotation
BWA / Bowtie2
Short-read DNA aligners. BWA (BWA-MEM) is the standard for WGS/WES; Bowtie2 is also a short-read aligner and is common in ChIP-seq and ATAC-seq workflows.
GATK HaplotypeCaller
Industry standard for germline variant calling. Part of the GATK Best Practices pipeline from Broad Institute.
Mutect2
Gold-standard somatic mutation caller for tumor-normal pairs. Part of GATK, optimized for cancer genomics.
ANNOVAR / VEP / SnpEff
Variant annotation tools. ANNOVAR and VEP are most comprehensive; SnpEff is fast and command-line friendly.
CNVkit / GATK gCNV
Copy number variation callers. CNVkit for targeted sequencing/tumor CNVs; GATK gCNV for germline analysis.
maftools / MutationalPatterns
Mutation visualization and signature analysis. maftools for summary figures; MutationalPatterns for COSMIC signatures.
Single-Cell RNA-seq
CellRanger / STARsolo
Process 10x Chromium single-cell data from FASTQ to count matrix. STARsolo is the open-source CellRanger alternative.
Seurat
The R toolkit for single-cell genomics. Covers QC, normalization, clustering, integration, and visualization.
Harmony
Fast integration of single-cell datasets from different batches or conditions. Widely adopted for scRNA-seq batch correction.
SingleR / CellTypist
Automated cell type annotation using reference datasets. SingleR is Bioconductor-native; CellTypist uses pre-trained ML models.
Monocle 3 / Slingshot
Trajectory and pseudotime analysis for single-cell data. Infers developmental or differentiation paths from cell states.
CellChat / NicheNet
Cell-cell communication analysis tools. CellChat infers signaling networks from ligand-receptor expression; NicheNet links ligands from sender cells to downstream target genes in receiver cells.
Quality Control & File Utilities
FastQC / MultiQC
FastQC assesses individual sample quality; MultiQC aggregates QC reports across all samples into one summary.
Trimmomatic / fastp / Cutadapt
Read trimming tools for adapter removal and quality filtering. fastp is the fastest and most modern all-in-one option.
SAMtools
Essential suite for manipulating SAM/BAM alignment files. Used in virtually every NGS pipeline.
BEDtools / BCFtools / Picard
Core file manipulation utilities. BEDtools for interval arithmetic; BCFtools for VCF/BCF manipulation; Picard for duplicate marking and metrics.
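To make "interval arithmetic" concrete, here is a minimal sketch of intersecting two sets of BED-style intervals. It is only loosely analogous to what BEDtools does and is not its algorithm: the function `intersect_intervals` and the example coordinates are hypothetical, and the nested loop is quadratic, which is fine for a demonstration but not for genome-scale files. It assumes BED conventions, meaning 0-based, half-open `(chrom, start, end)` coordinates.

```python
from collections import defaultdict


def intersect_intervals(a, b):
    """Return the overlapping portions between intervals in a and b.

    Each interval is a (chrom, start, end) tuple with 0-based, half-open
    coordinates, so (chr1, 100, 200) and (chr1, 200, 300) do not overlap.
    """
    # Group the second set by chromosome so we only compare like with like.
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b:
        b_by_chrom[chrom].append((start, end))

    overlaps = []
    for chrom, start, end in a:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # a non-empty half-open overlap
                overlaps.append((chrom, lo, hi))
    return sorted(overlaps)


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 300, 400)]
    genes = [("chr1", 150, 350), ("chr2", 0, 50)]
    print(intersect_intervals(peaks, genes))
    # prints [('chr1', 150, 200), ('chr1', 300, 350)]
```

Production tools avoid the pairwise comparison by sorting both inputs and sweeping through them once, or by indexing one set in an interval tree.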
Databases & Data Resources

Gene Expression & Multi-Omics
NCBI GEO
The primary public repository for gene expression and genomics datasets.
GEO Portal
TCGA
Comprehensive cancer multi-omics data across 33 cancer types via GDC portal.
GDC Portal
GTEx
Gene expression across 54 human tissues from postmortem donors. Normal tissue reference.
GTEx Portal
ArrayExpress
European Bioinformatics Institute’s functional genomics data archive. European alternative to GEO.
ArrayExpress
GEPIA3
Interactive web tool for TCGA/GTEx expression analysis with built-in survival and differential expression.
GEPIA3
UCSC Xena
Multi-omic visualization platform hosting TCGA, GTEx, and other large cancer genomics datasets.
Xena Browser
Reference Genomes & Annotations
GENCODE
High-quality gene annotations for human and mouse. Recommended for most RNA-seq workflows.
GENCODE
Ensembl
Comprehensive genome browser and annotation database supporting 200+ species.
Ensembl
Illumina iGenomes
Pre-built, indexed reference genomes ready for download and use in pipelines.
iGenomes
Variant & Population Databases
gnomAD
Population allele frequencies from 140,000+ genomes/exomes. Essential for variant filtering.
gnomAD
ClinVar
NCBI database of variation-disease relationships and clinical significance classifications.
ClinVar
COSMIC
Catalogue of Somatic Mutations in Cancer. The reference for cancer mutation annotation.
COSMIC
dbSNP
NCBI reference for short genetic variations including SNPs and indels with rsIDs.
dbSNP
1000 Genomes
Human genetic variation catalog covering 2,504 individuals from 26 populations.
1000 Genomes
cBioPortal
Interactive cancer genomics visualization and analysis platform for multi-study comparisons.
cBioPortal
Pathway & Functional Databases
MSigDB
Molecular Signatures Database. Curated gene sets including Hallmark, GO, and KEGG collections for GSEA.
MSigDB
KEGG
Kyoto Encyclopedia of Genes and Genomes. Metabolic, signaling, and disease pathway maps.
KEGG
Reactome
Manually curated pathway knowledgebase with detailed reaction-level annotations.
Reactome
Gene Ontology
Universal gene function classification covering Biological Process, Molecular Function, and Cellular Component.
GO Database
JASPAR
Open-access database of transcription factor binding profiles. Used for motif analysis in ChIP-seq.
JASPAR
TRRUST / RegNetwork
TF-target interaction databases for network analysis and master regulator identification.
TRRUST
Single-Cell Reference Databases
Human Cell Atlas
Reference maps of all human cell types. Gold-standard single-cell reference data.
HCA Portal
CellMarker
Manually curated database of cell type markers for human and mouse tissues.
CellMarker
PanglaoDB
Single-cell sequencing database with curated cell type marker genes.
PanglaoDB
Computing & Environment

Package Management
  • Conda / Miniforge Package and environment manager for Python and R. The recommended setup for all NGS101 workflows. Miniforge
  • Bioconda Bioinformatics software distribution on conda with 9,000+ packages. Bioconda
  • Pixi Next-generation environment manager built on the conda ecosystem. Typically faster than conda at resolving environments, and it records a lockfile for reproducible installs. Covered in NGS101 HPC tutorials. Pixi Docs
  • Bioconductor R package repository for genomic data analysis with 2,000+ curated bioinformatics packages. Bioconductor
HPC & Cluster Computing
  • Slurm Workload manager for HPC clusters. The most widely used job scheduler in academic computing environments. Slurm Docs
  • PBS / Torque Alternative HPC job scheduler common in older institutional clusters. PBS Works
  • Singularity / Apptainer HPC-friendly containerization. Run Docker containers on HPC systems without root access. Apptainer
Containerization & Workflow Management
  • Docker Platform for reproducible analysis environments. Package your entire pipeline with all dependencies. Docker Docs
  • Snakemake Python-based workflow manager for reproducible and scalable bioinformatics pipelines. Snakemake
  • Nextflow Data-driven pipeline framework. Portable across HPC, cloud, and local environments. Nextflow
Programming Languages & IDEs
  • RStudio Integrated development environment for R. User-friendly interface for all R-based NGS101 tutorials. RStudio
  • Jupyter Lab Interactive notebooks for Python and R. Can be run interactively in-browser on HPC clusters. Jupyter
  • VS Code Versatile code editor with excellent R and Python support via extensions. VS Code
Learning Resources

Cheat Sheets
  • Conda Cheat Sheet Quick reference for conda environment and package management commands. Download PDF
  • R / ggplot2 Cheat Sheets Comprehensive cheat sheets for ggplot2, dplyr, tidyr, and data.table from Posit. Posit Cheat Sheets
  • Unix Command Line Essential Linux/Unix commands for navigating the command line in bioinformatics workflows. Unix Reference
NGS File Formats
  • FASTQ Raw sequencing reads with quality scores. Entry point for all NGS analyses. NGS101 Guide
  • BAM / SAM Sequence Alignment/Map format; stores aligned reads with mapping information. NGS101 Guide
  • VCF / BCF Variant Call Format for storing SNPs, indels, and structural variants. NGS101 Guide
  • BED / GTF / GFF Genome annotation and interval formats used across all NGS pipelines. NGS101 Guide
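For readers new to these formats, the sketch below shows how a FASTQ record is laid out: four lines per read (header, sequence, `+` separator, quality string). It is a minimal standard-library example, not a replacement for established parsers. The function name `read_fastq` is hypothetical, and the sketch assumes unwrapped four-line records with Phred+33 quality encoding, which is what current Illumina output uses.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) ends iteration
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Phred+33: subtract 33 from each character's ASCII code.
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for read_id, sequence, scores in read_fastq(example):
        print(read_id, sequence, scores)
    # prints: read1 ACGT [40, 40, 40, 40]
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so the score of 40 in the example means roughly a 1 in 10,000 chance that the base call is wrong.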
Documentation Hubs
  • Bioconductor R packages for genomic data analysis. 2,000+ packages with vignettes and workflows. Bioconductor
  • Galaxy Project Web-based analysis platform; no coding required. Good for quick exploratory analyses. Galaxy
  • NCBI Database Guide Complete guide to all NCBI resources covered in the NGS101 database tutorial. NGS101 Tutorial
From the Instructor
  • BullishBooks.com Dr. Lei Guo’s entrepreneurship and book review site, focused on building sustainable businesses and careers in science. BullishBooks
Note: This resource page is continuously updated as new tools and tutorials are added. Use Ctrl+F to search for a specific tool. Want to suggest a resource? Contact us.