Resources

A reference for the bioinformatics tools, databases, and resources used in next-generation sequencing (NGS) analysis. Each entry links to official documentation and, where available, to relevant tutorials on NGS101.

Analysis Software & Tools

RNA-seq Analysis

Alignment & Quantification Tools

STAR (Spliced Transcripts Alignment to a Reference)

HISAT2 (Hierarchical Indexing for Spliced Alignment of Transcripts)

  • Fast and sensitive alignment program for RNA-seq
  • Official: HISAT2 Site
  • Alternative to STAR with lower memory requirements

Salmon

Kallisto

  • Ultra-fast transcript quantification from RNA-seq reads
  • Official: Kallisto Manual
  • Pseudo-alignment based quantification

featureCounts (from Subread package)

HTSeq

  • Python framework for analyzing high-throughput sequencing data
  • Official: HTSeq Documentation
  • Alternative to featureCounts

Differential Expression Analysis

DESeq2

limma-voom

edgeR

Normalization Methods

TPM (Transcripts Per Million)

  • Within-sample normalization for gene length and library size
  • Used for: Comparing expression levels between genes within a sample
  • Learn more: RNA-seq Normalization Guide
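
As a rough illustration of the within-sample normalization described above, the sketch below computes TPM from raw counts and gene lengths for a single sample. The function name `counts_to_tpm` and the toy numbers are assumptions made for this example; they do not come from any NGS101 tutorial or library.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts     -- read counts per gene (hypothetical toy input)
    lengths_bp -- gene (or effective transcript) lengths in base pairs
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: divide each count by the gene length in kilobases.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    # Step 2: rescale so the values sum to one million within the sample.
    total = sum(rpk)
    if total == 0:
        return [0.0 for _ in rpk]
    return [r / total * 1e6 for r in rpk]


# Toy example: three genes in one sample.
tpm = counts_to_tpm([100, 200, 300], [1000, 2000, 3000])
```

Because the length correction happens before the rescaling, TPM values always sum to one million within a sample, which is what makes them convenient for comparing genes inside that sample.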

RPKM/FPKM (Reads/Fragments Per Kilobase Million)

  • Traditional length- and depth-scaled measure: RPKM counts reads (single-end), FPKM counts fragments (paired-end)
  • Used for: Comparison with legacy datasets; avoid for differential expression analysis
  • Learn more: RNA-seq Normalization Guide
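
For comparison, here is a minimal sketch of the RPKM calculation. The function name `counts_to_rpkm` is hypothetical, and the library size is assumed to be the sum of the counts passed in, whereas a real pipeline would normally use the total number of mapped reads. For FPKM the same arithmetic applies with fragments counted instead of reads.

```python
def counts_to_rpkm(counts, lengths_bp):
    """Convert raw read counts for one sample to RPKM.

    Assumption: the library size is taken as sum(counts).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    total_reads = sum(counts)
    if total_reads == 0:
        return [0.0 for _ in counts]
    # Step 1: scale by sequencing depth (reads per million).
    per_million = total_reads / 1e6
    # Step 2: scale by gene length in kilobases.
    return [c / per_million / (l / 1000.0) for c, l in zip(counts, lengths_bp)]


# Toy example: three genes in one sample.
rpkm = counts_to_rpkm([100, 200, 700], [1000, 2000, 1000])
```

Note the order of operations: depth is corrected first and length second, so the resulting values do not sum to a fixed total across samples.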

TMM (Trimmed Mean of M-values)

  • Between-sample normalization used by edgeR
  • Used for: Correcting sequencing depth and RNA composition
  • Learn more: RNA-seq Normalization Guide
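
The idea behind TMM can be sketched in a few lines. This is a deliberately simplified version: it takes an unweighted trimmed mean of log-ratios (M-values), whereas edgeR's implementation applies precision weights and further refinements. The function name `tmm_factor` and the toy counts are assumptions for this example only.

```python
import math


def tmm_factor(sample, reference, m_trim=0.3, a_trim=0.05):
    """Simplified, unweighted TMM scaling factor of `sample` against `reference`."""
    n_s, n_r = sum(sample), sum(reference)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, reference):
        if y_s > 0 and y_r > 0:  # genes with a zero count are skipped
            p_s, p_r = y_s / n_s, y_r / n_r
            m_vals.append(math.log2(p_s / p_r))        # log fold change (M)
            a_vals.append(0.5 * math.log2(p_s * p_r))  # mean log abundance (A)
    n = len(m_vals)
    if n == 0:
        return 1.0

    def kept(values, frac):
        # Indices that survive trimming `frac` of the genes from each tail.
        k = int(n * frac)
        order = sorted(range(n), key=lambda i: values[i])
        return set(order[k:n - k])

    keep = kept(m_vals, m_trim) & kept(a_vals, a_trim)
    if not keep:
        return 1.0
    mean_m = sum(m_vals[i] for i in keep) / len(keep)
    return 2.0 ** mean_m


# Toy example: one highly expressed gene dominates the sample library.
reference = [100] * 10
sample = [100] * 9 + [10000]
factor = tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
```

In this toy case the single dominant gene inflates the raw library size, and the trimmed mean ignores it, so the effective library size falls back to roughly that of the reference. That is the RNA composition correction the bullet above refers to.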

Pathway & Functional Analysis

clusterProfiler

GSEA (Gene Set Enrichment Analysis)

fgsea

  • Fast Gene Set Enrichment Analysis in R
  • Official: fgsea Bioconductor
  • Faster alternative to the Java implementation of GSEA

DAVID (Database for Annotation, Visualization and Integrated Discovery)

  • Web-based functional annotation tool
  • Official: DAVID Bioinformatics
  • User-friendly interface for beginners

Enrichr

  • Web-based gene list enrichment analysis tool
  • Official: Enrichr
  • Quick enrichment analysis with multiple databases

Network Analysis & Co-expression

WGCNA (Weighted Gene Co-expression Network Analysis)

  • R package for constructing gene co-expression networks
  • Official: WGCNA CRAN
  • Identifies gene modules and hub genes
  • Used in: WGCNA Tutorial

GENIE3

RegEnrich

RTN (Reconstruction of Transcriptional regulatory Networks)

Clustering & Classification

Hierarchical Clustering

  • Standard clustering method in R (hclust, pheatmap)
  • Used for: Grouping samples or genes by expression patterns
  • Used in: Clustering Tutorial
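
To show what hierarchical clustering does under the hood, the sketch below repeatedly merges the two closest clusters until a requested number remains. It is written in plain Python rather than R, uses Euclidean distance with average linkage (one of several possible linkage choices), and all function names and data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # start: one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy example: four samples described by two genes each.
samples = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]
groups = agglomerative_clusters(samples, 2)
```

The brute-force search over all cluster pairs is fine for a handful of samples; dedicated implementations are far more efficient on real expression matrices.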

K-means Clustering

  • Partition-based clustering algorithm
  • Used for: Identifying distinct gene expression clusters
  • Used in: Clustering Tutorial
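
A partition-based method such as k-means alternates between assigning each profile to its nearest center and recomputing the centers. The following is a minimal sketch of that loop (Lloyd's algorithm) in plain Python. The function names, the fixed random seed, and the toy data are assumptions for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Return (labels, centers) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialize centers from k randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Toy example: six genes measured in two conditions.
profiles = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
            [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]]
labels, centers = kmeans(profiles, k=2)
```

Results depend on the initial centers, so in practice the algorithm is usually run several times with different seeds and the best partition is kept.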

PAM50

genefu

GSVA (Gene Set Variation Analysis)

Deconvolution & Cell Type Analysis

CIBERSORT

EPIC (Estimating the Proportions of Immune and Cancer cells)

quanTIseq

Alternative Splicing Analysis

rMATS (Replicate Multivariate Analysis of Transcript Splicing)

SUPPA2

LeafCutter

JunctionSeq

  • Differential usage of exons and splice junctions
  • Official: JunctionSeq Bioconductor
  • Visualizes differential splicing events

DEXSeq

Isoform Analysis

StringTie

RSEM (RNA-Seq by Expectation-Maximization)

IsoformSwitchAnalyzeR

RNA Editing Analysis

REDItools

JACUSA

  • Java framework for RNA editing detection
  • Official: JACUSA GitHub
  • Identifies RNA-DNA differences

SPRINT

  • SNP-free RNA editing identification
  • Official: SPRINT GitHub
  • Does not require DNA-seq data

Non-Coding RNA Analysis

CIRCexplorer2

CIRI3

CircRNA Databases

miRDeep2

  • Discover known and novel miRNAs from small RNA-seq
  • Official: miRDeep2
  • miRNA discovery and quantification
  • Used in: miRNA-seq Tutorial

sRNAbench

UMI-tools

Structural Variation & Fusion Detection

STAR-Fusion

Arriba

FusionCatcher

Viral Sequence Detection

Kraken2

STAR + Viral Genome

  • Align reads to a combined host and viral reference
  • Method: Build one reference from the host and viral genomes, then map RNA-seq reads to it
  • Quantify viral gene expression
  • Used in: Viral Gene Expression Tutorial
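
One way to prepare such a combined reference is to concatenate the host and viral FASTA records into a single file before indexing. The sketch below does this on in-memory strings and adds a prefix to viral sequence names so they are easy to separate after alignment. The function name, the `viral_` prefix, and the example records are all hypothetical, and annotation files would need equivalent handling.

```python
def combine_references(host_fasta, viral_fasta, viral_prefix="viral_"):
    """Return one FASTA string: host records followed by prefixed viral records."""
    combined = []
    for line in host_fasta.splitlines():
        line = line.rstrip()
        if line:
            combined.append(line)
    for line in viral_fasta.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            # Tag viral sequence names so they cannot clash with host names.
            line = ">" + viral_prefix + line[1:]
        combined.append(line)
    return "\n".join(combined) + "\n"


# Toy example with made-up sequences.
host = ">chr1\nACGTACGT\n"
virus = ">exampleVirus\nTTGACCA\n"
combined_fasta = combine_references(host, virus)
```

The combined file would then be indexed with the aligner of choice, which is not shown here.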

Batch Effect & Covariates Correction

ComBat (from sva package)

limma removeBatchEffect

RUVSeq


Epigenetics Tools

ChIP-seq Analysis

HOMER (Hypergeometric Optimization of Motif EnRichment)

MACS2 (Model-based Analysis of ChIP-Seq)

DiffBind

IDR (Irreproducible Discovery Rate)

deepTools

ChIPseeker

MEME Suite

ATAC-seq Analysis

MACS2 (for ATAC-seq)

Genrich

TOBIAS (Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal)

HINT-ATAC

ArchR

CUT&RUN/CUT&Tag Analysis

SEACR (Sparse Enrichment Analysis for CUT&RUN)

MACS2 (for CUT&RUN)

Hi-C & 3D Genome Organization

Juicer

  • Complete Hi-C analysis pipeline
  • Official: Juicer
  • From raw reads to contact maps
  • Used in: Hi-C Tutorial

HiC-Pro

  • Optimized and flexible Hi-C processing pipeline
  • Official: HiC-Pro
  • Fast Hi-C data processing

cooler

  • Store and access sparse contact matrices
  • Official: cooler GitHub
  • Efficient Hi-C data storage

Juicebox

  • Visualization software for Hi-C data
  • Official: Juicebox
  • Interactive contact map viewer
  • Used in: Hi-C Tutorial

DNA Methylation Analysis

minfi

ChAMP

Bismark

methylKit

DSS (Dispersion Shrinkage for Sequencing data)


Genomics & Variant Analysis

Alignment Tools

BWA (Burrows-Wheeler Aligner)

Bowtie2

  • Fast and memory-efficient read aligner
  • Official: Bowtie2
  • Well suited to short reads of about 50 bp up to a few hundred bp
  • Alternative for WGS alignment

minimap2

  • Versatile aligner for long reads and assemblies
  • Official: minimap2
  • Well suited to Oxford Nanopore and PacBio reads

Variant Calling – Germline

GATK (Genome Analysis Toolkit)

FreeBayes

  • Bayesian genetic variant detector
  • Official: FreeBayes
  • Haplotype-based variant detection

DeepVariant

  • Deep learning-based variant caller
  • Official: DeepVariant
  • High accuracy with deep neural networks

Variant Calling – Somatic

Mutect2 (GATK)

VarScan2

MuSE

SomaticSniper

  • Identify somatic mutations in tumor/normal pairs
  • Official: SomaticSniper
  • Fast SNV calling

Variant Annotation

ANNOVAR

VEP (Variant Effect Predictor)

SnpEff

InterVar

Copy Number Variation (CNV)

CNVkit

GATK gCNV

XHMM

GISTIC2

  • Identify significant copy number alterations in cancer
  • Official: GISTIC2
  • Tumor CNV significance analysis

Mutation Visualization & Interpretation

maftools

MutationalPatterns

deconstructSigs

GWAS & Population Genetics

PLINK

  • Whole genome association analysis toolset
  • Official: PLINK
  • Standard for GWAS analysis
  • Used in: GWAS Tutorial

GCTA

  • Genome-wide complex trait analysis
  • Official: GCTA
  • Heritability estimation

LocusZoom

  • Regional association plot visualization
  • Official: LocusZoom
  • GWAS results visualization

Single-Cell Analysis

scRNA-seq Analysis

CellRanger (10x Genomics)

STARsolo

Seurat

Scanpy

  • Python-based single-cell analysis
  • Official: Scanpy
  • Scalable single-cell analysis

SingleCellExperiment

  • Bioconductor infrastructure for single-cell data
  • Official: SingleCellExperiment
  • Data structure for scRNA-seq

scRNA-seq Quality Control

DoubletFinder

Scrublet

  • Python package for doublet detection
  • Official: Scrublet
  • Simulation-based doublet detection

miQC

scRNA-seq Integration & Batch Correction

Harmony

Seurat Integration

scVI (single-cell Variational Inference)

  • Deep learning for scRNA-seq analysis
  • Official: scVI
  • Neural network-based integration

Cell Type Annotation

SingleR

CellTypist

  • Machine learning-based cell type classifier
  • Official: CellTypist
  • Pre-trained models for annotation

Azimuth

  • Reference-based cell type mapping
  • Official: Azimuth
  • Available as a web app and an R package

Quality Control & Utilities

Sequencing Quality Control

FastQC

  • Quality control tool for high throughput sequence data
  • Official: FastQC
  • First-pass quality assessment
  • Used across all NGS tutorials

MultiQC

  • Aggregate results from multiple bioinformatics analyses
  • Official: MultiQC
  • Comprehensive QC reporting
  • Used across all NGS tutorials

Trimmomatic

  • Flexible read trimming tool for Illumina data
  • Official: Trimmomatic
  • Adapter trimming and quality filtering

Cutadapt

  • Finds and removes adapter sequences
  • Official: Cutadapt
  • Flexible adapter removal

fastp

  • Ultra-fast FASTQ preprocessing
  • Official: fastp
  • All-in-one FASTQ processor

File Manipulation & Utilities

SAMtools

  • Suite for manipulating alignments in SAM/BAM format
  • Official: SAMtools
  • Essential for BAM file operations
  • Used across all NGS tutorials

BEDtools

  • Toolset for genome arithmetic
  • Official: BEDtools
  • Interval operations on genomic features
  • Used across multiple tutorials
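
To make "genome arithmetic" concrete, here is a minimal sketch of an interval intersection on BED-style coordinates (0-based, half-open). It is a brute-force illustration rather than how BEDtools itself is implemented. The function name and the toy intervals are assumptions.

```python
def intersect_intervals(a, b):
    """Return overlapping segments between two lists of (chrom, start, end).

    Coordinates follow the BED convention: 0-based start, exclusive end.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # intervals that merely touch do not overlap
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


# Toy example: peaks versus annotated features.
peaks = [("chr1", 100, 200), ("chr2", 50, 80)]
features = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 10, 60)]
shared = intersect_intervals(peaks, features)
```

Comparing every pair of intervals is quadratic in the input size, which is acceptable for a toy example but is exactly what dedicated interval tools avoid on genome-scale data.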

BCFtools

  • Utilities for variant calling and manipulating VCF/BCF files
  • Official: BCFtools
  • VCF file manipulation

Picard

  • Java-based tools for manipulating HTS data
  • Official: Picard
  • Mark duplicates, collect metrics
  • Used in: WGS Part 1

IGV (Integrative Genomics Viewer)

  • Visualization tool for genomics data
  • Official: IGV
  • Interactive genome browser
  • Used for visualization across tutorials

UCSC Genome Browser

  • Web-based genome browser
  • Official: UCSC Browser
  • Explore genomic data

R Visualization Packages

ggplot2

pheatmap

  • Pretty heatmaps in R
  • Official: pheatmap
  • Heatmap generation
  • Used across RNA-seq tutorials

ComplexHeatmap

EnhancedVolcano


CRISPR Screen Analysis

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout)

CRISPResso2

  • Analysis of CRISPR editing outcomes
  • Official: CRISPResso2
  • Editing efficiency quantification

Databases & Data Resources

Gene Expression Databases

NCBI GEO (Gene Expression Omnibus)

TCGA (The Cancer Genome Atlas)

GTEx (Genotype-Tissue Expression)

ArrayExpress

  • Functional genomics data archive
  • Official: ArrayExpress
  • European alternative to GEO

recount3

  • Unified access to RNA-seq datasets
  • Official: recount3
  • Pre-processed RNA-seq data

GEPIA3 (Gene Expression Profiling Interactive Analysis v3)

  • Interactive web tool for analyzing RNA-seq data from TCGA and GTEx
  • Official: https://gepia3.bioinfoliu.com/
  • Focused on cancer genomics and differential expression analysis

UCSC Xena

  • Multi-omic data visualization platform
  • Official: https://xena.ucsc.edu/
  • Hosts TCGA, GTEx, and other large cancer genomics datasets

Reference Genomes & Annotations

GENCODE

  • High-quality gene annotations
  • Official: GENCODE
  • Human and mouse annotations
  • Used across all RNA-seq tutorials

Ensembl

  • Genome browser and annotation database
  • Official: Ensembl
  • Comprehensive genome annotations
  • Used across tutorials

UCSC Genome Browser

  • Reference genomes and annotations
  • Official: UCSC Downloads
  • Alternative genome references

RefSeq

  • NCBI Reference Sequence Database
  • Official: RefSeq
  • Curated gene annotations

refgenie

  • Reference genome manager
  • Official: refgenie
  • Pre-built genome references

Illumina iGenomes

  • Ready-to-use reference sequences and annotations
  • Official: iGenomes
  • Pre-indexed genomes

Variant & Population Databases

gnomAD (Genome Aggregation Database)

ClinVar

dbSNP

  • Short genetic variations
  • Official: dbSNP
  • Variant IDs and frequencies

1000 Genomes

  • Human genetic variation catalog
  • Official: 1000 Genomes
  • Population genetics reference

COSMIC (Catalogue of Somatic Mutations in Cancer)

  • Cancer somatic mutation database
  • Official: COSMIC
  • Cancer mutation catalog

Database of Genomic Variants (DGV)

  • Structural variation in healthy individuals
  • Official: DGV
  • Benign CNV reference

Clinical & Disease Databases

OMIM (Online Mendelian Inheritance in Man)

  • Human genes and genetic disorders
  • Official: OMIM
  • Gene-disease relationships

ClinGen

  • Clinical genome resource
  • Official: ClinGen
  • Gene-disease validity

DECIPHER

  • Database of genomic variation and phenotype
  • Official: DECIPHER
  • Rare disease genomics

GenCC (Gene Curation Coalition)

  • Curated gene-disease relationships
  • Official: GenCC
  • Standardized gene-disease assertions

HGMD (Human Gene Mutation Database)

  • Disease-causing mutations
  • Official: HGMD
  • Mutation catalog (subscription)

Orphanet

  • Rare disease and orphan drug portal
  • Official: Orphanet
  • Rare disease information

cBioPortal

  • Cancer genomics data visualization
  • Official: cBioPortal
  • Interactive cancer genomics
  • Used in mutation visualization

Pathway & Functional Databases

MSigDB (Molecular Signatures Database)

KEGG (Kyoto Encyclopedia of Genes and Genomes)

  • Pathway and disease databases
  • Official: KEGG
  • Metabolic and signaling pathways

Reactome

  • Pathway knowledge base
  • Official: Reactome
  • Curated pathway database

Gene Ontology (GO)

  • Gene function classification
  • Official: GO
  • Biological process, molecular function, cellular component

WikiPathways

  • Community-curated pathway database
  • Official: WikiPathways
  • Open-source pathways

Regulatory & Epigenetics Databases

ENCODE

  • Encyclopedia of DNA Elements
  • Official: ENCODE
  • Functional genomics data
  • ChIP-seq, ATAC-seq, RNA-seq datasets

TRRUST (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining)

  • Human/mouse transcription factor-target interactions
  • Official: TRRUST
  • TF regulatory networks
  • Used in network analysis tutorials

RegNetwork

  • Regulatory network repository
  • Official: RegNetwork
  • TF-target gene relationships

JASPAR

  • Transcription factor binding profile database
  • Official: JASPAR
  • TF motifs

Cistrome DB

  • ChIP-seq and chromatin accessibility database
  • Official: Cistrome DB
  • Curated ChIP-seq data

miRNA Databases

miRBase

TargetScan

  • Predict miRNA target sites
  • Official: TargetScan
  • miRNA-mRNA interactions

miRDB

  • MicroRNA target prediction database
  • Official: miRDB
  • miRNA target genes

Single-Cell Reference Databases

Human Cell Atlas

  • Reference maps of human cells
  • Official: Human Cell Atlas
  • Single-cell reference data

PanglaoDB

  • Single-cell sequencing database
  • Official: PanglaoDB
  • Cell type markers

CellMarker

  • Cell marker database
  • Official: CellMarker
  • Manually curated cell markers

Computing & Environment

Package Management

Conda / Miniforge

Bioconda

Conda-forge

Programming Languages & IDEs

R

  • Statistical computing language
  • Official: R Project
  • Essential for bioinformatics analysis
  • Used across all tutorials

RStudio

  • Integrated development environment for R
  • Official: RStudio
  • User-friendly R interface

Python

  • General-purpose programming language
  • Official: Python
  • Versatile for bioinformatics

Jupyter

  • Interactive computing notebooks
  • Official: Jupyter
  • Python/R notebook interface

High Performance Computing (HPC)

Slurm

  • Workload manager for HPC clusters
  • Official: Slurm
  • Job submission and management
  • Used in: Slurm Tutorial

PBS/Torque

  • Alternative HPC job scheduler
  • Official: PBS Works
  • Cluster job management

SGE (Sun Grid Engine)

  • Distributed resource management
  • Alternative HPC scheduler

Containerization

Docker

  • Platform for developing and running applications in containers
  • Official: Docker
  • Reproducible analysis environments
  • Used in: Docker Tutorial

Singularity/Apptainer

  • Container platform for HPC
  • Official: Apptainer
  • HPC-friendly containerization

Workflow Management

Snakemake

  • Python-based workflow management system
  • Official: Snakemake
  • Reproducible and scalable analysis

Nextflow

  • Data-driven computational pipelines
  • Official: Nextflow
  • Portable workflow framework

WDL (Workflow Description Language)

  • Workflow specification language
  • Official: WDL
  • Used by GATK pipelines

Learning Resources

Cheat Sheets

Conda Cheat Sheet

R Cheat Sheets (Posit)

  • Comprehensive R package cheat sheets
  • Official: Posit Cheat Sheets
  • ggplot2, dplyr, tidyr, and more

R Cheat Sheets (Kaggle)

data.table Cheat Sheet

ggplot2 Cheat Sheet

Unix Command Line Cheat Sheet

  • Essential Linux/Unix commands
  • Various resources available online

Documentation Hubs

Bioconductor

  • R packages for genomic data analysis
  • Official: Bioconductor
  • 2000+ bioinformatics packages

Galaxy Project

  • Web-based platform for data analysis
  • Official: Galaxy
  • No-code bioinformatics

NGS101 Complete Tutorial Library

  • This site: 70+ comprehensive tutorials
  • Home: NGS101 Tutorials
  • Beginner-friendly, step-by-step guides

Quick Reference

File Formats Guide

FASTQ Format

BAM/SAM Format

VCF/BCF Format

BED Format

GTF/GFF Format

Complete File Format Reference

Data Management

HPC Data Management Guide

NCBI Database Guide



This resource page is continuously updated as new tools and tutorials are added. Last update: January 2026

Need help with a specific analysis? Check out our complete tutorial library.

Looking for a specific tool? Use Ctrl+F (or Cmd+F on Mac) to search this page.

Want to suggest a resource? Contact us through our collaborations page.

