How to Analyze Single-Cell RNA-seq Data – Complete Beginner’s Guide Part 7: Trajectory and Pseudotime Analysis Using Monocle 3

Table of Contents

Introduction: Understanding Cell State Transitions

What Is Trajectory Analysis and Why Do We Need It?

If you’ve completed Parts 1-6 of this tutorial series, you’ve successfully:

Processed raw sequencing data into count matrices (Part 1)
Quality-controlled your cells (Part 2)
Integrated samples and identified clusters (Part 3)
Assigned biological identities to clusters (Part 4)
Performed differential expression analysis (Part 5)
Mastered data object structures (Part 6)

But here’s a fundamental limitation of clustering: cells don’t exist in discrete categories.

The biological reality:

Stem cells gradually differentiate into specialized cell types
Immune cells continuously respond to stimuli, transitioning through activation states
Cancer cells evolve through progressive stages of transformation
Developmental processes unfold as smooth, continuous progressions

The computational challenge: Clustering algorithms force cells into discrete groups, creating artificial boundaries where none exist biologically. Two cells in different clusters might actually represent consecutive points along a continuous developmental pathway.

Trajectory analysis solves this by modeling cells as existing along continuous paths of gene expression change, revealing:

Differentiation pathways: How stem/progenitor cells become specialized
Temporal ordering: The sequence of gene expression changes during processes
Branch points: Where cells make fate decisions (e.g., becoming cell type A vs B)
Pseudotime: A measure of biological progression independent of real time

What Biological Questions Does Trajectory Analysis Answer?

Question 1: What is the differentiation path from stem cells to mature cells?

Example: In hematopoiesis (blood cell development), trajectory analysis can reveal:

Common myeloid progenitors → Monocytes → Macrophages/Dendritic cells
Branch points where cells commit to different lineages
Genes driving each developmental decision

Question 2: How do cells respond to treatment or stimuli over time?

Example: In our periodontitis dataset (GSE174609), trajectory analysis might show:

How immune cells transition from resting to activated states
Whether treatment reverses cells along activation trajectories
Genes that change early vs late in activation

Question 3: What is the temporal order of gene expression changes?

Example: During T cell activation:

Early genes: Immediate response transcription factors (FOS, JUN)
Middle genes: Cytokine receptors and signaling molecules
Late genes: Effector molecules and exhaustion markers

Question 4: Where do cells make fate decisions?

Example: At a branch point, cells might choose between:

Becoming cytotoxic T cells (high GZMB) vs regulatory T cells (high FOXP3)
Identifying the genes that determine this choice

When Should You Perform Trajectory Analysis?

✅ Ideal scenarios for trajectory analysis:

Developmental systems: Stem cell differentiation, organogenesis, tissue regeneration
Time-course experiments: Treatment response over hours/days, disease progression
Activation/response: Immune cell activation, stress responses, cell cycle
Continuous processes: Any biological process where cells transition smoothly between states

❌ When trajectory analysis may not be appropriate:

Stable, terminally differentiated cells: Mature neurons, hepatocytes (no transitions occurring)
Discrete cell types with no developmental relationship: Comparing liver cells to brain cells
Technical artifacts: Batch effects or quality gradients masquerading as biology
Insufficient cellular diversity: Need cells across the full trajectory, not just endpoints
Very rare cell types: <50 cells cannot reliably define trajectories (e.g., Platelets with 39 cells)

Critical requirement: Your dataset must contain intermediate states between start and end points. If you only have fully differentiated cells with no transitional forms, trajectory analysis cannot infer the path between them.

How Does Trajectory Analysis Work Conceptually?

Think of trajectory analysis as reverse-engineering a road map from GPS coordinates:

Input: Positions of many cars (cells) in gene expression space (UMAP/PCA)

Goal: Infer the underlying road network (differentiation paths)

Method:

Assume cells are at different points along a journey (early, middle, late in differentiation)
Connect nearby cells based on gene expression similarity
Find the most likely path that connects cells from start to end
Order cells along this path (pseudotime = position on the road)
Identify branch points where roads diverge (fate decisions)

Key insight: We don’t need to know which cells are “early” vs “late” in advance. The algorithm infers this from the data structure—cells form natural progressions in gene expression space.

What Is Monocle 3 and How Does It Infer Trajectories?

Monocle 3 is one of the most widely used trajectory inference tools, developed by Cole Trapnell’s lab. It works through several key steps:

Step 1: Principal Graph Learning

Instead of forcing cells onto predefined paths, Monocle 3 learns the trajectory structure directly from the data:

Constructs a “principal graph” that fits through the cloud of cells in reduced dimension space
Graph is flexible, allowing complex topologies (linear, branched, cyclic)
Captures the manifold structure of cell state transitions

Step 2: Partition Detection

Identifies partitions: Groups of cells connected by continuous trajectories
Cells in different partitions have no inferred connection (represent separate biological processes)
Example: Lymphoid and myeloid lineages might be separate partitions

Step 3: Pseudotime Calculation

Assigns each cell a pseudotime value representing its position along the trajectory
Pseudotime = 0 at the start (root) of the trajectory
Increases as cells progress toward differentiated states
Note: Pseudotime is relative, not absolute time

Step 4: Branch Assignment

Identifies branch points where the trajectory splits
Assigns cells to specific branches
Enables analysis of genes specific to each fate choice

Why Monocle 3 for this tutorial?

Flexible: Handles simple linear and complex branched trajectories
Well-integrated: Works seamlessly with Seurat objects
Interpretable: Clear visualization of trajectories and pseudotime
Widely adopted: Extensive documentation and community support

Important conceptual note: Trajectory analysis makes an assumption—that the developmental process you’re studying is actually occurring in your data. If you have only mature cell types with no intermediate states, the algorithm may still create a trajectory, but it won’t be biologically meaningful. Always validate that the inferred trajectory makes biological sense!

What You’ll Learn in This Tutorial

By the end of this tutorial, you’ll be able to:

✅ Understand when trajectory analysis is appropriate for your data
✅ Install and configure Monocle 3 correctly (avoiding common dependency issues)
✅ Convert Seurat objects to Monocle 3 format while preserving metadata
✅ Run complete trajectory inference workflows step-by-step
✅ Handle multiple partitions representing disconnected biological processes
✅ Use both root selection methods (manual and automatic) programmatically
✅ Visualize trajectories and pseudotime on your existing UMAP coordinates
✅ Identify genes that change along developmental paths
✅ Interpret branch points and fate decisions
✅ Troubleshoot common Monocle 3 errors and issues
✅ Integrate trajectory results back into your Seurat workflow

Special features of this tutorial:

HPC-compatible: Fully non-interactive code suitable for batch jobs and servers without graphical displays
Reproducible: Programmatic root selection instead of interactive clicking
Real-world data: Handles multiple disconnected partitions (myeloid, lymphoid lineages)
Dual methodology: Demonstrates both manual (biologically informed) and automatic (data-driven) root selection

Let’s begin by setting up the necessary software environment.

Setting Up Your R Environment for Trajectory Analysis

Installing Monocle 3 and Dependencies

Monocle 3 has specific dependencies that can be challenging to install. We’ll use a systematic approach that handles these properly.

Why is Monocle 3 installation tricky?

Requires specific Bioconductor packages with version compatibility
Depends on archived CRAN package (grr)
Needs renv package for GitHub installation
Some dependencies (terra, ggrastr) have system-level requirements

The solution: Follow the exact installation sequence below to avoid conflicts.

#-----------------------------------------------
# Installation: Monocle 3 and dependencies
#-----------------------------------------------

# Set CRAN mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Install BiocManager for Bioconductor dependencies
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Install Bioconductor dependencies for Monocle 3
BiocManager::install(c(
  "BiocGenerics",
  "DelayedArray",
  "DelayedMatrixStats",
  "limma",
  "lme4",
  "S4Vectors",
  "SingleCellExperiment",
  "SummarizedExperiment",
  "batchelor",
  "Matrix.utils",
  "HDF5Array",
  "terra",
  "ggrastr"
))

# Install CRAN packages
install.packages(c(
  "Seurat",         # Already installed from previous parts
  "SeuratObject",   # Seurat object infrastructure
  "dplyr",          # Data manipulation
  "ggplot2",        # Visualization
  "patchwork",      # Combine plots
  "RColorBrewer",   # Color palettes
  "viridis",        # Perceptually uniform colors
  "Matrix",         # Sparse matrix operations
  "renv",           # Required for Monocle 3 installation
  "remotes"         # Required for SeuratWrappers installation
))

# Install grr (archived CRAN package required by Monocle 3)
install.packages("https://cran.r-project.org/src/contrib/Archive/grr/grr_0.9.5.tar.gz", 
                 repos = NULL, type = "source")

# Install Monocle 3 from GitHub
renv::install('cole-trapnell-lab/monocle3')

# Install SeuratWrappers from GitHub (for Seurat <-> Monocle conversion)
remotes::install_github('satijalab/seurat-wrappers')

Loading Required Libraries

#-----------------------------------------------
# STEP 1: Load required libraries
#-----------------------------------------------

# Core single-cell analysis
library(Seurat)
library(SeuratObject)
library(SeuratWrappers)

# Trajectory inference
library(monocle3)

# Data manipulation and visualization
library(dplyr)
library(ggplot2)
library(patchwork)
library(RColorBrewer)
library(viridis)

# Set working directory
setwd("~/GSE174609_scRNA/trajectory_analysis")

# Create output directories
dir.create("plots", showWarnings = FALSE)
dir.create("plots/trajectories", showWarnings = FALSE)
dir.create("plots/pseudotime", showWarnings = FALSE)
dir.create("plots/genes_over_pseudotime", showWarnings = FALSE)
dir.create("results", showWarnings = FALSE)

# Set random seed for reproducibility
set.seed(42)

# Configure plotting defaults
theme_set(theme_classic(base_size = 12))

What we’ve done:

✅ Loaded Seurat for data management and visualization
✅ Loaded monocle3 for trajectory inference
✅ Loaded visualization tools for publication-quality plots
✅ Created organized directory structure for outputs
✅ Set reproducible random seed (important for consistent results)

Preparing Data for Monocle 3 Analysis

Understanding the Monocle 3 Workflow

Before diving into code, let’s understand the complete Monocle 3 workflow:

Step 1: Load annotated Seurat object (from Part 4)
   ↓
Step 2: Subset and downsample cells (computational efficiency for demonstration purpose)
   ↓
Step 3: Convert Seurat → Monocle 3 cell_data_set (CDS) object
   ↓
Step 4: Preprocess data (PCA dimensionality reduction)
   ↓
Step 5: Align to original UMAP (maintain consistency with Seurat)
   ↓
Step 6: Learn trajectory graph structure
   ↓
Step 7: Order cells in pseudotime
   ↓
Step 8: Visualize trajectories and pseudotime
   ↓
Step 9: Identify trajectory-associated genes
   ↓
Step 10: Integrate results back to Seurat object

Loading the Annotated Seurat Object

#-----------------------------------------------
# STEP 2: Load annotated Seurat object from Part 4
#-----------------------------------------------

# Load the complete annotated object
seurat_full <- readRDS("~/GSE174609_scRNA/cell_type_annotation/annotations/integrated_annotated_seurat.rds")

# Verify data structure
cat("Original dataset:\n")
cat("  Total cells:", ncol(seurat_full), "\n")
cat("  Total genes:", nrow(seurat_full), "\n")
cat("  Samples:", length(unique(seurat_full$sample_id)), "\n")
cat("  Cell types:", length(unique(seurat_full$final_annotation)), "\n\n")

# Check cell type distribution
cat("Cell type distribution (full dataset):\n")
cell_type_counts <- table(seurat_full$final_annotation)
print(cell_type_counts)
cat("\n")

Expected output:

Original dataset:
  Total cells: 72649 
  Total genes: 18861 
  Samples: 8 
  Cell types: 8 

Cell type distribution (full dataset):

            B cells        CD4+ T cells        CD8+ T cells Classical Monocytes 
               6735               31379                3132                5218 
          Monocytes             NK cells           Platelets             T cells 
               2352               16067                  39                7727

Understanding the data:

7 cell types: Representing major immune populations in PBMCs (Platelets excluded)
72,649 cells: Large dataset suitable for trajectory analysis
Uneven distribution: T cells are most abundant (~55%), Monocytes rarest (~3%)
Source: Integrated, batch-corrected data with consensus annotations
Platelets excluded: Only 39 cells total (too rare for meaningful trajectory analysis)

Strategic Downsampling: Maintaining Biological Diversity

Why downsample all cell types?

In trajectory analysis, we want to:

Capture transitions across different cell types (if they exist)
Maintain representation of rare cell types (dendritic cells, plasma cells)
Balance computational efficiency with biological coverage

Strategy: Take 10% of cells from each cell type separately, ensuring:

Rare cell types aren’t lost (at least ~165 dendritic cells retained)
Cell type proportions are preserved
Computational time is reduced ~10-fold

#-----------------------------------------------
# STEP 3: Downsample to 10% of cells per cell type (exclude Platelets)
#-----------------------------------------------

# Set seed for reproducible downsampling
set.seed(42)

# Get unique cell types and exclude Platelets (too rare for trajectory analysis)
all_cell_types <- unique(seurat_full$final_annotation)
cell_types <- all_cell_types[all_cell_types != "Platelets"]

cat("Cell types to analyze (excluding Platelets):\n")
print(cell_types)
cat("\n")

# Sample 10% from each cell type
downsampled_cells <- c()

for (cell_type in cell_types) {
  # Get cells of this type
  cells_of_type <- colnames(seurat_full)[seurat_full$final_annotation == cell_type]

  # Calculate 10% (minimum 50 cells for rare types)
  n_sample <- max(50, round(length(cells_of_type) * 0.1))

  # Don't sample more cells than exist
  n_sample <- min(n_sample, length(cells_of_type))

  # Random sample
  sampled <- sample(cells_of_type, n_sample)
  downsampled_cells <- c(downsampled_cells, sampled)

  cat(sprintf("  %s: %d → %d cells (%.1f%%)\n", 
              cell_type, length(cells_of_type), n_sample, 
              100 * n_sample / length(cells_of_type)))
}

# Create downsampled Seurat object
seurat_subset <- subset(seurat_full, cells = downsampled_cells)

cat("\n")
cat("Downsampled dataset:\n")
cat("  Total cells:", ncol(seurat_subset), "\n")
cat("  Cell types retained:", length(unique(seurat_subset$final_annotation)), "\n")
cat("  Reduction from full dataset:", 
    round(100 * (1 - ncol(seurat_subset)/ncol(seurat_full)), 1), "%\n")

# Verify cell type representation
cat("\nCell type distribution (downsampled):\n")
print(table(seurat_subset$final_annotation))

Expected output:

Cell types to analyze (excluding Platelets):
[1] "B cells"             "CD4+ T cells"        "CD8+ T cells"        
[4] "Classical Monocytes" "Monocytes"           "NK cells"            
[7] "T cells"            

  B cells: 6735 → 674 cells (10.0%)
  CD4+ T cells: 31379 → 3138 cells (10.0%)
  CD8+ T cells: 3132 → 313 cells (10.0%)
  Classical Monocytes: 5218 → 522 cells (10.0%)
  Monocytes: 2352 → 235 cells (10.0%)
  NK cells: 16067 → 1607 cells (10.0%)
  T cells: 7727 → 773 cells (10.0%)

Downsampled dataset:
  Total cells: 7262 
  Cell types retained: 7 
  Reduction from full dataset: 90.0%

Cell type distribution (downsampled):
            B cells        CD4+ T cells        CD8+ T cells Classical Monocytes 
                674                3138                 313                 522 
          Monocytes             NK cells             T cells 
                235                1607                 773

Why this sampling strategy works:

✅ Proportional representation: Each cell type maintains its relative abundance
✅ Rare types preserved: Even Monocytes (3% of data) have 235 cells
✅ Computational feasibility: 7,262 cells processable in minutes vs hours
✅ Reproducible: Fixed seed ensures consistent results

Important note: For publication-quality analysis, you would typically run on the full dataset after validating parameters on the downsampled version. This tutorial uses the downsampled data throughout for pedagogical clarity and faster execution.

Visualizing the Downsampled Dataset

Before trajectory analysis, let’s verify the downsampled data maintains the biological structure:

#-----------------------------------------------
# STEP 4: Visualize downsampled data
#-----------------------------------------------

# Check which UMAP is available (from Part 3 integration)
available_reductions <- names(seurat_subset@reductions)
if ("umap.harmony" %in% available_reductions) {
  umap_reduction <- "umap.harmony"
} else if ("umap.cca" %in% available_reductions) {
  umap_reduction <- "umap.cca"
} else if ("umap.rpca" %in% available_reductions) {
  umap_reduction <- "umap.rpca"
} else if ("umap.mnn" %in% available_reductions) {
  umap_reduction <- "umap.mnn"
} else {
  umap_reduction <- "umap"
}

cat("Using UMAP reduction:", umap_reduction, "\n\n")

# Create multi-panel overview
p1 <- DimPlot(seurat_subset, reduction = umap_reduction,
              group.by = "final_annotation", label = TRUE, 
              label.size = 3.5, pt.size = 0.8, repel = TRUE) +
  ggtitle("Cell Types (Downsampled Data)") +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.position = "right")

p2 <- DimPlot(seurat_subset, reduction = umap_reduction,
              group.by = "condition", pt.size = 0.8) +
  ggtitle("Condition Distribution") +
  scale_color_manual(values = c("Healthy" = "#2E86AB", 
                                 "Post_Treatment" = "#F18F01")) +
  theme(plot.title = element_text(face = "bold", size = 14))

p3 <- DimPlot(seurat_subset, reduction = umap_reduction,
              group.by = "sample_id", pt.size = 0.8) +
  ggtitle("Sample Distribution") +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.text = element_text(size = 8))

# Combine plots
p_overview <- (p1 | p2) / (p3 | plot_spacer())

ggsave("plots/01_downsampled_data_overview.png", p_overview,
       width = 16, height = 12, dpi = 300)

cat("Saved: plots/01_downsampled_data_overview.png\n")

What to look for in this plot:

Top left (Cell Types): All 7 cell types should be visible and spatially separated
Top right (Condition): Healthy and Post-Treatment should be well-mixed (good integration)
Bottom left (Samples): All 8 samples should be represented

Critical validation: If cell types aren’t spatially separated or samples show strong batch effects, go back to Part 3 (Integration) before proceeding with trajectory analysis. Trajectories learned from batch-driven gradients are artifacts, not biology!

Trajectory Inference with Monocle 3: Step-by-Step Workflow

Converting Seurat Object to Monocle 3 CDS Format

Monocle 3 uses a different data structure called cell_data_set (CDS). We’ll use SeuratWrappers for easy conversion that automatically preserves:

Expression data (counts, normalized data)
Cell metadata (cell types, conditions, samples)
Gene metadata
Dimensionality reductions (PCA, UMAP)

#-----------------------------------------------
# STEP 5: Convert Seurat to Monocle 3 CDS object using SeuratWrappers
#-----------------------------------------------

# Convert Seurat object to cell_data_set
# This automatically transfers counts, metadata, and reductions
cds <- as.cell_data_set(seurat_subset)

cat("Created Monocle 3 cell_data_set using SeuratWrappers:\n")
cat("  Cells:", ncol(cds), "\n")
cat("  Genes:", nrow(cds), "\n")
cat("  Cell metadata columns:", ncol(colData(cds)), "\n\n")

# Monocle 3's plot_cells() requires this for gene plotting
# Add gene names to rowData
rowData(cds)$gene_short_name <- rownames(cds)

cat("Fixed gene names for Monocle 3 compatibility\n")
cat("  Gene names set:", "gene_short_name" %in% colnames(rowData(cds)), "\n")
cat("  First 5 genes:", head(rowData(cds)$gene_short_name, 5), "\n\n")

# Verify what was transferred
cat("Verifying transferred information:\n")
cat("  Expression data: counts layer =", 
    !is.null(counts(cds)), "\n")
cat("  Cell metadata: final_annotation =", 
    "final_annotation" %in% colnames(colData(cds)), "\n")
cat("  Gene names: gene_short_name =", 
    "gene_short_name" %in% colnames(rowData(cds)), "\n")

Expected output:

Created Monocle 3 cell_data_set using SeuratWrappers:
  Cells: 7262 
  Genes: 18861 
  Cell metadata columns: 49

Fixed gene names for Monocle 3 compatibility
  Gene names set: TRUE 
  First 5 genes: MIR1302-2HG FAM138A OR4F5 AL627309.1 AL627309.3 

Verifying transferred information:
  Expression data: counts layer = TRUE 
  Cell metadata: final_annotation = TRUE 
  Gene names: gene_short_name = TRUE

Understanding the CDS object: CDS is based on SingleCellExperiment (see Part 6 for object structure details)

# The CDS object structure (from SingleCellExperiment)
# - assays: Expression matrices (counts, logcounts, etc.)
# - colData: Cell metadata (cell types, conditions, etc.)
# - rowData: Gene metadata (gene names, IDs, etc.)
# - reducedDims: Dimensionality reductions (PCA, UMAP, etc.)

Alternative: Manual conversion (for reference)

If you need more control or SeuratWrappers isn’t available, you can build CDS manually:

# Manual approach (optional - not used in this tutorial)
counts_matrix <- GetAssayData(seurat_subset, layer = "counts")
cell_metadata <- seurat_subset@meta.data
gene_metadata <- data.frame(
  gene_short_name = rownames(seurat_subset),  # ← Explicitly sets gene_short_name
  row.names = rownames(seurat_subset)
)

cds_manual <- new_cell_data_set(
  expression_data = counts_matrix,
  cell_metadata = cell_metadata,
  gene_metadata = gene_metadata
)

Preprocessing: Normalization and Dimensionality Reduction

Monocle 3 needs to perform its own preprocessing to prepare for trajectory learning:

#-----------------------------------------------
# STEP 6: Preprocess data (normalization and PCA)
#-----------------------------------------------

# Normalize and reduce dimensions
cds <- preprocess_cds(
  cds,
  num_dim = 50,              # Number of PCA dimensions (same as Seurat)
  norm_method = "log",       # Log-normalization (matching Seurat)
  use_genes = NULL,          # Use all genes (can specify variable features)
  scaling = TRUE,            # Scale genes (mean = 0, variance = 1)
  verbose = FALSE
)

What preprocess_cds does:

Size factor normalization: Accounts for library size differences between cells
Log transformation: log(count + 1) to stabilize variance
Scaling: Centers and scales each gene (important for PCA)
PCA: Projects data into 50 dimensions capturing major variation

Parameter notes:

num_dim = 50: Match the number used in Seurat (Parts 2-3)
norm_method = "log": Standard for scRNA-seq; alternatives include “size_only”
use_genes = NULL: Uses all genes; can specify variable features for efficiency

Aligning to Original Seurat UMAP Coordinates

Critical decision: Where should we visualize trajectories?

Option 1: Let Monocle 3 create its own UMAP

❌ Results in different coordinates than all previous analyses
❌ Difficult to compare with Part 3-4 results
❌ Confusing for interpretation

Option 2: Use the original UMAP from Seurat (recommended)

✅ Maintains consistency with all previous visualizations
✅ Easy to compare trajectories with cell type annotations
✅ Leverages high-quality integration from Part 3

We’ll use Option 2: Ensure Seurat UMAP coordinates are in the CDS object.

#-----------------------------------------------
# STEP 7: Verify/transfer original UMAP coordinates to CDS
#-----------------------------------------------

# Check if SeuratWrappers already transferred UMAP
existing_reductions <- names(reducedDims(cds))
cat("Reductions in CDS after SeuratWrappers conversion:\n")
print(existing_reductions)
cat("\n")

# SeuratWrappers should have transferred UMAP, but let's verify and ensure
# it's named correctly for Monocle 3

# Get the UMAP from Seurat object (our original high-quality UMAP)
umap_coords <- seurat_subset@reductions[[umap_reduction]]@cell.embeddings

# Verify dimensions match
cat("UMAP coordinates dimension:", dim(umap_coords), "\n")
cat("Expected (cells, 2):", c(ncol(cds), 2), "\n\n")

# Add/overwrite UMAP in CDS object
# Monocle 3 expects reducedDims to be named "UMAP" for visualization
reducedDims(cds)[["UMAP"]] <- umap_coords

cat("UMAP coordinates set in CDS object\n")
cat("Available reductions:", names(reducedDims(cds)), "\n")

Expected output:

Reductions in CDS after SeuratWrappers conversion:
[1] "PCA" "INTEGRATED.CCA" "UMAP.CCA"

UMAP coordinates dimension: 7262 2 
Expected (cells, 2): 7262 2 

UMAP coordinates set in CDS object
Available reductions: PCA INTEGRATED.CCA UMAP.CCA UMAP

Why this matters:

Trajectories will be overlaid on the same UMAP you’ve been using since Part 3
Cell types, clusters, and conditions maintain their familiar positions
Enables direct comparison: “Does the trajectory follow the T cell activation gradient I saw before?”

Technical note:

SeuratWrappers typically transfers PCA and UMAP automatically
We explicitly set UMAP to ensure it’s the correct reduction and properly named
The UMAP reduction must be named exactly “UMAP” for plot_cells() to use it automatically

What if SeuratWrappers didn’t transfer UMAP?

If existing_reductions is empty or doesn’t include UMAP, the code above adds it manually from the Seurat object. This ensures compatibility regardless of SeuratWrappers version.

Learning the Trajectory Graph Structure

Now comes the core of trajectory inference: learning the principal graph that represents developmental paths.

What is the principal graph?

A tree-like or graph structure that fits through the cloud of cells
Represents the “skeleton” of the developmental process
Cells are then projected onto this graph to determine their position

Monocle 3’s graph learning algorithm:

Identifies nearest neighbors in reduced dimension space
Constructs a graph connecting similar cells
Partitions the graph into connected components
Learns a principal curve/tree within each partition
Simplifies to a minimal spanning tree

Important: Before learning the graph, Monocle 3 requires clustering cells to identify partitions.

#-----------------------------------------------
# STEP 8: Cluster cells and learn principal graph
#-----------------------------------------------

# First, cluster cells (required before learn_graph)
# This identifies partitions and communities in the data
cds <- cluster_cells(
  cds,
  reduction_method = "UMAP",
  k = 20,                      # Number of nearest neighbors
  cluster_method = "louvain",  # Clustering algorithm
  num_iter = 2,                # Iterations for refinement
  partition_qval = 0.05,       # Significance threshold for partitions
  verbose = FALSE
)

cat("Number of partitions:", length(unique(partitions(cds))), "\n")
cat("Number of clusters:", length(unique(clusters(cds))), "\n\n")

# Now learn the trajectory graph structure
cds <- learn_graph(
  cds,
  use_partition = TRUE,         # Learn separate graphs per partition
  close_loop = FALSE,            # Don't force circular trajectories
  learn_graph_control = list(
    ncenter = 500,               # Number of centers for graph construction
    nn.k = 15,                   # k-nearest neighbors
    prune_graph = TRUE           # Remove spurious connections
  ),
  verbose = FALSE
)

Expected output:

Number of partitions: 3 
Number of clusters: 26

Understanding parameters:

cluster_cells parameters:

reduction_method = "UMAP": Use UMAP coordinates for clustering
k = 20: Number of nearest neighbors for graph construction
cluster_method = "louvain": Algorithm for community detection
partition_qval = 0.05: Significance for splitting into partitions (lower = more partitions)

learn_graph parameters:

use_partition = TRUE: Cells in different partitions won’t be connected
close_loop = FALSE: For linear/branched trajectories (set TRUE for cell cycle)
ncenter = 500: Higher = more detailed graph (but slower and potentially overfitted)
nn.k = 15: Fewer neighbors = less connected graph; more = smoother trajectories

What are partitions?

Partitions separate disconnected cellular processes
Example: Lymphoid and myeloid lineages might be different partitions
Cells within a partition are connected by continuous transitions
Cells in different partitions have no inferred developmental relationship

In our PBMC data: Typically 1-3 partitions, depending on whether immune lineages are connected through common progenitors or separate.

Handling Multiple Partitions

Critical insight: Your data likely has multiple partitions representing disconnected biological processes. Let’s analyze all partitions to capture complete biology.

#-----------------------------------------------
# STEP 9: Analyze all partitions
#-----------------------------------------------

# Count cells per partition
partition_counts <- table(partitions(cds))
cat("Cells per partition:\n")
print(partition_counts)

# Identify which cell types are in each partition
cat("\nCell type distribution by partition:\n")
for (i in 1:length(partition_counts)) {
  partition_cells <- colnames(cds)[partitions(cds) == i]
  cell_types_in_partition <- table(colData(cds)$final_annotation[partitions(cds) == i])
  cat("\nPartition", i, "(", partition_counts[i], "cells ):\n")
  print(cell_types_in_partition)
}

# Keep all partitions for complete analysis
cds_main <- cds

cat("\nRetaining all partitions for analysis:\n")
cat("  Total cells:", ncol(cds_main), "\n")
cat("  Total partitions:", length(unique(partitions(cds_main))), "\n")
cat("  Genes:", nrow(cds_main), "\n")

Expected output (multiple partitions):

Cells per partition:

   1    2    3 
578 5989  695 

Cell type distribution by partition:

Partition 1 ( 578 cells ):

Classical Monocytes           Monocytes 
                522                  56 

Partition 2 ( 5989 cells ):
CD4+ T cells CD8+ T cells    Monocytes     NK cells      T cells 
        3138          313          158         1607          773

Partition 3 ( 695 cells ):
  B cells Monocytes 
      674        21

Retaining all partitions for analysis:
  Total cells: 7262 
  Total partitions: 3 
  Genes: 18861

Understanding partitions in this data:

Partition 1 (Classical Monocytes + Monocytes): Myeloid lineage trajectory (578 cells)
Partition 2 (T cells, NK cells, CD4+/CD8+ T cells): Main adaptive immune trajectory (6,040 cells)
Partition 3 (B cells + few Monocytes): B lymphoid lineage (695 cells)

Why multiple partitions?

Different hematopoietic lineages (myeloid vs T/NK vs B lymphoid)
No intermediate cells connecting them in this peripheral blood dataset
Each represents a separate biological trajectory
Monocytes appear in all partitions – may reflect:
- Biological heterogeneity (different monocyte subtypes)
- Transitional states
- Small number of doublets or contamination

Strategy: We’ll analyze all partitions to capture complete cellular diversity, demonstrating both manual and automatic root selection methods.

Understanding Pseudotime and Root Selection

What is pseudotime?

Pseudotime is a number assigned to each cell representing its position along a developmental path. Pseudotime = 0 marks the root (the starting cell state), and higher values represent more differentiated or activated states. It is not real time — it is an inferred biological progression.

Each partition has its own independent pseudotime. Values are not comparable across partitions.

Choosing a root means telling Monocle 3 where the trajectory begins:

Manual root: You pick a biologically meaningful starting cell type. In our data, CD4+ T cells include naive/resting populations, making them a good starting point for the T/NK cell trajectory (Partition 2).
Automatic root: You provide a small set of candidate cells and let Monocle 3 decide. Useful when the developmental hierarchy is unclear, as with the Classical Monocyte and B cell partitions.

We have 3 partitions, so we run order_cells() three times — once per partition.

Step 10a: Order Partition 2 (Manual Root — CD4+ T Cells)

Partition 2 is the largest partition (6,040 cells) containing CD4+ T cells, CD8+ T cells, T cells, NK cells, and Monocytes. We use CD4+ T cells as the root because they include naive populations that represent the earliest state in this trajectory.

#-----------------------------------------------
# STEP 10a: Order Partition 2 - manual root (CD4+ T cells)
#-----------------------------------------------

# Get all cells in Partition 2
cells_p2 <- colnames(cds_main)[partitions(cds_main) == 2]

# Select CD4+ T cells as root (biologically informed starting point)
root_cells_p2 <- cells_p2[
  colData(cds_main)$final_annotation[partitions(cds_main) == 2] == "CD4+ T cells"
]

cat("Partition 2 root cells (CD4+ T cells):", length(root_cells_p2), "\n")

# Order cells using CD4+ T cells as root
cds_main <- order_cells(
  cds_main,
  root_cells = root_cells_p2,
  verbose = FALSE
)

cat("Partition 2 pseudotime range:",
    round(range(pseudotime(cds_main)[partitions(cds_main) == 2], na.rm = TRUE), 2), "\n")

Expected output:

Partition 2 root cells (CD4+ T cells): 3138
Partition 2 pseudotime range: 0 23.66

Step 10b: Order Partition 1 (Automatic Root — Classical Monocytes)

Partition 1 contains Classical Monocytes and Monocytes (578 cells). The developmental hierarchy is not well established for this isolated myeloid cluster, so we provide 50 candidate cells and let Monocle 3 choose the best root.

#-----------------------------------------------
# STEP 10b: Order Partition 1 - automatic root (Classical Monocytes)
#-----------------------------------------------

# Get all cells in Partition 1
cells_p1 <- colnames(cds_main)[partitions(cds_main) == 1]

# Provide first 50 cells as candidates - Monocle 3 picks the best starting point
root_cells_p1 <- cells_p1[1:min(50, length(cells_p1))]

cat("Partition 1 root candidates provided:", length(root_cells_p1), "\n")

# Order cells
cds_main <- order_cells(
  cds_main,
  root_cells = root_cells_p1,
  verbose = FALSE
)

cat("Partition 1 pseudotime range:",
    round(range(pseudotime(cds_main)[partitions(cds_main) == 1], na.rm = TRUE), 2), "\n")

Expected output:

Partition 1 root candidates provided: 50
Partition 1 pseudotime range: 0 1.2

Step 10c: Order Partition 3 (Automatic Root — B Cells)

Partition 3 contains B cells and a few Monocytes (695 cells). We use the same automatic approach as Partition 1.

#-----------------------------------------------
# STEP 10c: Order Partition 3 - automatic root (B cells)
#-----------------------------------------------

# Get all cells in Partition 3
cells_p3 <- colnames(cds_main)[partitions(cds_main) == 3]

# Provide first 50 cells as candidates
root_cells_p3 <- cells_p3[1:min(50, length(cells_p3))]

cat("Partition 3 root candidates provided:", length(root_cells_p3), "\n")

# Order cells
cds_main <- order_cells(
  cds_main,
  root_cells = root_cells_p3,
  verbose = FALSE
)

cat("Partition 3 pseudotime range:",
    round(range(pseudotime(cds_main)[partitions(cds_main) == 3], na.rm = TRUE), 2), "\n")

Expected output:

Partition 3 root candidates provided: 50
Partition 3 pseudotime range: 0 1.07

Verify All Partitions Are Ordered

#-----------------------------------------------
# Verify pseudotime across all partitions
#-----------------------------------------------

cat("All cells ordered in pseudotime\n")
cat("Cells with NA pseudotime:", sum(is.na(pseudotime(cds_main))), "\n\n")

cat("Summary by partition:\n")
for (i in 1:3) {
  pt_range <- round(range(pseudotime(cds_main)[partitions(cds_main) == i], na.rm = TRUE), 2)
  n <- sum(partitions(cds_main) == i)
  cat("  Partition", i, "(", n, "cells ): pseudotime", pt_range[1], "to", pt_range[2], "\n")
}

Expected output:

All cells ordered in pseudotime
Cells with NA pseudotime: 0

Summary by partition:
  Partition 1 ( 578 cells ): pseudotime 0 to 1.2
  Partition 2 ( 6040 cells ): pseudotime 0 to 23.66
  Partition 3 ( 695 cells ): pseudotime 0 to 1.07

Why are Partitions 1 and 3 pseudotime ranges so narrow?

Partitions 1 (Classical Monocytes) and 3 (B cells) contain primarily one homogeneous cell type, resulting in a short, simple trajectory. Partition 2 spans five distinct cell types across a wide activation spectrum (naive CD4+ T cells → activated NK cells), producing a much longer trajectory.

HPC note: Always provide root_cells explicitly in order_cells(). Calling it without this parameter launches an interactive browser interface that will fail on servers without a graphical display.

Visualizing Trajectories Across All Partitions

Now let’s visualize what we’ve learned across all partitions:

#-----------------------------------------------
# STEP 11: Visualize trajectories and pseudotime
#-----------------------------------------------

# Plot 1: Trajectory colored by cell type
p_trajectory_celltype <- plot_cells(
  cds_main,
  color_cells_by = "final_annotation",
  label_cell_groups = TRUE,
  label_groups_by_cluster = FALSE,
  label_leaves = FALSE,
  label_branch_points = FALSE,
  graph_label_size = 3,
  cell_size = 1
) +
  ggtitle("Trajectory Structure (Colored by Cell Type)") +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.text = element_text(size = 9))

# Plot 2: Trajectory colored by pseudotime
p_trajectory_pseudotime <- plot_cells(
  cds_main,
  color_cells_by = "pseudotime",
  label_cell_groups = FALSE,
  label_leaves = FALSE,
  label_branch_points = TRUE,
  label_roots = TRUE,
  graph_label_size = 4,
  cell_size = 1
) +
  scale_color_viridis(option = "plasma", name = "Pseudotime") +
  ggtitle("Trajectory Colored by Pseudotime") +
  theme(plot.title = element_text(face = "bold", size = 14))

# Plot 3: Trajectory colored by condition
p_trajectory_condition <- plot_cells(
  cds_main,
  color_cells_by = "condition",
  label_cell_groups = FALSE,
  label_leaves = FALSE,
  label_branch_points = FALSE,
  cell_size = 1
) +
  scale_color_manual(values = c("Healthy" = "#2E86AB", 
                                 "Post_Treatment" = "#F18F01")) +
  ggtitle("Trajectory Colored by Condition") +
  theme(plot.title = element_text(face = "bold", size = 14))

# Plot 4: Trajectory colored by partition (if multiple exist)
n_partitions <- length(unique(partitions(cds_main)))

if (n_partitions > 1) {
  p_trajectory_partition <- plot_cells(
    cds_main,
    color_cells_by = "partition",
    label_cell_groups = TRUE,
    label_groups_by_cluster = FALSE,
    label_leaves = FALSE,
    label_branch_points = FALSE,
    cell_size = 1
  ) +
    ggtitle("Trajectory Colored by Partition") +
    theme(plot.title = element_text(face = "bold", size = 14))

  # Combine plots (4-panel)
  p_trajectory_overview <- (p_trajectory_celltype | p_trajectory_pseudotime) / 
                           (p_trajectory_condition | p_trajectory_partition)
} else {
  # Combine plots (3-panel)
  p_trajectory_overview <- (p_trajectory_celltype | p_trajectory_pseudotime) / 
                           (p_trajectory_condition | plot_spacer())
}

ggsave("plots/trajectories/02_trajectory_overview.png", 
       p_trajectory_overview, width = 16, height = 12, dpi = 300)

What to look for in these plots:

Plot 1 (Cell Type):

Main partition (center): T cells, NK cells, Monocytes with smooth transitions
Separate clusters (corners): Classical Monocytes and B cells disconnected
Black lines show trajectory paths within each partition
Do different cell types occupy different regions of their respective trajectories?

Plot 2 (Pseudotime):

Color gradient per partition: Each partition has independent pseudotime (0 to max)
Main partition: Purple (low) → Yellow (high) shows T cell progression
Classical Monocytes: Separate pseudotime scale (automatic root)
B cells: Separate pseudotime scale (automatic root)
Note: Pseudotime values NOT comparable across partitions!

Plot 3 (Condition):

Are Healthy and Post-Treatment cells evenly distributed within each partition?
Does treatment preferentially drive cells along certain trajectories?
Look for condition-specific enrichment at high vs low pseudotime

Plot 4 (Partition) – if multiple partitions:

Clearly shows which cells belong to which disconnected trajectory
Partition 2: Largest, contains main immune cell trajectory
Partitions 1, 3: Separate lineages (Classical Monocytes, B cells)
Confirms biological separation vs technical artifacts

Trajectory features visible:

Black lines: Principal graphs (inferred developmental paths)
Branch points: Where trajectories split (fate decisions) – mainly in Partition 1
Leaves: Terminal states (differentiated cell types)
Roots: Starting points (marked with circles if label_roots = TRUE)
Partition 2: Manual root in CD4+ T cells
Partitions 1, 3: Automatic roots

Key observation: Multiple disconnected partitions are biologically meaningful in PBMC data:

Partition 2: Adaptive immune cells (T cells, NK cells) with activation/differentiation trajectories
Partition 1: Myeloid lineage (Classical Monocytes)
Partition 3: B lymphocyte lineage

These represent distinct hematopoietic lineages that don’t directly interconvert in peripheral blood.

Creating Custom Pseudotime Visualizations

The default plot_cells() is useful, but for publications you often need more control. Let’s create custom plots that properly handle multiple partitions:

#-----------------------------------------------
# STEP 12: Custom pseudotime visualizations
#-----------------------------------------------

# Extract data for custom plotting
plot_data <- data.frame(
  UMAP_1 = reducedDims(cds_main)[["UMAP"]][,1],
  UMAP_2 = reducedDims(cds_main)[["UMAP"]][,2],
  pseudotime = pseudotime(cds_main),
  cell_type = colData(cds_main)$final_annotation,
  condition = colData(cds_main)$condition,
  sample = colData(cds_main)$sample_id,
  partition = factor(partitions(cds_main))
)

# Custom pseudotime plot with ggplot2
p_custom_pseudotime <- ggplot(plot_data, aes(x = UMAP_1, y = UMAP_2, color = pseudotime)) +
  geom_point(size = 1.5, alpha = 0.8) +
  scale_color_viridis(option = "plasma", name = "Pseudotime") +
  labs(title = "Pseudotime Distribution Across All Partitions",
       x = "UMAP 1", y = "UMAP 2") +
  theme_classic(base_size = 12) +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.position = "right")

ggsave("plots/pseudotime/03_custom_pseudotime.png", 
       p_custom_pseudotime, width = 10, height = 8, dpi = 300)

# Pseudotime distribution by cell type (violin plot)
p_pseudotime_violin <- ggplot(plot_data, aes(x = reorder(cell_type, pseudotime, median), 
                                              y = pseudotime, fill = cell_type)) +
  geom_violin(trim = FALSE, scale = "width", alpha = 0.7) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  coord_flip() +
  labs(title = "Pseudotime Distribution by Cell Type",
       subtitle = "Note: Pseudotime not comparable across partitions",
       x = "Cell Type", y = "Pseudotime") +
  theme_classic(base_size = 12) +
  theme(plot.title = element_text(face = "bold", size = 14),
        legend.position = "none")

ggsave("plots/pseudotime/04_pseudotime_by_celltype.png", 
       p_pseudotime_violin, width = 10, height = 6, dpi = 300)

# Pseudotime distribution by condition (density plot with facets for partitions)
n_partitions <- length(unique(plot_data$partition))

if (n_partitions > 1) {
  p_pseudotime_density <- ggplot(plot_data, aes(x = pseudotime, fill = condition)) +
    geom_density(alpha = 0.6) +
    facet_wrap(~ partition, ncol = 1, 
               labeller = labeller(partition = function(x) paste("Partition", x))) +
    scale_fill_manual(values = c("Healthy" = "#2E86AB", 
                                  "Post_Treatment" = "#F18F01")) +
    labs(title = "Pseudotime Distribution by Condition (Per Partition)",
         subtitle = "Each partition has independent pseudotime scale",
         x = "Pseudotime", y = "Density", fill = "Condition") +
    theme_classic(base_size = 12) +
    theme(plot.title = element_text(face = "bold", size = 14))
} else {
  p_pseudotime_density <- ggplot(plot_data, aes(x = pseudotime, fill = condition)) +
    geom_density(alpha = 0.6) +
    scale_fill_manual(values = c("Healthy" = "#2E86AB", 
                                  "Post_Treatment" = "#F18F01")) +
    labs(title = "Pseudotime Distribution by Condition",
         x = "Pseudotime", y = "Density", fill = "Condition") +
    theme_classic(base_size = 12) +
    theme(plot.title = element_text(face = "bold", size = 14))
}

ggsave("plots/pseudotime/05_pseudotime_by_condition.png", 
       p_pseudotime_density, width = 10, height = 8, dpi = 300)

# Additional plot: Partition-specific pseudotime (if multiple partitions)
if (n_partitions > 1) {
  # Create separate plots for each partition
  partition_plots <- list()

  for (i in 1:n_partitions) {
    partition_data <- plot_data[plot_data$partition == i, ]
    cell_types_in_partition <- unique(partition_data$cell_type)

    partition_plots[[i]] <- ggplot(partition_data, 
                                    aes(x = UMAP_1, y = UMAP_2, color = pseudotime)) +
      geom_point(size = 2, alpha = 0.8) +
      scale_color_viridis(option = "plasma", name = "Pseudotime") +
      labs(title = paste0("Partition ", i),
           subtitle = paste("Cell types:", paste(cell_types_in_partition, collapse = ", ")),
           x = "UMAP 1", y = "UMAP 2") +
      theme_classic(base_size = 10) +
      theme(plot.title = element_text(face = "bold", size = 12))
  }

  # Combine partition plots
  p_partition_combined <- wrap_plots(partition_plots, ncol = 2)

  ggsave("plots/pseudotime/06_pseudotime_per_partition.png",
         p_partition_combined, width = 14, height = 10, dpi = 300)
}

Interpreting the violin plot (pseudotime by cell type):

Wide violins: Cell type spans broad pseudotime range (cells at many developmental stages)
Narrow violins: Cell type confined to specific pseudotime window (discrete developmental state)
Important: Only compare cell types within the same partition
CD4+ T cells, NK cells, T cells in Partition 2 are comparable
Classical Monocytes (Partition 1) have independent pseudotime scale
B cells (Partition 3) have independent pseudotime scale

Interpreting the density plot (pseudotime by condition):

Partition-specific facets: Each partition shown separately with its own pseudotime scale
Overlapping peaks: Both conditions have cells at similar developmental stages
Shifted peaks: Treatment drives cells toward earlier or later pseudotime
Different shapes: Treatment changes distribution of cells along trajectory

Interpreting Trajectory Analysis Results

Understanding What Trajectories Represent

Critical conceptual point: A trajectory is a model, not ground truth.

What trajectories tell us:

✅ The most likely path cells take through gene expression space
✅ The order of gene expression changes during a process
✅ Branch points where cells diverge into different fates
✅ Candidate driver genes that change along the path

What trajectories do NOT tell us:

❌ Absolute time (pseudotime ≠ hours/days)
❌ Causal mechanisms (correlation ≠ causation)
❌ Whether all cells actually traverse the full path
❌ Directionality (without additional evidence)

Biological Validation: Does the Trajectory Make Sense?

Before interpreting results, validate that the trajectory is biologically plausible:

1. Does the trajectory structure match known biology?

#-----------------------------------------------
# STEP 13: Validation - trajectory structure per partition
#-----------------------------------------------

# Count cells in each cell type along pseudotime
# Group by partition for proper interpretation
pseudotime_summary <- plot_data %>%
  group_by(partition, cell_type) %>%
  summarise(
    n_cells = n(),
    median_pseudotime = median(pseudotime, na.rm = TRUE),
    min_pseudotime = min(pseudotime, na.rm = TRUE),
    max_pseudotime = max(pseudotime, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  arrange(partition, median_pseudotime)

cat("Pseudotime summary by cell type and partition:\n")
print(as.data.frame(pseudotime_summary))

Expected pattern for multiple partitions:

# Pseudotime summary by partition and cell type
  partition           cell_type n_cells median_pseudotime min_pseudotime max_pseudotime
1         1 Classical Monocytes     522         0.1538738   0.000000e+00      1.2040175
2         1           Monocytes      56         0.1921420   5.064492e-03      0.8355923
3         2        CD4+ T cells    3138         0.2238169   0.000000e+00      5.6224804
4         2           Monocytes     158         1.4736496   2.779210e-06     23.5694842
5         2             T cells     773         3.3443203   0.000000e+00     23.3278436
6         2        CD8+ T cells     313         6.0913848   5.558420e-06      6.7880647
7         2            NK cells    1607        10.6601735   1.620208e+00     23.6647194
8         3             B cells     674         0.2204692   0.000000e+00      1.0740133
9         3           Monocytes      21         0.4649489   4.405602e-06      0.8723964

Red flags (suggests trajectory may not be biological):

Terminally differentiated cells at pseudotime = 0 within their partition (should be high)
Progenitor-like cells at highest pseudotime within their partition (should be low)
No correlation between known activation markers and pseudotime
Trajectory driven by technical factors (batch, library size, %mitochondrial genes)

2. Do known marker genes align with trajectory?

#-----------------------------------------------
# STEP 14: Validation - marker gene expression
#-----------------------------------------------

# Test known markers along pseudotime
# Naive T cell markers should decrease: CCR7, LEF1, TCF7
# Activation markers should increase: CD69, HLA-DRA, IL2RA

marker_genes <- c("CCR7", "LEF1", "CD69", "HLA-DRA", "IL2RA", "GZMB")

# Check if markers are in dataset
marker_genes <- marker_genes[marker_genes %in% rownames(cds_main)]

if (length(marker_genes) > 0) {
  # Plot each marker along pseudotime
  for (gene in marker_genes) {
    p <- plot_cells(
      cds_main,
      genes = gene,
      label_cell_groups = FALSE,
      show_trajectory_graph = TRUE,
      cell_size = 1
    ) +
      scale_color_viridis(option = "magma", name = "Expression") +
      ggtitle(paste0("Expression of ", gene, " Along Trajectory"))

    ggsave(paste0("plots/genes_over_pseudotime/marker_", gene, ".png"),
           p, width = 8, height = 6, dpi = 300)
  }
  cat("Saved marker gene trajectory plots\n")
} else {
  cat("Marker genes not found in dataset\n")
}

Expected patterns:

CCR7, LEF1: High at low pseudotime (naive state), decrease along trajectory
CD69, HLA-DRA: Low at low pseudotime, increase (activation)
GZMB: Low early, high late (effector function)

If markers don’t follow expected patterns: Trajectory may be capturing something other than the biological process you expected (e.g., cell cycle, batch effects, or a different biological process).

3. Does treatment affect pseudotime distribution?

#-----------------------------------------------
# STEP 15: Validation - condition effects
#-----------------------------------------------

# Statistical test: Do conditions differ in pseudotime distribution?
wilcox_test <- wilcox.test(pseudotime ~ condition, data = plot_data)

cat("\nWilcoxon test (Healthy vs Post-Treatment pseudotime):\n")
cat("  W statistic:", wilcox_test$statistic, "\n")
cat("  P-value:", format(wilcox_test$p.value, scientific = TRUE), "\n")

# Calculate median pseudotime per condition
median_summary <- plot_data %>%
  group_by(condition) %>%
  summarise(
    median_pseudotime = median(pseudotime, na.rm = TRUE),
    mean_pseudotime = mean(pseudotime, na.rm = TRUE)
  )

cat("\nMedian pseudotime by condition:\n")
print(median_summary)

Example Output:

Wilcoxon test (Healthy vs Post-Treatment pseudotime):

  W statistic: 6726000
  P-value: 1.297578e-01

Median pseudotime by condition:

  condition      median_pseudotime mean_pseudotime
1 Healthy                    0.678            3.78
2 Post_Treatment             0.652            3.35

Interpretation:

P < 0.05: Treatment significantly shifts cells along trajectory
Higher pseudotime in Post-Treatment: Treatment drives differentiation/activation
Lower pseudotime in Post-Treatment: Treatment returns cells to earlier state
No difference: Treatment effect independent of this trajectory

Understanding Branch Points and Cell Fate Decisions

If your trajectory has branches, these represent critical decision points:

#-----------------------------------------------
# STEP 16: Analyze branch points (if present)
#-----------------------------------------------

# Check for branches in the graph
graph_test_results <- graph_test(
  cds_main,
  neighbor_graph = "principal_graph",
  cores = 1
)

# Identify genes with significant spatial autocorrelation (branch-associated)
sig_genes <- subset(graph_test_results, q_value < 0.05)

cat("\nGraph test results:\n")
cat("  Total genes tested:", nrow(graph_test_results), "\n")
cat("  Significant genes (q < 0.05):", nrow(sig_genes), "\n")

if (nrow(sig_genes) > 0) {
  cat("\nTop 10 branch-associated genes (by Moran's I):\n")
  top_genes <- head(sig_genes[order(sig_genes$morans_I, decreasing = TRUE), ], 10)
  print(top_genes[, c("gene_short_name", "morans_I", "q_value")])
}

Example Output:

Graph test results:

  Total genes tested: 18861
  Significant genes (q &lt; 0.05): 8992

Top 10 branch-associated genes (by Moran's I):

        gene_short_name  morans_I q_value
CD74               CD74 0.8354366       0
LYZ                 LYZ 0.7947756       0
NKG7               NKG7 0.7895408       0
BANK1             BANK1 0.7815003       0
HLA-DRA         HLA-DRA 0.7807247       0
AFF3               AFF3 0.7702808       0
CD79A             CD79A 0.7613293       0
LST1               LST1 0.7536135       0
FTL                 FTL 0.7531880       0
IGHM               IGHM 0.7401368       0

Understanding graph_test output:

Moran’s I: Spatial autocorrelation statistic
High values (close to 1): Gene expression changes smoothly along trajectory
Low values (close to 0): Gene expression independent of trajectory position
q_value: FDR-adjusted p-value
q < 0.05: Gene significantly associated with trajectory structure

Biological interpretation:

Genes with high Moran’s I are trajectory markers: They define the developmental path
Genes specific to one branch are fate determinants: They specify cell fate choices
Can be used to identify driver transcription factors: Regulators of differentiation

Integrating Results Back to Seurat Object

For downstream analysis and cross-referencing with other Parts, add pseudotime to your Seurat object:

#-----------------------------------------------
# STEP 17: Add pseudotime to Seurat metadata
#-----------------------------------------------

# Create pseudotime vector with cell names (includes all partitions)
pseudotime_vector <- pseudotime(cds_main)
names(pseudotime_vector) <- colnames(cds_main)

# Also add partition information
partition_vector <- partitions(cds_main)
names(partition_vector) <- colnames(cds_main)

# Add to Seurat object
seurat_subset$monocle3_pseudotime <- NA  # Initialize
seurat_subset$monocle3_partition <- NA   # Initialize

seurat_subset$monocle3_pseudotime[colnames(cds_main)] <- pseudotime_vector
seurat_subset$monocle3_partition[colnames(cds_main)] <- as.character(partition_vector)

# Visualize pseudotime in Seurat
p_seurat_pseudotime <- FeaturePlot(
  seurat_subset,
  features = "monocle3_pseudotime",
  reduction = umap_reduction,
  pt.size = 1
) +
  scale_color_viridis(option = "plasma", name = "Pseudotime") +
  ggtitle("Monocle 3 Pseudotime (All Partitions in Seurat)")

ggsave("plots/pseudotime/07_pseudotime_in_seurat.png",
       p_seurat_pseudotime, width = 10, height = 8, dpi = 300)

# Also visualize partition assignment
p_seurat_partition <- DimPlot(
  seurat_subset,
  group.by = "monocle3_partition",
  reduction = umap_reduction,
  pt.size = 1
) +
  ggtitle("Monocle 3 Partition Assignment (in Seurat)")

ggsave("plots/pseudotime/08_partition_in_seurat.png",
       p_seurat_partition, width = 10, height = 8, dpi = 300)

# Save updated Seurat object
saveRDS(seurat_subset, "results/seurat_with_pseudotime.rds")

Why integrate back to Seurat?

Use Seurat’s powerful visualization tools with pseudotime
Cross-reference with differential expression results (Part 5)
Partition information enables partition-specific analyses
Subset cells by pseudotime windows for focused analysis
Correlate pseudotime with other metadata (sample, condition, QC metrics)

Important note: The Seurat object now contains:

monocle3_pseudotime: Pseudotime values (remember: NOT comparable across partitions!)
monocle3_partition: Partition assignment (1, 2, 3, etc.)

Identifying Genes That Change Along Trajectories

Finding Genes Correlated with Pseudotime

The most basic trajectory analysis: Which genes increase or decrease along pseudotime?

#-----------------------------------------------
# STEP 18: Identify pseudotime-dependent genes
#-----------------------------------------------

# Fit generalized additive models (GAMs) to identify genes changing with pseudotime
trajectory_genes <- graph_test(
  cds_main,
  neighbor_graph = "principal_graph",
  cores = 1,
  verbose = FALSE
)

# Filter for significant genes
sig_trajectory_genes <- subset(trajectory_genes, q_value < 0.05)

cat("Genes significantly associated with trajectory:\n")
cat("  Total tested:", nrow(trajectory_genes), "\n")
cat("  Significant (q < 0.05):", nrow(sig_trajectory_genes), "\n")
cat("  Top genes by Moran's I:\n")

# Show top 20 genes
top20_genes <- head(sig_trajectory_genes[order(sig_trajectory_genes$morans_I, decreasing = TRUE), ], 20)
print(top20_genes[, c("gene_short_name", "morans_I", "q_value")])

# Save results
write.csv(sig_trajectory_genes, 
          "results/trajectory_associated_genes.csv",
          row.names = FALSE)

Expected output:

Genes significantly associated with trajectory:
  Total tested: 18861 
  Significant (q &lt; 0.05): 8992 

  Top genes by Moran's I:
        gene_short_name  morans_I q_value
CD74               CD74 0.8354366       0
LYZ                 LYZ 0.7947756       0
NKG7               NKG7 0.7895408       0
BANK1             BANK1 0.7815003       0
...

Interpreting Moran’s I:

0.4-0.5: Very strong spatial autocorrelation (classic trajectory markers)
0.3-0.4: Strong association (important trajectory genes)
0.2-0.3: Moderate association (may be trajectory-relevant)
< 0.2: Weak association (likely not trajectory-specific)

Visualizing Top Trajectory Genes

#-----------------------------------------------
# STEP 19: Visualize top trajectory genes
#-----------------------------------------------

# Select top 12 genes for visualization
top_genes_to_plot <- head(top20_genes$gene_short_name, 12)

# Create multi-panel plot
gene_plots <- list()
for (i in 1:length(top_genes_to_plot)) {
  gene <- top_genes_to_plot[i]

  gene_plots[[i]] <- plot_cells(
    cds_main,
    genes = gene,
    label_cell_groups = FALSE,
    show_trajectory_graph = TRUE,
    cell_size = 0.75
  ) +
    scale_color_viridis(option = "magma", name = "Expression") +
    ggtitle(gene) +
    theme(plot.title = element_text(size = 10, face = "bold"),
          legend.position = "right",
          legend.key.size = unit(0.3, "cm"))
}

# Combine into grid
p_top_genes <- wrap_plots(gene_plots, ncol = 3)

ggsave("plots/genes_over_pseudotime/07_top_trajectory_genes.png",
       p_top_genes, width = 14, height = 12, dpi = 300)

What to look for:

Gradual color transitions: Gene expression changes smoothly along trajectory
Spatial patterns: Expression highest at specific pseudotime windows
Biological coherence: Do these genes make sense for the process?

Best Practices and Common Pitfalls in Trajectory Analysis

Best Practices for Robust Trajectory Inference

1. Start with high-quality, integrated data

✅ Do:

Use data that passed stringent QC (Part 2)
Ensure batch effects are corrected (Part 3)
Verify cell type annotations are accurate (Part 4)

❌ Don’t:

Run trajectory analysis on unfiltered data (doublets, low-quality cells)
Use data with strong batch effects (trajectories will follow batch gradients)
Proceed if cell types don’t make biological sense

2. Ensure your dataset contains the full trajectory

✅ Do:

Include cells across all developmental stages (early → late)
Verify intermediate states exist (not just endpoints)
Check that trajectory connects expected start and end points

❌ Don’t:

Attempt trajectory analysis with only terminally differentiated cells
Assume a trajectory exists without biological evidence
Force unrelated cell types onto a single trajectory

3. Validate biological plausibility

✅ Do:

Check that known markers align with pseudotime
Verify trajectory direction makes biological sense
Compare with published developmental hierarchies
Test multiple parameter settings to assess robustness

❌ Don’t:

Accept default results without validation
Ignore trajectories that contradict known biology
Over-interpret technical artifacts as biological signals

4. Use appropriate sample sizes

✅ Do:

Ensure ≥100 cells per developmental stage (more is better)
Include biological replicates to validate trajectory consistency
Use sufficient cells to capture rare intermediate states

❌ Don’t:

Run trajectory analysis on <500 total cells (underpowered)
Rely on single samples (batch effects may confound)
Expect to resolve rare transitions without sufficient cells

5. Integrate with orthogonal evidence

✅ Do:

Correlate pseudotime with known temporal markers
Validate with RNA velocity (future transcriptional state)
Compare with lineage tracing if available
Test predictions experimentally (perturbations, time-course)

❌ Don’t:

Claim causality based on trajectory alone
Ignore contradictions with other data modalities
Present trajectory as proof without supporting evidence

6. Handle multiple partitions appropriately

✅ Do:

Analyze all biologically meaningful partitions separately
Use partition-specific root selection (manual or automatic as appropriate)
Report pseudotime separately per partition (values NOT comparable across partitions)
Validate that partitions represent real biology (not batch effects)
Visualize each partition’s trajectory independently

❌ Don’t:

Ignore small partitions without investigating what they contain
Compare pseudotime values across partitions (fundamentally incomparable)
Force unrelated partitions onto single trajectory (use partition_qval carefully)
Assume partitions are artifacts (may be genuine biological separation)

Common Pitfalls and How to Avoid Them

Pitfall 1: Trajectory driven by batch effects, not biology

Diagnosis:

# Check if pseudotime correlates with batch
cor.test(plot_data$pseudotime, as.numeric(factor(plot_data$sample)))

Symptoms:

Cells from same batch cluster at same pseudotime
Pseudotime gradient follows sample/batch structure
Batch-specific genes have highest Moran’s I

Solutions:

Re-run integration with stronger batch correction (Part 3)
Regress out batch effects before trajectory inference
Use batch-balanced downsampling

Pitfall 2: Cell cycle confounded with differentiation

Diagnosis:

# Check cell cycle phase distribution along pseudotime
table(colData(cds_main)$Phase, cut(pseudotime(cds_main), breaks = 5))

Symptoms:

Cell cycle genes (MKI67, TOP2A) dominate trajectory genes
G2/M cells enriched at specific pseudotime
Trajectory reflects proliferation, not differentiation

Solutions:

Regress out cell cycle scores (Seurat: CellCycleScoring + ScaleData)
Subset to non-cycling cells (G1 phase only)
Use cell cycle-agnostic dimension reduction

Pitfall 3: Choosing wrong root cells

Diagnosis:

# Check if "mature" markers are high at pseudotime=0
# Should be LOW at root

Symptoms:

Differentiated cells have pseudotime = 0
Progenitor markers high at high pseudotime (backward)
Biological interpretation doesn’t make sense

Solutions:

Re-run order_cells() with correct root specification
Use known progenitor cells as root (naive, stem, resting)
Validate root choice with marker gene expression

Pitfall 4: Over-interpreting noise as biological variation

Diagnosis:

# Check trajectory genes for biological coherence
# Are top genes known to be involved in this process?

Symptoms:

Top trajectory genes are ribosomal proteins (technical)
Mitochondrial genes highly ranked (stress/quality)
No biological pathway enrichment in trajectory genes

Solutions:

Remove technical confounders (ribosomal, mitochondrial genes)
Increase ncenter parameter (smoother trajectory)
Filter genes by minimum expression and prevalence

Pitfall 5: Forcing disconnected processes onto single trajectory

Diagnosis:

# Check number of partitions
length(unique(partitions(cds)))

Symptoms:

Multiple disconnected partitions (e.g., 3 partitions in PBMC data)
Unrelated cell types connected by spurious edges
Trajectory spans biologically impossible transitions (e.g., T cells → B cells)

Solutions:

Analyze each partition separately (as shown in Steps 9-10)
Validate that partitions represent real biology (check cell type composition)
Use partition-specific root selection (manual for known hierarchies, automatic for unclear ones)
Do NOT force partitions together by setting partition_qval = 1.0 without biological justification

Example from this tutorial:

Partition 1: T cells, NK cells, Monocytes (connected immune trajectory) ✅
Partition 2: Classical Monocytes (distinct myeloid lineage) ✅
Partition 3: B cells (separate lymphoid lineage) ✅
These SHOULD be separate—forcing connection would create artificial biology!

Pitfall 6: Misinterpreting pseudotime across partitions

Diagnosis:

# Check if you're comparing pseudotime across partitions
plot_data %>% group_by(partition) %>% summarise(max_pt = max(pseudotime))

Symptoms:

Comparing median pseudotime of B cells (Partition 3) vs T cells (Partition 1)
Claiming “B cells are earlier in development than T cells” based on lower pseudotime values
Merging partition-specific analyses without accounting for independent scales

Solutions:

Never compare pseudotime values across partitions (they’re on different scales!)
Analyze each partition independently
Report findings partition-specifically: “Within the T cell trajectory (Partition 1), NK cells are at later pseudotime…”
If biological connections exist between partitions, validate with orthogonal data

Pitfall 7: Insufficient cells for complex trajectories

Symptoms:

Trajectory graph is overly simplistic (straight line) despite expecting branches
Many cells have NA pseudotime
High variance in pseudotime between biological replicates

Solutions:

Increase cell numbers (use full dataset, not downsampled)
Merge technical replicates to increase power
Focus on subset of trajectory with sufficient cells

Troubleshooting: Trajectory Doesn’t Match Expectations

“I expected a branched trajectory but got a linear one”

Possible causes:

Insufficient cells at branch point
Branch point exists but wasn’t captured in sampling
Biology is actually linear (assumption was wrong)
Parameter ncenter too low (increase to 1000-2000)

“Trajectory connects cells that shouldn’t be related”

Possible causes:

Batch effects creating artificial continuity
Need to subset to biologically related cell types only
UMAP/PCA doesn’t separate populations well (redo with more PCs)

“Pseudotime doesn’t correlate with known markers”

Possible causes:

Wrong root cells chosen (reverse the trajectory)
Process isn’t actually occurring (static cell types)
Markers are cell type-specific, not trajectory-specific

“All genes have low Moran’s I”

Possible causes:

No biological trajectory in data
Trajectory is technical artifact (batch, quality)
Need more cells to resolve smooth transitions

Conclusion: From Clusters to Continuous Cell States

Congratulations! You’ve completed a comprehensive trajectory analysis using Monocle 3.

Key Conceptual Takeaways

1. Trajectories reveal continuous processes that clustering obscures

Cell differentiation isn’t a series of discrete jumps
Activation, stress responses, and development are gradual
Pseudotime provides temporal ordering without real time-course data

2. Trajectory inference is model-based, requiring validation

Always verify trajectories match biological expectations
Check that marker genes align with pseudotime
Integrate multiple lines of evidence (RNA velocity, lineage tracing)
Be conservative in claims—correlation ≠ causation

3. Technical quality determines biological interpretability

Poor integration → batch-driven trajectories
Insufficient cells → missed branch points
Low-quality data → noise masquerading as biology
Parts 1-3 (QC, integration) are prerequisites for meaningful trajectories

4. Multiple partitions are often biologically meaningful

Don’t force connection: Disconnected partitions may represent genuine biological separation
In PBMC data: T/NK cells (Partition 2), Classical Monocytes (Partition 1), B cells (Partition 3)
Each partition = independent trajectory: Separate developmental processes with independent pseudotime
Biological interpretation: Hematopoietic lineages that don’t interconvert in peripheral blood
Analysis strategy: Treat each partition separately, use appropriate root selection per partition

5. Root selection impacts biological interpretation

Manual roots: Biologically meaningful when developmental hierarchy is known (e.g., naive T cells)
Automatic roots: Useful when hierarchy is unclear (e.g., isolated monocyte cluster)
This tutorial demonstrated both: Mixed approach based on biological knowledge availability
Validation essential: Check that inferred direction matches known marker expression patterns

Reporting Trajectory Analysis in Publications

Methods section should include:

Software versions: Monocle 3 (version), Seurat (version), R (version)
Parameter settings: Number of PCA dimensions, ncenter, nn.k
Root specification: How root cells were chosen (automated vs manual)
Validation approach: Marker gene concordance, biological plausibility checks
Cell numbers: Total cells, cells per developmental stage
Partition handling: How multiple partitions were analyzed

Results should report:

Trajectory structure: Linear, branched, number of branches/leaves
Pseudotime range: Min/max values, biological meaning
Cell distribution: How cell types/conditions distribute along trajectory
Key findings: Genes/pathways associated with trajectory
Validation: Correspondence with known biology, orthogonal evidence

Figures should show:

Trajectory structure: Graph overlaid on UMAP with pseudotime coloring
Cell type distribution: How annotated types map to trajectory
Marker genes: Expression along pseudotime (heatmap + individual genes)
Branch point analysis: If applicable, genes distinguishing fates
Condition effects: How treatment/perturbation affects pseudotime

References

Cao J, Spielmann M, Qiu X, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566(7745):496-502. doi:10.1038/s41586-019-0969-x [Monocle 3 method]
Qiu X, Mao Q, Tang Y, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017;14(10):979-982. doi:10.1038/nmeth.4402 [Monocle 2 principal graph]
Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32(4):381-386. doi:10.1038/nbt.2859 [Original Monocle pseudotime]
Street K, Risso D, Fletcher RB, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19(1):477. doi:10.1186/s12864-018-4772-0 [Slingshot alternative]
Wolf FA, Hamey FK, Plass M, et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 2019;20(1):59. doi:10.1186/s13059-019-1663-x [PAGA method]
La Manno G, Soldatov R, Zeisel A, et al. RNA velocity of single cells. Nature. 2018;560(7719):494-498. doi:10.1038/s41586-018-0414-6 [RNA velocity]
Bergen V, Lange M, Peidli S, et al. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol. 2020;38(12):1408-1414. doi:10.1038/s41587-020-0591-3 [scVelo]
Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nat Biotechnol. 2019;37(5):547-554. doi:10.1038/s41587-019-0071-9 [Comprehensive method comparison]
Tritschler S, Büttner M, Fischer DS, et al. Concepts and limitations for learning developmental trajectories from single cell genomics. Development. 2019;146(12):dev170506. doi:10.1242/dev.170506 [Trajectory inference concepts]
Haghverdi L, Büttner M, Wolf FA, Buettner F, Theis FJ. Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods. 2016;13(10):845-848. doi:10.1038/nmeth.3971 [Diffusion pseudotime]
Setty M, Kiseliovas V, Levine J, et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat Biotechnol. 2019;37(4):451-460. doi:10.1038/s41587-019-0068-4 [Palantir method]
Cannoodt R, Saelens W, Deconinck L, Saeys Y. Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells. Nat Commun. 2021;12(1):3942. doi:10.1038/s41467-021-24152-2 [Trajectory simulation/benchmarking]
Weinreb C, Wolock S, Tusi BK, Socolovsky M, Klein AM. Fundamental limits on dynamic inference from single-cell snapshots. Proc Natl Acad Sci USA. 2018;115(10):E2467-E2476. doi:10.1073/pnas.1714723115 [Theoretical limitations]
Lee H, Joo JY, Sohn DH, et al. Single-cell RNA sequencing reveals rebalancing of immunological response in patients with periodontitis after non-surgical periodontal therapy. J Transl Med. 2022;20(1):504. doi:10.1186/s12967-022-03686-8 [Dataset source – GSE174609]
Hao Y, Stuart T, Kowalski MH, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2024;42(2):293-304. doi:10.1038/s41587-023-01767-y [Seurat 5]

This tutorial is part of the NGS101.com series on single cell sequencing analysis. If this tutorial helped advance your research, please comment and share your experience to help other researchers! Subscribe to stay updated with our latest bioinformatics tutorials and resources.

How to Analyze Single-Cell RNA-seq Data – Complete Beginner’s Guide Part 7: Trajectory and Pseudotime Analysis Using Monocle 3

Introduction: Understanding Cell State Transitions

What Is Trajectory Analysis and Why Do We Need It?

What Biological Questions Does Trajectory Analysis Answer?

When Should You Perform Trajectory Analysis?

How Does Trajectory Analysis Work Conceptually?

What Is Monocle 3 and How Does It Infer Trajectories?

What You’ll Learn in This Tutorial

Setting Up Your R Environment for Trajectory Analysis

Installing Monocle 3 and Dependencies

Loading Required Libraries

Preparing Data for Monocle 3 Analysis

Understanding the Monocle 3 Workflow

Loading the Annotated Seurat Object

Strategic Downsampling: Maintaining Biological Diversity

Visualizing the Downsampled Dataset

Trajectory Inference with Monocle 3: Step-by-Step Workflow

Converting Seurat Object to Monocle 3 CDS Format

Preprocessing: Normalization and Dimensionality Reduction

Aligning to Original Seurat UMAP Coordinates

Learning the Trajectory Graph Structure

Handling Multiple Partitions

Understanding Pseudotime and Root Selection

Step 10a: Order Partition 2 (Manual Root — CD4+ T Cells)

Step 10b: Order Partition 1 (Automatic Root — Classical Monocytes)

Step 10c: Order Partition 3 (Automatic Root — B Cells)

Verify All Partitions Are Ordered

Visualizing Trajectories Across All Partitions

Creating Custom Pseudotime Visualizations

Interpreting Trajectory Analysis Results

Understanding What Trajectories Represent

Biological Validation: Does the Trajectory Make Sense?

Understanding Branch Points and Cell Fate Decisions

Integrating Results Back to Seurat Object

Identifying Genes That Change Along Trajectories

Finding Genes Correlated with Pseudotime

Visualizing Top Trajectory Genes

Best Practices and Common Pitfalls in Trajectory Analysis

Best Practices for Robust Trajectory Inference

Common Pitfalls and How to Avoid Them

Troubleshooting: Trajectory Doesn’t Match Expectations

Conclusion: From Clusters to Continuous Cell States

Key Conceptual Takeaways

Reporting Trajectory Analysis in Publications

References

Share this:

Like this:

Comments

Leave a Reply Cancel reply

Search

Categories

Recent Posts

Tags