Chan Zuckerberg Initiative
A scalable latent diffusion model for generating realistic single-cell gene expression profiles, using a permutation-invariant VAE and flow-matching diffusion transformer.
scLDM (single-cell Latent Diffusion Model) is a generative model for single-cell RNA sequencing data developed by Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak, and collaborators at CZ Biohub and the Chan Zuckerberg Initiative. The model was released as a preprint in November 2025 (arXiv:2511.02986) and is hosted on CZI's Virtual Cells Platform. scLDM addresses one of the persistent challenges in single-cell computational biology: generating synthetic single-cell gene expression profiles that are both statistically realistic and biologically interpretable.
The fundamental obstacle to high-fidelity single-cell generation is the nature of the data itself. Single-cell RNA-seq profiles are high-dimensional count vectors — typically tens of thousands of gene measurements per cell — with strong sparsity, overdispersion, and complex gene-gene dependencies. Prior generative models, including variational autoencoders (scVI, scVAE) and conditional diffusion models (scDiffusion), have made significant progress but often impose artificial gene orderings, rely on shallow architectures, or fail to capture the exchangeability structure of gene expression: unlike pixels in an image, genes have no intrinsic spatial order, and a cell's identity is determined by which genes are expressed, not by the position those genes happen to occupy in a matrix row.
scLDM resolves this by combining two purpose-built components: a permutation-invariant variational autoencoder that compresses gene expression profiles into compact latent representations while respecting the orderless nature of gene data, and a latent diffusion model based on Diffusion Transformers and flow matching that generates diverse, biologically coherent latent codes conditioned on metadata such as tissue type, cell type, and experimental perturbation. The result is a model that can simulate both observational single-cell transcriptomics and counterfactual perturbation responses with substantially improved fidelity over previous approaches.
scLDM is a two-stage generative model. In the first stage, a variational autoencoder compresses high-dimensional single-cell count matrices into fixed-size continuous latent representations. The encoder applies a multi-head cross-attention block (MCAB) that attends over (gene, expression) token pairs without assuming any positional structure, pooling them into a permutation-invariant latent code. The decoder applies the inverse operation (permutation-equivariant unpooling) to reconstruct per-gene expression values. The VAE loss combines a count-based reconstruction term suited to sparse, overdispersed RNA-seq counts with a KL-divergence regularizer.
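The key property of cross-attention pooling is that the output does not depend on the order in which genes are presented. The following minimal NumPy sketch illustrates this mechanism with a single attention head and a toy (gene, expression) embedding; all names and shapes here are illustrative assumptions, not the actual scLDM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_pool(queries, tokens):
    """Pool a variable-size set of token embeddings into a fixed number of
    latent slots: learned queries attend over all tokens, and the weighted
    sum over tokens makes the result independent of token order.

    queries: (m, d) learned latent queries; tokens: (n, d) per-gene tokens.
    Returns (m, d).
    """
    d = queries.shape[-1]
    attn = softmax(queries @ tokens.T / np.sqrt(d), axis=-1)  # (m, n)
    return attn @ tokens                                       # (m, d)

# Toy setup: 5 "genes", each token combining a gene embedding with its
# (hypothetical) expression count -- a stand-in for a real embedding layer.
n_genes, d, m_latents = 5, 8, 2
gene_emb = rng.normal(size=(n_genes, d))
expr = rng.poisson(3.0, size=(n_genes, 1)).astype(float)
tokens = gene_emb + expr
queries = rng.normal(size=(m_latents, d))  # learned latent queries

pooled = cross_attention_pool(queries, tokens)

# Shuffling the gene order leaves the pooled latent unchanged.
perm = rng.permutation(n_genes)
pooled_perm = cross_attention_pool(queries, tokens[perm])
print(np.allclose(pooled, pooled_perm))  # True
```

Because attention weights and the value sum are both computed over the full token set, permuting the rows of `tokens` only permutes the summation order, leaving each pooled slot identical.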
In the second stage, a Diffusion Transformer (DiT) learns to generate the latent codes produced by the VAE encoder, using flow matching with linear interpolants as the training objective. Flow matching frames generation as learning a velocity field that maps a simple noise distribution to the data distribution along straight-line paths, providing a more efficient training signal than noise-prediction-based diffusion objectives. Conditional generation is achieved through classifier-free guidance, where the DiT is trained both with and without covariate conditioning and the conditioning signal is amplified at inference time.
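The flow-matching objective and classifier-free guidance described above can be sketched in a few lines. This is a toy NumPy illustration under stated assumptions: the linear interpolant and guidance formula follow the standard definitions, but the velocity "model" here is a hand-written stand-in, not a trained Diffusion Transformer.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# --- Training signal: linear-interpolant flow matching --------------------
# Interpolate noise x0 ~ N(0, I) toward a data latent x1 along a straight
# line; the regression target for the velocity network is simply x1 - x0.
x0 = rng.normal(size=d)            # noise sample
x1 = rng.normal(size=d) + 2.0      # stand-in for a VAE latent code
t = rng.uniform()
x_t = (1.0 - t) * x0 + t * x1      # point on the straight-line path
v_target = x1 - x0                 # constant velocity along that path

# A real model minimizes ||v_theta(x_t, t, cond) - v_target||^2; here we
# just evaluate that loss for a trivial zero predictor.
loss = np.mean((np.zeros(d) - v_target) ** 2)

# --- Sampling: classifier-free guidance + Euler integration ---------------
def v_theta(x, t, cond):
    """Toy stand-in velocity field that drifts toward a conditioning-
    dependent target (scLDM would use its DiT here)."""
    target = np.full_like(x, 2.0 if cond is not None else 0.0)
    return target - x

def sample(cond, guidance=2.0, steps=100):
    x = rng.normal(size=d)          # start from noise
    for i in range(steps):
        t_i = i / steps
        v_c = v_theta(x, t_i, cond)      # conditional velocity
        v_u = v_theta(x, t_i, None)      # unconditional velocity
        v = v_u + guidance * (v_c - v_u) # amplify the conditioning signal
        x = x + v / steps                # Euler step along the learned flow
    return x

x_gen = sample(cond="cell_type_A")  # hypothetical covariate label
```

With `guidance > 1`, the sampler overweights the conditional velocity relative to the unconditional one, which is what "amplifying the conditioning signal at inference time" means in practice; `guidance = 1` recovers purely conditional sampling.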
Benchmark evaluations span both reconstruction and generation tasks. For cell reconstruction, scLDM achieves Pearson correlations with ground truth that are up to four times higher than baseline models, with the Human Lung Cell Atlas showing particularly strong improvements due to its high cell-type diversity. For unconditional and conditional generation, scLDM surpasses all publicly available single-cell generative models across multiple metrics, producing profiles with more realistic marginal distributions, more faithful gene-gene covariance structure, and better performance when used as training data for downstream cell-type classifiers. Validation datasets include Parse 1M (over 1.2 million immune cells exposed to 90 cytokines) and the Replogle CRISPR screen dataset (genome-scale knockouts across multiple cell lines).
scLDM serves researchers in single-cell genomics who need to generate synthetic training data, augment small experimental datasets, simulate counterfactual perturbation responses, or explore the gene expression space of a biological system in silico. In drug discovery and functional genomics, perturbation-conditioned generation allows researchers to predict the transcriptomic consequences of genetic knockouts or chemical treatments before running expensive experiments, supporting computational prioritization of candidate targets. In machine learning contexts, scLDM-generated synthetic cells can augment real datasets to improve the robustness of downstream classifiers — particularly in disease settings where labeled samples from rare cell populations are limited. The model's demonstrated performance on COVID-infected and liver cancer cell classification tasks establishes its practical utility beyond simple data augmentation, approaching the performance of specialized supervised models.
scLDM establishes a new state of the art for generative modeling of single-cell transcriptomics, achieving substantial improvements in reconstruction fidelity and generation quality over prior VAE-based and diffusion-based approaches. Its permutation-invariant design principle is a principled architectural contribution that other single-cell deep learning models could adopt to better respect the biological structure of gene expression data. The integration of flow matching into the latent diffusion framework provides an efficient training objective that is more computationally tractable than standard denoising diffusion for this data modality. As the base model underlying the specialized scLDM.CD4 fine-tune (which targets counterfactual perturbation in CD4+ T cells), scLDM represents a foundational component of CZI's generative virtual cell infrastructure. Ongoing limitations include the challenge of generalizing perturbation predictions to held-out genetic interventions not seen during training and the computational requirements of training the two-stage architecture on large cell atlases.