A discrete-diffusion generative model that operates directly on single-cell gene counts, enabling unconditional and perturbation-conditioned scRNA-seq generation.
Single-cell RNA sequencing (scRNA-seq) measures gene expression as discrete counts: the number of transcripts captured per gene per cell. Most generative models of this data, however, first map those counts into a continuous space (via normalization, log-transforms, or the latent space of a variational autoencoder such as scVI) and model them there. That choice is convenient for continuous methods such as continuous-state diffusion models or VAEs, but it sits awkwardly with the inherently discrete, count-based nature of the measurement.
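To make the continuous-space transformation concrete, here is a minimal sketch of library-size normalization followed by a log1p transform, a common preprocessing convention for scRNA-seq. The function name `log_normalize`, the `target_sum` of 10,000, and the five-gene example cell are illustrative assumptions, not details taken from the DCM preprint.

```python
import math

def log_normalize(counts, target_sum=10_000):
    """Scale one cell's counts to a fixed library size, then apply log1p.

    `target_sum` is an assumed, commonly used scaling constant.
    """
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]

# Integer transcript counts for five hypothetical genes in one cell.
cell = [0, 3, 1, 0, 12]
transformed = log_normalize(cell)

print(cell)         # integers on input
print(transformed)  # real-valued floats on output
```

Zeros remain zero, but every nonzero count becomes a real number, so a model trained on the transformed values no longer sees integer counts.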
DCM (Discrete Cell Models), introduced by Bhattacharya, Gensbigler, Karim, and Lees at the University of Bristol in a February 2026 bioRxiv preprint, takes a different route: it applies discrete diffusion directly in the count domain, generating gene-expression profiles without leaving discrete space. This lets the model produce both unconditional samples and conditional ones — for instance, cell-type-specific transcriptional responses to genetic perturbations — while respecting the discrete structure of the data.
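The general idea of corrupting data without leaving discrete space can be sketched as follows. DCM's actual corruption process and noise schedule are not specified here, so this example assumes an absorbing-state (masking) forward process, one common discrete-diffusion choice, with a linear schedule. The `MASK` token, the `corrupt` function, and the example cell are all hypothetical.

```python
import random

MASK = -1  # hypothetical absorbing "mask" state; not a valid count

def corrupt(counts, t, rng):
    """Forward corruption for an absorbing-state discrete diffusion.

    Each gene's count is independently replaced by MASK with
    probability t (0 <= t <= 1); otherwise it is left unchanged.
    This is an assumed illustrative process, not DCM's documented one.
    """
    return [MASK if rng.random() < t else c for c in counts]

rng = random.Random(0)
cell = [0, 3, 1, 0, 12, 7, 0, 2]  # integer counts for eight hypothetical genes

for t in (0.0, 0.5, 1.0):
    # t = 0 leaves the cell intact; t = 1 masks every gene.
    print(t, corrupt(cell, t, rng))
```

Every intermediate state is still a vector of discrete symbols (original counts or the mask token). A generative model in this family would be trained to reverse the corruption, predicting the masked counts from the unmasked ones and any conditioning covariates.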
The authors report that DCM outperforms established baselines, including scVI, CPA (Compositional Perturbation Autoencoder), and scGPT, on single-cell generation and perturbation-response benchmarks, with notably strong results on the Replogle perturbation dataset.
DCM is a discrete-diffusion generative model for single-cell gene expression that operates in the native count space, avoiding the continuous relaxations used by many prior methods. It supports both unconditional generation and conditional generation given covariates such as cell type and perturbation identity, enabling simulation of cell-type-specific responses to genetic perturbations. On benchmarks, the authors report that DCM exceeds the performance of scVI, CPA, and scGPT, with particularly strong results on the Replogle Perturb-seq dataset. The work is currently a February 2026 bioRxiv preprint (v2): full architectural details (parameter count, the discrete-diffusion noise schedule, and the exact training corpus) await the complete release, and code and trained weights have not yet been published.
DCM targets single-cell and functional-genomics researchers who want to model and simulate transcriptional states. Its conditional generation can produce synthetic cells for specified cell types and perturbations, supporting in-silico perturbation screens, data augmentation for rare cell states, and benchmarking of downstream analysis methods. By predicting perturbation responses, it can help prioritize genetic targets and interpret Perturb-seq experiments without exhaustively assaying every condition.
DCM contributes to a growing line of work arguing that generative models should respect the discrete, count-based structure of single-cell data rather than forcing it into continuous representations. By reporting gains over strong baselines (scVI, CPA, scGPT) and particularly strong results on the Replogle dataset, it positions discrete diffusion as a competitive approach for single-cell generation and perturbation prediction. Because the work is a recent preprint without released code or weights, its results still await peer review and independent reproduction, but the discrete-domain framing is a noteworthy direction for single-cell generative modeling.