bio.rodeo

DCM (Discrete Cell Models)

University of Bristol

A discrete-diffusion generative model that operates directly on single-cell gene counts, enabling unconditional and perturbation-conditioned scRNA-seq generation.

Released: February 2026

Single-cell RNA sequencing (scRNA-seq) measures gene expression as discrete counts — the number of transcripts captured per gene per cell. Most generative models of this data, however, first transform those counts into a continuous space (via normalization, log-transforms, or latent variational autoencoders such as scVI) and model them there. That choice is convenient for continuous methods like diffusion or VAEs, but it sits awkwardly with the inherently discrete, count-based nature of the measurement.
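To make the contrast concrete, here is a minimal sketch of a typical library-size normalization followed by a log transform, applied to a toy count matrix. The matrix values and the target sum of 10,000 are illustrative assumptions; they are not taken from the DCM preprint or from any specific pipeline.

```python
import numpy as np

# Toy count matrix: 3 cells x 4 genes (values are made up for illustration).
counts = np.array([
    [0, 3, 1, 12],
    [5, 0, 0, 2],
    [1, 1, 7, 0],
], dtype=np.int64)

# Library-size normalization: rescale each cell to a common total
# (10,000 is a common convention, assumed here), then apply log1p.
target_sum = 1e4
library_size = counts.sum(axis=1, keepdims=True)
log_normalized = np.log1p(counts / library_size * target_sum)

# The raw data are non-negative integers; the transformed data are floats.
print(counts.dtype, log_normalized.dtype)
print(np.round(log_normalized, 2))
```

After the transform, zeros stay at zero but every other entry becomes a non-integer real number, which is the continuous representation that models such as VAEs or Gaussian diffusion then operate on.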

DCM (Discrete Cell Models), introduced by Bhattacharya, Gensbigler, Karim, and Lees at the University of Bristol in a February 2026 bioRxiv preprint, takes a different route: it applies discrete diffusion directly in the count domain, generating gene-expression profiles without leaving discrete space. This lets the model produce both unconditional samples and conditional ones — for instance, cell-type-specific transcriptional responses to genetic perturbations — while respecting the discrete structure of the data.

The authors report that DCM outperforms established baselines on single-cell generation and perturbation-response benchmarks, including scVI, CPA (Compositional Perturbation Autoencoder), and scGPT, with notably strong results on the Replogle perturbation dataset.

#Key Features

  • Discrete diffusion on raw counts: Models gene expression directly in the discrete count domain rather than transforming to continuous space, matching the nature of scRNA-seq data.
  • Conditional generation: Supports conditioning on covariates such as cell type and genetic perturbation to generate context-specific transcriptional responses.
  • Perturbation-response modeling: Predicts how cells respond transcriptionally to genetic perturbations, a key task for functional genomics.
  • Strong benchmark performance: Reported to outperform scVI, CPA, and scGPT on relevant single-cell generation and perturbation benchmarks.
  • State-of-the-art on Replogle: Reported by the authors to achieve notably strong results on the widely used Replogle genome-scale Perturb-seq dataset.

#Technical Details

DCM is a discrete-diffusion generative model for single-cell gene expression that operates in the native count space, avoiding the continuous relaxations used by many prior methods. It supports both unconditional generation and conditional generation given covariates such as cell type and perturbation identity, enabling simulation of cell-type-specific responses to genetic perturbations. On benchmarks, the authors report that DCM outperforms scVI, CPA, and scGPT, with particularly strong results on the Replogle Perturb-seq dataset. The model is so far described only in a February 2026 bioRxiv preprint (v2): full architectural details (parameter count, the discrete-diffusion noise schedule, and the exact training corpus) await the complete release, and code and trained weights have not yet been published.
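Since the preprint's noise process is not described here, the following is only a generic illustration of what a forward corruption in count space can look like. It uses binomial thinning, one possible choice among several (absorbing-state masking is another), and is not presented as DCM's method. The function name `thin_counts` and the linear schedule are hypothetical.

```python
import numpy as np

def thin_counts(counts, keep_prob, rng):
    """Binomial thinning: each transcript survives independently with keep_prob."""
    return rng.binomial(counts, keep_prob)

rng = np.random.default_rng(0)
x0 = np.array([[0, 3, 1, 12], [5, 0, 0, 2]], dtype=np.int64)

# Hypothetical linear schedule of survival probabilities,
# from 1.0 (clean data) down to 0.0 (all counts removed).
keep_schedule = np.linspace(1.0, 0.0, num=5)

for t, keep_prob in enumerate(keep_schedule):
    # Sample the noised state at level t directly from the clean counts x0.
    xt = thin_counts(x0, keep_prob, rng)
    # Every noised state is still a non-negative integer count, never above x0.
    assert np.issubdtype(xt.dtype, np.integer)
    assert np.all((xt >= 0) & (xt <= x0))
    print(t, xt.tolist())
```

The point of the sketch is that every intermediate state remains a valid count vector, so a denoising model trained to reverse such a process never has to leave discrete space.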

#Applications

DCM targets single-cell and functional-genomics researchers who want to model and simulate transcriptional states. Its conditional generation can produce synthetic cells for specified cell types and perturbations, supporting in-silico perturbation screens, data augmentation for rare cell states, and benchmarking of downstream analysis methods. By predicting perturbation responses, it can help prioritize genetic targets and interpret Perturb-seq experiments without exhaustively assaying every condition.
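As a rough sketch of how generated cells might feed an in-silico screen, the example below ranks hypothetical perturbations by the size of their average expression shift relative to control cells. Because DCM's code and weights are unreleased, the "generated" cells are stand-in Poisson draws; the perturbation names, rates, and the scoring rule (mean absolute log2 fold change with a pseudocount) are all assumptions made for illustration.

```python
import numpy as np

def log2_fold_change(perturbed, control, pseudocount=1.0):
    """Per-gene log2 ratio of mean counts, with a pseudocount to avoid log(0)."""
    return np.log2((perturbed.mean(axis=0) + pseudocount) /
                   (control.mean(axis=0) + pseudocount))

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 5
base_rate = np.array([2.0, 5.0, 1.0, 8.0, 3.0])

# Stand-in for generated cells: Poisson draws, NOT output from DCM.
control = rng.poisson(base_rate, size=(n_cells, n_genes))
synthetic = {
    # Hypothetical perturbation that strongly reduces the first gene.
    "knockdown_A": rng.poisson(base_rate * [0.2, 1, 1, 1, 1], size=(n_cells, n_genes)),
    # Hypothetical perturbation with no true effect.
    "knockdown_B": rng.poisson(base_rate, size=(n_cells, n_genes)),
}

# Score each hypothetical perturbation by its mean absolute log2 fold change.
scores = {name: float(np.abs(log2_fold_change(cells, control)).mean())
          for name, cells in synthetic.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In a real workflow the stand-in draws would be replaced by samples from a conditional generator, and the scoring rule would be chosen to match the biological question.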

#Impact

DCM contributes to a growing line of work arguing that generative models should respect the discrete, count-based structure of single-cell data rather than forcing it into continuous representations. By reporting gains over strong baselines (scVI, CPA, scGPT) and state-of-the-art results on Replogle, it positions discrete diffusion as a competitive approach for single-cell generation and perturbation prediction. Because the work is a recent preprint without released code or weights, its results still await peer review and independent reproduction; even so, the discrete-domain framing is a noteworthy direction for single-cell generative modeling.

Tags

gene_expression, perturbation_prediction, data_generation, diffusion, generative, foundation_model, transcriptomics