bio.rodeo

PerturbDiff

Mila

Functional diffusion model that predicts single-cell perturbation responses by running a generative diffusion process over response distributions embedded in a Hilbert space, capturing population-level response variability.

Released: February 2026

PerturbDiff is a generative model for predicting how single cells respond to perturbations such as genetic knockouts or drug treatments—a core task in building "virtual cell" simulators. A fundamental obstacle is that high-throughput single-cell sequencing is destructive: a given cell cannot be measured both before and after a perturbation, so models must learn to map between unpaired control and perturbed cell populations rather than between matched individual cells.

Developed by researchers in Jian Tang's group at Mila and released as a February 2026 arXiv preprint, PerturbDiff reframes the problem at the level of distributions rather than individual cells. Existing methods typically assume a single fixed response distribution for a given cellular context and perturbation. Real responses, however, vary systematically because of unobserved latent factors such as microenvironmental fluctuations and batch effects, so the same nominal conditions correspond to a manifold of possible response distributions.

To capture this variability, PerturbDiff embeds entire distributions as points in a Hilbert space and defines a diffusion-based generative process that operates directly over probability distributions, allowing it to model population-level response shifts driven by hidden factors.
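As a rough illustration of what "embedding a distribution as a point in a Hilbert space" can mean, the sketch below uses a kernel mean embedding with an RBF kernel and compares two unpaired cell populations through the squared maximum mean discrepancy (MMD). This is an assumption chosen for illustration: this page does not say which embedding PerturbDiff uses, and the function names, the `gamma` bandwidth, and the synthetic "expression" data are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD: the squared Hilbert-space distance
    between the kernel mean embeddings of two samples. No pairing between
    rows of X and rows of Y is needed, and sample sizes may differ."""
    return (
        rbf_kernel(X, X, gamma).mean()
        + rbf_kernel(Y, Y, gamma).mean()
        - 2.0 * rbf_kernel(X, Y, gamma).mean()
    )

# Synthetic stand-ins for expression profiles (cells x genes); not real data.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(200, 5))
perturbed = rng.normal(0.5, 1.0, size=(150, 5))   # shifted population
replicate = rng.normal(0.0, 1.0, size=(150, 5))   # same distribution as control

d_pert = mmd2(control, perturbed, gamma=0.1)
d_rep = mmd2(control, replicate, gamma=0.1)
print(f"control vs perturbed: {d_pert:.4f}")
print(f"control vs replicate: {d_rep:.4f}")
```

The point of the sketch is only that whole populations, not matched cells, are the objects being compared, which fits the unpaired setting described above.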

#Key Features

  • Distribution-level modeling: Shifts the modeling unit from individual cells to whole response distributions, matching the unpaired nature of single-cell perturbation data.
  • Functional diffusion in Hilbert space: Embeds distributions as points in a Hilbert space and runs a diffusion generative process directly over those distributions.
  • Captures latent variability: Represents a manifold of possible response distributions, accounting for unobservable factors like microenvironment and batch effects.
  • Strong generalization: Reported to generalize substantially better to unseen perturbations than prior distribution-mapping approaches.

#Technical Details

PerturbDiff is a diffusion model that operates over probability distributions rather than over individual data points. By embedding each control or perturbed cell population as a point in a Hilbert space, it defines a "functional" diffusion process whose samples are distributions, conditioned on cellular context and perturbation type. This lets the model represent population-level response shifts arising from latent factors instead of collapsing them to a single mean response. The authors benchmark PerturbDiff on established single-cell perturbation datasets and report state-of-the-art performance on single-cell response prediction, with notably improved generalization to perturbations not seen during training.
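To make the idea of diffusing over distribution embeddings more concrete, here is a minimal sketch under strong assumptions. It approximates a population's kernel mean embedding with random Fourier features, then applies a standard DDPM-style forward noising step to that finite-dimensional vector. The page does not describe PerturbDiff's actual embedding, noise process, or schedule, so the linear beta schedule, the feature count, and every name below are hypothetical stand-ins rather than the authors' method.

```python
import numpy as np

def rff_mean_embedding(cells, W, b):
    """Approximate kernel mean embedding of a cell population using
    random Fourier features for an RBF kernel (illustrative choice)."""
    feats = np.sqrt(2.0 / W.shape[1]) * np.cos(cells @ W + b)
    return feats.mean(axis=0)

def forward_noise(e0, t, alpha_bar, rng):
    """Standard DDPM forward step: e_t = sqrt(a_t) * e0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(e0.shape)
    e_t = np.sqrt(alpha_bar[t]) * e0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return e_t, eps

rng = np.random.default_rng(0)
n_genes, n_feats, T = 5, 64, 1000
gamma = 0.1  # hypothetical RBF bandwidth parameter

# For k(x, y) = exp(-gamma * ||x - y||^2), frequencies are drawn from N(0, 2 * gamma * I).
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(n_genes, n_feats))
b = rng.uniform(0.0, 2.0 * np.pi, size=n_feats)

# Assumed linear beta schedule; alpha_bar is the cumulative signal retention.
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Synthetic perturbed population (cells x genes); not real data.
population = rng.normal(0.5, 1.0, size=(300, n_genes))
e0 = rff_mean_embedding(population, W, b)

e_mid, eps_mid = forward_noise(e0, T // 2, alpha_bar, rng)
e_end, _ = forward_noise(e0, T - 1, alpha_bar, rng)
print("signal retained at final step:", alpha_bar[-1])
```

In a trained model of this general kind, a network would learn to reverse the noising, conditioned on cellular context and perturbation, so that each sample is an embedding of a whole response distribution rather than a single cell.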

#Applications

PerturbDiff supports in silico perturbation screening and virtual-cell modeling, where predicting transcriptional responses to genetic or chemical perturbations can prioritize experiments and reduce wet-lab cost. It is most relevant to systems biologists and drug-discovery researchers working with large perturbation atlases, where accurate prediction for unseen perturbations and realistic modeling of population-level variability are key to extrapolating beyond measured conditions.

#Impact

By treating perturbation response as a generative problem over distributions, PerturbDiff offers a conceptually distinct approach to the virtual-cell challenge and reports improved generalization to unseen perturbations on standard benchmarks. Because it is a recent preprint, its results still await peer review and broader independent evaluation. As with other perturbation-prediction methods, its real-world utility will depend on how well distribution-level gains translate to downstream biological discovery.

Tags

perturbation_prediction, gene_expression, diffusion, generative, transcriptomics, perturbation