Functional diffusion model that predicts single-cell perturbation responses by running diffusion over cell-population distributions embedded in a Hilbert space, capturing population-level response variability.
PerturbDiff is a generative model for predicting how single cells respond to perturbations such as genetic knockouts or drug treatments—a core task in building "virtual cell" simulators. A fundamental obstacle is that high-throughput single-cell sequencing is destructive: a given cell cannot be measured both before and after a perturbation, so models must learn to map between unpaired control and perturbed cell populations rather than between matched individual cells.
Developed by researchers at Mila (in Jian Tang's group) and released as a February 2026 arXiv preprint, PerturbDiff reframes the problem at the level of distributions rather than individual cells. Existing methods typically assume a single fixed response distribution for a given cellular context and perturbation, but real responses vary systematically because of unobserved latent factors such as microenvironmental fluctuations and batch effects—forming a manifold of possible response distributions for the same nominal conditions.
To capture this variability, PerturbDiff embeds entire distributions as points in a Hilbert space and defines a diffusion-based generative process that operates directly over probability distributions, allowing it to model population-level response shifts driven by hidden factors.
Concretely, PerturbDiff embeds each control or perturbed cell population as a point in a Hilbert space and defines a "functional" diffusion process whose samples are entire distributions rather than individual data points, conditioned on cellular context and perturbation type. This lets the model represent population-level response shifts arising from latent factors instead of collapsing them into a single fixed response distribution. The authors benchmark PerturbDiff on established single-cell perturbation datasets and report state-of-the-art performance on single-cell response prediction, with notably improved generalization to perturbations not seen during training.
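The idea of treating a whole cell population as a single point in a Hilbert space can be illustrated with a small sketch. The summary above does not say how PerturbDiff constructs its embedding or its noising process, so everything here is an assumption made for illustration: the sketch uses a kernel mean embedding approximated with random Fourier features, a standard DDPM-style Gaussian forward step, and toy Gaussian data in place of real expression profiles. The names `rff_mean_embedding`, `forward_noise`, `W`, `b`, and `lengthscale` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def rff_mean_embedding(cells, W, b):
    """Approximate the RBF-kernel mean embedding of one cell population.

    cells: (n_cells, n_genes) expression matrix for a single population.
    W, b:  random Fourier feature parameters shared by all populations
           (hypothetical names, not from the PerturbDiff code).
    Returns one vector of length n_features that represents the population.
    """
    n_features = W.shape[1]
    features = np.sqrt(2.0 / n_features) * np.cos(cells @ W + b)
    return features.mean(axis=0)

def forward_noise(z0, alpha_bar, rng):
    """Generic DDPM-style forward step applied to an embedded population.

    This is an assumed stand-in for whatever noising process the model uses.
    """
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
n_genes, n_features, lengthscale = 20, 256, 4.0

# Shared random features so that every population lands in the same space.
W = rng.standard_normal((n_genes, n_features)) / lengthscale
b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

# Toy unpaired populations: different cells and different cell counts.
control = rng.standard_normal((300, n_genes))
control_replicate = rng.standard_normal((250, n_genes))
perturbed = rng.standard_normal((280, n_genes)) + 1.0  # shifted response

z_control = rff_mean_embedding(control, W, b)
z_replicate = rff_mean_embedding(control_replicate, W, b)
z_perturbed = rff_mean_embedding(perturbed, W, b)

# The distance between two embeddings approximates the maximum mean
# discrepancy (MMD) between the underlying populations.
mmd_same = np.linalg.norm(z_control - z_replicate)
mmd_shift = np.linalg.norm(z_control - z_perturbed)

# One noised version of the perturbed population's embedding.
z_noisy = forward_noise(z_perturbed, alpha_bar=0.5, rng=rng)

print(f"control vs replicate: {mmd_same:.3f}")
print(f"control vs perturbed: {mmd_shift:.3f}")
```

In this toy setup no cell is ever matched to another cell: each population is summarized as one vector, and the perturbed population sits farther from the control than a second control sample does. A generative model that operates on such vectors would be working at the level of distributions, which is the property the text describes.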
PerturbDiff supports in silico perturbation screening and virtual-cell modeling, where predicting transcriptional responses to genetic or chemical perturbations can prioritize experiments and reduce wet-lab cost. It is most relevant to systems biologists and drug-discovery researchers working with large perturbation atlases, where accurate prediction for unseen perturbations and realistic modeling of population-level variability are key to extrapolating beyond measured conditions.
By treating perturbation response as a generative problem over distributions, PerturbDiff offers a conceptually distinct approach to the virtual-cell challenge and reports improved generalization to unseen perturbations on standard benchmarks. Because the work is a recent preprint, its results still await peer review and broader independent evaluation. As with other perturbation-prediction methods, its real-world utility will depend on how well distribution-level gains translate to downstream biological discovery.