Generative single-cell foundation model trained on 100M+ transcriptomes that predicts how genetic perturbations reshape cellular trajectories over time.
PerturbGen is a generative single-cell foundation model that predicts how genetic perturbations reshape cellular trajectories over time. Developed by Kevin Chi Hao Ly, Mo Lotfollahi, Berthold Göttgens, and collaborators at the Wellcome Sanger Institute and partner groups, it was posted to bioRxiv in March 2026. Unlike most in-silico perturbation methods, which predict a cell's immediate response at a single state, PerturbGen models the dynamics of cell-state transitions—how a perturbation applied at an early source state propagates to reconfigure later downstream states.
Predicting how cells move between states, and how interventions disrupt those trajectories, is central to understanding differentiation, disease progression, and cellular reprogramming. Existing approaches can estimate single-cell perturbation responses but generally cannot forecast effects across a dynamic trajectory, for example how an early genetic change alters cell fates much later. PerturbGen is pretrained on over 100 million single-cell transcriptomes to learn these temporal dynamics. It is then used to predict how perturbations alter gene programs and trajectories in processes such as differentiation and disease progression.
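A deliberately simple toy shows the gap between a static readout and a trajectory readout. This sketch is not PerturbGen's architecture or API. It assumes a hypothetical three-gene system whose dynamics are a fixed linear map (`TRANSITION`), where in practice a learned generative model would take its place. The helpers `step`, `knockout`, and `rollout` are illustrative names. The sketch knocks out an early regulator (gene 0) at the source state and compares the control and perturbed trajectories step by step.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical 3-gene dynamics (a stand-in for a learned model):
# row i gives how each gene at step t contributes to gene i at step t+1.
TRANSITION: Matrix = [
    [0.9, 0.0, 0.0],  # gene 0: an early regulator that mostly persists
    [0.5, 0.8, 0.0],  # gene 1: activated directly by gene 0
    [0.0, 0.6, 0.9],  # gene 2: activated by gene 1, so only indirectly by gene 0
]


def step(state: Vector, transition: Matrix) -> Vector:
    """Advance one time step: next[i] = sum_j transition[i][j] * state[j]."""
    return [sum(w * x for w, x in zip(row, state)) for row in transition]


def knockout(state: Vector, gene: int) -> Vector:
    """Simulate loss of function by zeroing one gene at the source state."""
    return [0.0 if i == gene else x for i, x in enumerate(state)]


def rollout(state: Vector, transition: Matrix, n_steps: int) -> List[Vector]:
    """Return the trajectory [source, t1, ..., t_n]."""
    trajectory = [state]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], transition))
    return trajectory


source = [1.0, 0.0, 0.0]
control = rollout(source, TRANSITION, n_steps=3)
perturbed = rollout(knockout(source, gene=0), TRANSITION, n_steps=3)

# Effect of the knockout on gene 2 at each step of the trajectory.
for t, (c, p) in enumerate(zip(control, perturbed)):
    print(t, round(c[2] - p[2], 3))
```

In this toy, gene 2 shows no difference at the source state or after one step. The difference rises to about 0.3 after step two and 0.78 after step three, because gene 2 depends on gene 0 only through gene 1. A snapshot evaluated at the source state would miss that delayed effect, and trajectory-level prediction is designed to capture effects of this kind.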
Pretraining on more than 100 million single-cell transcriptomes gives PerturbGen representations that support state prediction, gene-program discovery, and in-silico perturbation. The released implementation is primarily Python and is distributed under the MIT license. Pretrained weights are available on Hugging Face, and full documentation is hosted at the Sanger documentation site, including notebooks for preprocessing, tokenization, training, gene-embedding extraction, gene-program analysis, and perturbation simulation. The authors validate the model on three newly generated multi-condition human single-cell datasets spanning immune responses, hematopoiesis, and skin development, and report that it predicts how early perturbations reconfigure later cell states.
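The documentation covers tokenization and perturbation simulation. The following sketch illustrates what those two steps can look like in Geneformer-style models, and it should be read as an assumption, not as a description of PerturbGen's own tokenizer. It uses rank-value encoding, which orders genes by their expression divided by a corpus-wide per-gene median. It also uses in-silico deletion, which drops one gene's token and measures how far the mean-pooled cell embedding moves. The counts, gene medians, two-dimensional embedding table, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Embedding = List[float]


def rank_value_tokens(expression: Dict[str, float], gene_medians: Dict[str, float],
                      max_len: int = 2048) -> List[str]:
    """Rank expressed genes by expression / corpus median, highest first (ties broken by name)."""
    scores = {g: x / gene_medians[g] for g, x in expression.items()
              if x > 0 and g in gene_medians}
    return sorted(scores, key=lambda g: (-scores[g], g))[:max_len]


def delete_gene(tokens: List[str], gene: str) -> List[str]:
    """In-silico deletion: remove the gene's token from the cell's sequence."""
    return [t for t in tokens if t != gene]


def mean_embedding(tokens: List[str], table: Dict[str, Embedding]) -> Embedding:
    """Mean-pool per-gene embeddings into one cell embedding (stand-in for model outputs)."""
    dim = len(next(iter(table.values())))
    return [sum(table[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine_shift(a: Embedding, b: Embedding) -> float:
    """1 - cosine similarity; larger means the perturbation moved the cell further."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Hypothetical inputs: raw counts, corpus medians, and a toy 2-D embedding table.
expression = {"GATA1": 8.0, "SPI1": 2.0, "KLF1": 6.0, "ACTB": 50.0}
gene_medians = {"GATA1": 2.0, "SPI1": 1.0, "KLF1": 3.0, "ACTB": 100.0}
table = {"GATA1": [1.0, 0.0], "KLF1": [0.8, 0.2], "SPI1": [0.0, 1.0], "ACTB": [0.5, 0.5]}

tokens = rank_value_tokens(expression, gene_medians)
baseline = mean_embedding(tokens, table)
shifts = {g: cosine_shift(baseline, mean_embedding(delete_gene(tokens, g), table))
          for g in tokens}
print(tokens)  # ['GATA1', 'KLF1', 'SPI1', 'ACTB']
```

ACTB has the highest raw count, yet median normalization ranks it last. This is the intended effect of down-weighting ubiquitously expressed genes. In this toy, deleting SPI1, whose embedding points away from the others, moves the cell furthest, while deleting ACTB barely moves it.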
PerturbGen is aimed at researchers studying developmental biology, hematopoiesis, immunology, and disease progression who need to anticipate the downstream consequences of genetic perturbations across time rather than at a single snapshot. Concrete uses include prioritizing perturbation experiments in silico, identifying context-specific gene programs that govern cell-fate decisions, and exploring candidate interventions that might steer cells toward desired states or reverse pathological trajectories—reducing the experimental search space before wet-lab validation.
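Prioritizing experiments in silico can be framed as scoring each candidate perturbation by how close its predicted downstream state lands to a desired state, then sending only the top candidates to the bench. The following sketch assumes the model's predictions are already available as a table. `PREDICTED_DOWNSTREAM`, the gene names `GENE_A` to `GENE_C`, the Euclidean distance, and `prioritize` are all hypothetical choices, not part of PerturbGen's released interface, and the numbers are made up.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical model outputs: predicted downstream expression of three marker genes
# after knocking out each candidate at the source state (values are illustrative only).
PREDICTED_DOWNSTREAM: Dict[str, List[float]] = {
    "control": [0.2, 0.9, 0.1],
    "GENE_A": [0.8, 0.3, 0.1],
    "GENE_B": [0.3, 0.8, 0.2],
    "GENE_C": [0.6, 0.4, 0.5],
}


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prioritize(predictions: Dict[str, List[float]], desired: List[float],
               top_k: int) -> List[Tuple[str, float]]:
    """Score each knockout by how much closer it brings cells to `desired` than control."""
    baseline = distance(predictions["control"], desired)
    scored = [(gene, baseline - distance(state, desired))
              for gene, state in predictions.items() if gene != "control"]
    scored.sort(key=lambda item: item[1], reverse=True)  # best improvement first
    return scored[:top_k]


desired_state = [0.9, 0.2, 0.1]
shortlist = prioritize(PREDICTED_DOWNSTREAM, desired_state, top_k=2)
print([gene for gene, _ in shortlist])  # ['GENE_A', 'GENE_C']
```

Scoring relative to the unperturbed control keeps the ranking interpretable. A positive score means the knockout is predicted to move cells toward the desired state, and a negative score means it moves them away.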
By extending single-cell foundation models from static perturbation response to dynamic trajectory prediction, PerturbGen addresses a capability gap in computational perturbation biology. Coming from the Lotfollahi and Göttgens groups at the Wellcome Sanger Institute, with open code, pretrained weights, and documentation, it is positioned for community uptake in differentiation and disease-dynamics research. As a recent preprint, head-to-head benchmarking against other perturbation foundation models and validation of its trajectory predictions in independent settings remain ongoing.