University of North Carolina at Chapel Hill / University of Texas at Austin / University of Pennsylvania
A cell world model pretrained on a 2.4M-cell mouse embryonic atlas that predicts one-step transcriptional state transitions and transfers to perturbation prediction.
Chreode is a "cell world model" that learns to predict how single cells move through transcriptional state space, both as they follow developmental signals and as they respond to genetic perturbations. Introduced as a 2026 preprint by researchers at the University of North Carolina at Chapel Hill, the University of Texas at Austin, and the University of Pennsylvania, it is framed as a contribution to the emerging "AI Virtual Cell" agenda: building models that can simulate cellular behavior the way physical world models simulate environments. The name evokes Waddington's idea of a chreode — a canalized developmental pathway that cells are guided along.
The central problem Chreode addresses is one-step temporal dynamics: given a cell's current transcriptional state, where does it go next? Rather than treating this as an unstructured regression or a generic diffusion sampling task, Chreode parameterizes the transition with a structured residual transition operator that decomposes cell-state change into three interpretable components — a downhill landscape flow toward attractor states, a rotational in-tangent term that captures cyclic or curved motion along the developmental manifold, and a stochastic spread term modeling biological noise. This decomposition is designed to mirror the geometry of Waddington-style developmental landscapes while remaining trainable at scale.
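To make the three-part decomposition concrete, here is a minimal sketch of one latent-space update. It is not the preprint's implementation, and everything in it is an illustrative assumption: the quadratic potential around hypothetical attractor points, the skew-symmetric matrix standing in for the rotational term, the square-root-of-time noise scaling, and the names `residual_components` and `step`. In Chreode the corresponding terms are learned rather than hand-specified.

```python
import numpy as np

def residual_components(z, attractors, A, sigma, dt, rng):
    """Return the (flow, rotation, noise) parts of one toy latent update."""
    diffs = z[None, :] - attractors                 # (K, D) offsets to each attractor
    nearest = np.argmin((diffs ** 2).sum(axis=1))   # index of the closest attractor
    grad = diffs[nearest]                           # gradient of 0.5 * ||z - a||^2
    flow = -dt * grad                               # downhill landscape flow
    skew = A - A.T                                  # skew-symmetric: grad @ skew @ grad == 0
    rotation = dt * (skew @ grad)                   # motion orthogonal to the downhill direction
    noise = sigma * np.sqrt(dt) * rng.standard_normal(z.shape)  # stochastic spread
    return flow, rotation, noise

def step(z, attractors, A, sigma, dt, rng):
    """Apply the residual update: next state = current state + three components."""
    flow, rotation, noise = residual_components(z, attractors, A, sigma, dt, rng)
    return z + flow + rotation + noise

# Toy setup (all values hypothetical): two attractors in a 2-D latent space.
rng = np.random.default_rng(0)
attractors = np.array([[2.0, 0.0], [-2.0, 0.0]])
A = np.array([[0.0, 0.5], [0.0, 0.0]])
z = np.array([0.5, 1.0])
for _ in range(200):
    z = step(z, attractors, A, sigma=0.0, dt=0.05, rng=rng)
print(z)  # with noise switched off, the state settles near the closest attractor
```

With `sigma=0.0` the toy state spirals into the nearest attractor; raising `sigma` spreads repeated rollouts around it, which is the role the stochastic term plays in the description above.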
Chreode's second contribution is transfer: a model pretrained purely on unperturbed developmental dynamics generalizes to perturbation prediction through transfer learning, without redesigning the downstream training procedure. This positions it between trajectory-inference methods and perturbation-response models, suggesting that developmental dynamics and perturbation responses can share a common learned substrate.
Chreode couples a shared scVI encoder — a variational autoencoder widely used to denoise and embed scRNA-seq counts — with a diffusion-transformer (DiT) backbone that learns the residual transition operator in the latent space. Pretraining uses a 2.4M-cell mouse embryonic atlas drawn from seven datasets, exposing the model to a wide range of developmental cell states and transitions. On developmental benchmarks, Chreode reports reduced Sinkhorn (optimal-transport) distance between predicted and observed populations on hematopoiesis and islet differentiation tasks, improving over trajectory-prediction baselines such as PRESCIENT. For perturbation prediction, transferring the pretrained model to the Norman Perturb-seq dataset lowers the DE20 mean-squared error (computed over the top 20 differentially expressed genes) from 0.2121 to 0.1858 — about a 12.4% relative improvement — while leaving the downstream training recipe unchanged. The preprint states that the codebase is released under an open research license; pretrained weights are not released with this submission.
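The two evaluation metrics named above can be sketched in a few lines of NumPy. This is a simplified illustration, not the preprint's evaluation code. It assumes uniform weights, a squared-Euclidean cost, and a regularization strength tied to the mean cost for the Sinkhorn term, and it picks the "top 20 differentially expressed genes" by absolute mean shift from control, which is a stand-in for whatever statistical test the authors use. The function names `sinkhorn_cost`, `de20_mse`, and `relative_improvement` are hypothetical.

```python
import numpy as np

def sinkhorn_cost(x, y, reg=0.1, n_iter=200):
    """Entropic optimal-transport cost between two point clouds (uniform weights)."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    eps = reg * cost.mean()                 # assumed scaling of the entropic regularizer
    kernel = np.exp(-cost / eps)
    a = np.full(len(x), 1.0 / len(x))       # uniform mass on predicted cells
    b = np.full(len(y), 1.0 / len(y))       # uniform mass on observed cells
    u = np.ones_like(a)
    for _ in range(n_iter):                 # Sinkhorn fixed-point iterations
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float((plan * cost).sum())

def de20_mse(pred, truth, control, k=20):
    """MSE restricted to the k genes with the largest |truth - control| shift."""
    top = np.argsort(np.abs(truth - control))[-k:]
    return float(np.mean((pred[top] - truth[top]) ** 2))

def relative_improvement(baseline, improved):
    """Fractional reduction of an error metric relative to the baseline."""
    return (baseline - improved) / baseline

# The reported DE20 MSE values from the Norman Perturb-seq transfer experiment.
print(round(relative_improvement(0.2121, 0.1858), 3))  # about 0.124, i.e. 12.4%
```

Lower is better for both metrics: a smaller Sinkhorn cost means the predicted population sits closer to the observed one, and a smaller DE20 MSE means the predicted expression of the most strongly responding genes is closer to the measured response.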
Chreode targets computational and developmental biologists who want to forecast how cell populations evolve and how they will respond to genetic interventions. Predicting one-step transcriptional dynamics supports developmental-trajectory and cell-fate analyses in systems such as hematopoiesis and pancreatic islet differentiation, while the perturbation-transfer capability is relevant to designing and interpreting Perturb-seq and CRISPR screens — for example, prioritizing perturbations or anticipating differential-expression responses before running an experiment. As a "virtual cell" component, it could serve as a dynamics module within larger simulation pipelines.
Chreode was introduced in a recent preprint, so its downstream influence and adoption are not yet established. Its main conceptual contribution is showing that a model pretrained only on unperturbed developmental dynamics can transfer to perturbation prediction, and that imposing an interpretable geometric structure (flow, rotation, and noise) on cell-state transitions can improve predictive accuracy over less-structured baselines such as PRESCIENT. Key limitations include reliance on a mouse embryonic training corpus, which may constrain generalization to human or adult tissues; the absence of released pretrained weights, which limits immediate reuse; and benchmarks reported on a focused set of differentiation and perturbation tasks rather than across a broad, standardized suite. Because the preprint is unreviewed, its results await independent validation.