bio.rodeo


Single-cell

evoCancerGPT

Dana-Farber Cancer Institute

A decoder-only single-cell foundation model that forecasts future single-cell gene expression in cancer evolution from prior cell states, generalizing zero-shot to held-out cancers.

Released: February 2026

evoCancerGPT is a decoder-only single-cell foundation model that forecasts how cancer cells evolve, developed by researchers at the Dana-Farber Cancer Institute and posted to bioRxiv in February 2026. Most single-cell foundation models learn static representations of cell state for tasks like annotation or integration; evoCancerGPT instead targets temporal progression, generating future gene expression profiles from prior cell states. By framing cancer evolution as a generative sequence-modeling problem, it aims to predict where a tumor cell population is heading at the resolution of individual patients.

The model borrows the autoregressive, decoder-only architecture of generative pre-trained transformers (GPTs) from language modeling. For each cancer type, patient, and cell type, the authors construct "sentences" of cells ordered by inferred pseudotime, so that the model learns to predict the next cell state given the preceding trajectory. Each cell is represented as a token that integrates the continuous gene expression values across its genes, so the model receives quantitative per-cell input rather than a discretized summary.
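The sentence construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record fields (`id`, `cancer_type`, `patient`, `cell_type`, `pseudotime`, `expression`) and the toy values are assumptions made for the example, and pseudotime is taken as already inferred by some upstream tool.

```python
from collections import defaultdict

def build_cell_sentences(cells):
    """Group cells by (cancer_type, patient, cell_type), then sort each group by pseudotime."""
    groups = defaultdict(list)
    for cell in cells:
        key = (cell["cancer_type"], cell["patient"], cell["cell_type"])
        groups[key].append(cell)
    # Each value is one "sentence": cells in increasing pseudotime order.
    return {
        key: sorted(members, key=lambda c: c["pseudotime"])
        for key, members in groups.items()
    }

# Hypothetical toy records; names and numbers are illustrative only.
cells = [
    {"id": "c1", "cancer_type": "A", "patient": "P1", "cell_type": "T",
     "pseudotime": 0.7, "expression": [0.0, 2.1, 0.4]},
    {"id": "c2", "cancer_type": "A", "patient": "P1", "cell_type": "T",
     "pseudotime": 0.1, "expression": [1.3, 0.0, 0.9]},
    {"id": "c3", "cancer_type": "A", "patient": "P2", "cell_type": "T",
     "pseudotime": 0.5, "expression": [0.2, 0.2, 0.0]},
    {"id": "c4", "cancer_type": "A", "patient": "P1", "cell_type": "T",
     "pseudotime": 0.4, "expression": [0.8, 1.0, 0.1]},
]

sentences = build_cell_sentences(cells)
order = [c["id"] for c in sentences[("A", "P1", "T")]]
print(order)
```

Keeping the grouping key as a tuple means a sentence never mixes cells across patients or cell types, which matches the per-patient, per-cell-type framing in the text.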

evoCancerGPT is trained via transfer learning across multiple cancers and evaluated for zero-shot generalization, meaning it forecasts progression for cancer types that were held out of training. This positions it as an attempt to learn transferable principles of single-cell cancer dynamics rather than a model fit narrowly to one disease, complementing static single-cell foundation models such as scGPT and Geneformer with an explicitly generative, temporal focus.
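One common way to set up a held-out evaluation of this kind is a leave-one-cancer-out split, sketched below. The exact protocol used in the preprint is not described here, so the function name, the key layout `(cancer_type, patient, cell_type)`, and the toy labels are all assumptions for illustration.

```python
def leave_one_cancer_out(sentences, held_out_cancer):
    """Split cell sentences so one cancer type never appears in training."""
    train, test = {}, {}
    for key, sentence in sentences.items():
        cancer_type = key[0]  # keys are assumed to be (cancer_type, patient, cell_type)
        target = test if cancer_type == held_out_cancer else train
        target[key] = sentence
    return train, test

# Hypothetical toy data: labels and cell ids are placeholders.
sentences = {
    ("cancer_A", "P1", "T_cell"): ["a1", "a2", "a3"],
    ("cancer_A", "P2", "B_cell"): ["a4", "a5"],
    ("cancer_B", "P3", "T_cell"): ["b1", "b2"],
}

train, test = leave_one_cancer_out(sentences, held_out_cancer="cancer_B")
train_cancers = sorted({key[0] for key in train})
test_cancers = sorted({key[0] for key in test})
print(train_cancers, test_cancers)
```

Splitting on cancer type rather than on individual cells is what makes the evaluation zero-shot: no cell from the held-out cancer can leak into training.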

#Key Features

  • Decoder-only generative forecasting: evoCancerGPT uses a GPT-style autoregressive architecture to generate future single-cell gene expression from prior cell states, modeling cancer progression as next-state prediction.
  • Pseudotime-ordered cell sentences: Training sequences are built per cancer type, patient, and cell type, ordered by inferred pseudotime, so the model learns trajectories of cellular change.
  • Continuous-expression cell tokens: Each cell token integrates continuous gene expression across its genes, preserving quantitative signal rather than relying solely on coarse binning.
  • Zero-shot generalization to held-out cancers: Trained via transfer learning, the model is evaluated for its ability to forecast progression in cancer types absent from training.
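The first three features above can be made concrete with a small sketch. It is written under stated assumptions: the source does not say how a cell's expression values are integrated into a token, so the linear projection in `embed_cell` is a stand-in, and all function names are hypothetical. The causal mask and prefix/target pairs are the standard ingredients of decoder-only next-step training.

```python
def embed_cell(expression, weights):
    """Project a continuous expression vector to a token embedding (assumed linear map)."""
    return [sum(w * x for w, x in zip(row, expression)) for row in weights]

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def next_state_pairs(sentence):
    """Return (prefix, target) pairs: predict cell t from cells 0..t-1."""
    return [(sentence[:t], sentence[t]) for t in range(1, len(sentence))]

# Toy pseudotime-ordered sentence of three cells with two genes each.
sentence = [[1.0, 2.0], [0.5, 2.5], [0.0, 3.0]]
weights = [[1.0, 0.0], [0.5, 0.5]]  # illustrative 2x2 projection

tokens = [embed_cell(cell, weights) for cell in sentence]
mask = causal_mask(len(tokens))
pairs = next_state_pairs(tokens)
print(tokens[0], len(pairs))
```

Because the expression values enter the projection directly, no binning step is needed before the transformer sees the cell; whether the actual model does this or something more elaborate is not stated in the source.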

#Technical Details

evoCancerGPT is a decoder-only (GPT-style) transformer foundation model for single-cell transcriptomics. Training data comprise 2.76 million cell tokens, each spanning 12,639 genes, drawn from 7 cancer types. For each cancer type, patient, and cell type, cells are arranged into ordered sequences using inferred pseudotime, and the model is trained autoregressively to predict subsequent cell states from earlier ones; each cell token integrates the cell's continuous gene expression values. The model is trained with a transfer-learning strategy and assessed for zero-shot generalization, generating cancer progression forecasts at single-cell and single-sample resolution for cancer types held out from training. Because the model is described only in a recent preprint, parameter counts and other architectural hyperparameters are not summarized here, and the authors had not reported publicly released weights at the time of posting.
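To give a sense of scale, the reported figures can be multiplied out. The sketch below assumes dense 32-bit float storage, which is an assumption for illustration only; single-cell matrices are usually sparse, and "2.76 million" is itself a rounded number, so the result is an order-of-magnitude estimate rather than a reported value.

```python
n_cells = 2_760_000   # "2.76 million cell tokens" (rounded in the source)
n_genes = 12_639      # genes per cell token

n_values = n_cells * n_genes   # total expression values
bytes_f32 = n_values * 4       # assumed dense float32 storage
size_gb = bytes_f32 / 1e9      # decimal gigabytes

print(f"{n_values:,} values, about {size_gb:.1f} GB if stored densely as float32")
```

Under these assumptions the training matrix holds roughly 35 billion expression values, or about 140 GB dense.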

#Applications

evoCancerGPT is aimed at cancer researchers and computational oncologists who want to anticipate how tumor cell populations may change over time. By forecasting future single-cell expression states from current ones, it could support hypotheses about disease progression, the emergence of resistant or aggressive cell states, and patient-level trajectory modeling. Its zero-shot capability is particularly relevant for cancers with limited longitudinal single-cell data, where a model trained on other cancers can still propose plausible progression dynamics, helping prioritize experiments or characterize evolution in under-studied tumor types.

#Impact

evoCancerGPT extends the single-cell foundation model paradigm from static representation learning toward generative, temporal forecasting of cancer evolution, a comparatively underexplored direction. Its emphasis on zero-shot generalization across cancer types speaks to the goal of learning transferable dynamics rather than disease-specific fits. Because it is a February 2026 bioRxiv preprint with no released weights reported so far, real-world adoption and independent validation are still pending. Rigorous benchmarking, especially against ground-truth longitudinal data and simpler trajectory baselines, will be important to establish how reliably its forecasts capture genuine cancer progression.
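As an example of what a simple trajectory baseline could look like, the sketch below scores a persistence forecast, which predicts that each next cell state equals the current one. This is a generic baseline chosen for illustration, not one taken from the preprint; the function names and toy trajectory are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two expression vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def persistence_error(trajectory):
    """Average MSE when each next state is predicted as a copy of the current state."""
    preds = trajectory[:-1]    # state at step t, used as the forecast for t+1
    targets = trajectory[1:]   # observed state at step t+1
    errors = [mse(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)

# Toy pseudotime-ordered trajectory: three cells, two genes each.
trajectory = [[0.0, 1.0], [1.0, 1.0], [1.0, 3.0]]
baseline_error = persistence_error(trajectory)
print(baseline_error)
```

A forecasting model would need to beat this number on the same trajectories for its predictions to add information beyond "nothing changes".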

Tags

gene_expression, transformer, foundation_model, generative, transfer_learning, zero_shot, cancer, transcriptomics