evoCancerGPT

Single-cell foundation model that forecasts how cancer cells evolve, autoregressively generating future gene expression from prior cell states.

Released: February 2026

evoCancerGPT is a decoder-only single-cell foundation model that forecasts how cancer cells evolve, developed by researchers at the Dana-Farber Cancer Institute and posted to bioRxiv in February 2026. Most single-cell foundation models learn static representations of cell state for tasks like annotation or integration; evoCancerGPT instead targets temporal progression, generating future gene expression profiles from prior cell states. By framing cancer evolution as a generative sequence-modeling problem, it aims to predict where a tumor cell population is heading at the resolution of individual patients.

The model borrows the autoregressive, decoder-only architecture of generative pre-trained transformers (GPTs) from language modeling. For each cancer type, patient, and cell type, the authors construct "sentences" of cells ordered by inferred pseudotime, so that the model learns to predict the next cell state given the preceding trajectory. Each cell is represented as a token that integrates the continuous gene expression values across its genes, giving the model a rich, per-cell input rather than a discretized summary.
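The sentence construction described above can be illustrated with a minimal sketch. The preprint's actual data structures and preprocessing are not shown in this summary, so the record layout, the function names (`build_cell_sentences`, `next_state_pairs`), and the toy labels and values below are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout (not from the preprint):
# (cancer_type, patient_id, cell_type, pseudotime, expression_vector)
Cell = Tuple[str, str, str, float, List[float]]
GroupKey = Tuple[str, str, str]


def build_cell_sentences(cells: List[Cell]) -> Dict[GroupKey, List[List[float]]]:
    """Group cells by (cancer type, patient, cell type), then order each group by pseudotime."""
    groups: Dict[GroupKey, List[Tuple[float, List[float]]]] = defaultdict(list)
    for cancer, patient, cell_type, pseudotime, expr in cells:
        groups[(cancer, patient, cell_type)].append((pseudotime, expr))

    sentences: Dict[GroupKey, List[List[float]]] = {}
    for key, members in groups.items():
        members.sort(key=lambda m: m[0])  # earliest pseudotime first
        sentences[key] = [expr for _, expr in members]
    return sentences


def next_state_pairs(sentence: List[List[float]]):
    """Return (context, target) pairs: each prefix of the sentence predicts the following cell."""
    return [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]


# Toy data: three genes per cell, made-up values and labels.
toy_cells: List[Cell] = [
    ("cancer_A", "P1", "T cell", 0.7, [0.0, 2.1, 0.4]),
    ("cancer_A", "P1", "T cell", 0.1, [1.3, 0.0, 0.9]),
    ("cancer_A", "P1", "T cell", 0.4, [0.8, 1.0, 0.6]),
    ("cancer_A", "P2", "T cell", 0.2, [0.5, 0.5, 0.5]),
]

sentences = build_cell_sentences(toy_cells)
pairs = next_state_pairs(sentences[("cancer_A", "P1", "T cell")])
```

Here each cell stays a vector of continuous values rather than a discrete token ID, which mirrors the description above; how the real model embeds that vector is not specified in this summary.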

evoCancerGPT is trained via transfer learning across multiple cancers and evaluated for zero-shot generalization to held-out cancer types, that is, on predicting progression for cancers it was not trained on. This positions it as an attempt to learn transferable principles of single-cell cancer dynamics rather than a model fit narrowly to one disease, complementing static single-cell foundation models such as scGPT and Geneformer with an explicitly generative, temporal focus.
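A held-out evaluation of this kind can be sketched as a leave-one-cancer-out split. This is a generic illustration, not the authors' protocol: the helper name `hold_out_cancer`, the key layout `(cancer_type, patient_id, cell_type)`, and the toy labels are all hypothetical.

```python
from typing import Dict, List, Tuple

GroupKey = Tuple[str, str, str]  # hypothetical key: (cancer_type, patient_id, cell_type)
Sentences = Dict[GroupKey, List[List[float]]]


def hold_out_cancer(sentences: Sentences, held_out: str) -> Tuple[Sentences, Sentences]:
    """Split cell sentences so that one cancer type never appears in the training set."""
    train = {k: v for k, v in sentences.items() if k[0] != held_out}
    test = {k: v for k, v in sentences.items() if k[0] == held_out}
    return train, test


# Toy sentences with one gene per cell; labels and values are made up.
toy_sentences: Sentences = {
    ("cancer_A", "P1", "T cell"): [[0.1], [0.2]],
    ("cancer_B", "P2", "T cell"): [[0.3], [0.4]],
    ("cancer_C", "P3", "B cell"): [[0.5], [0.6]],
}

train_set, test_set = hold_out_cancer(toy_sentences, held_out="cancer_C")
train_cancers = {key[0] for key in train_set}
```

Splitting on the cancer-type field, rather than on individual cells or patients, is what makes the evaluation zero-shot with respect to the disease.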

Key Features

Decoder-only generative forecasting: evoCancerGPT uses a GPT-style autoregressive architecture to generate future single-cell gene expression from prior cell states, modeling cancer progression as next-state prediction.
Pseudotime-ordered cell sentences: Training sequences are built per cancer type, patient, and cell type, ordered by inferred pseudotime, so the model learns trajectories of cellular change.
Continuous-expression cell tokens: Each cell token integrates continuous gene expression across its genes, preserving quantitative signal rather than relying solely on coarse binning.
Zero-shot generalization to held-out cancers: Trained via transfer learning, the model is evaluated for its ability to forecast progression in cancer types absent from training.

Technical Details

evoCancerGPT is a decoder-only (GPT-style) transformer foundation model for single-cell transcriptomics. Training data comprise 2.76 million cell tokens, each spanning 12,639 genes, drawn from 7 cancer types. For each cancer type, patient, and cell type, cells are arranged into sequences ordered by inferred pseudotime, and the model is trained autoregressively to predict subsequent cell states from earlier ones. Each cell token integrates the cell's continuous gene expression values. The model is trained with a transfer-learning strategy and assessed for zero-shot generalization, generating single-cell and single-sample cancer progression for cancer types held out from training. Because the work is a recent preprint, full parameter counts and additional architectural hyperparameters are not summarized here, and the authors do not report publicly released weights at the time of posting.
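To give a rough sense of scale and of what autoregressive training implies, the sketch below does two things: it multiplies out the reported dataset dimensions, and it builds the lower-triangular (causal) mask that decoder-only transformers generally use so that each position sees only earlier positions. The dense float32 storage figure is an assumption for illustration, since the preprint's storage format is not described here, and 2.76 million is itself a rounded count.

```python
N_CELLS = 2_760_000   # cell tokens reported in the text (rounded figure)
N_GENES = 12_639      # genes per cell token reported in the text
BYTES_PER_VALUE = 4   # assumption: dense float32 storage


def dense_size_gb(n_cells: int, n_genes: int, bytes_per_value: int) -> float:
    """Size in decimal gigabytes of a dense cells-by-genes matrix."""
    return n_cells * n_genes * bytes_per_value / 1e9


def causal_mask(seq_len: int):
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


total_values = N_CELLS * N_GENES
size_gb = dense_size_gb(N_CELLS, N_GENES, BYTES_PER_VALUE)
mask = causal_mask(4)  # a four-cell sentence
```

Under these assumptions the expression matrix holds about 3.5e10 values, or roughly 140 GB if stored densely, which suggests why sparse or chunked storage is common for data of this size.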

Applications

evoCancerGPT is aimed at cancer researchers and computational oncologists who want to anticipate how tumor cell populations may change over time. By forecasting future single-cell expression states from current ones, it could support hypotheses about disease progression, the emergence of resistant or aggressive cell states, and patient-level trajectory modeling. Its zero-shot capability is particularly relevant for cancers with limited longitudinal single-cell data, where a model trained on other cancers can still propose plausible progression dynamics, helping prioritize experiments or characterize evolution in under-studied tumor types.

Impact

evoCancerGPT extends the single-cell foundation model paradigm from static representation learning toward generative, temporal forecasting of cancer evolution, a comparatively underexplored direction. Its emphasis on zero-shot generalization across cancer types speaks to the goal of learning transferable dynamics rather than disease-specific fits. Because it is a February 2026 bioRxiv preprint with no released weights reported so far, real-world adoption and independent validation are still pending. Rigorous benchmarking, especially against ground-truth longitudinal data and simpler trajectory baselines, will be important to establish how reliably its forecasts capture genuine cancer progression.
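As one example of what a simpler trajectory baseline could look like, the sketch below scores two naive forecasters on a pseudotime-ordered sentence using mean squared error. Neither baseline nor the metric is taken from the preprint; the function names and the toy sentence are hypothetical, and any forecasting model would need to beat such baselines to show it has learned more than local continuity.

```python
from typing import Callable, List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two expression vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def persistence_forecast(context: List[Vector]) -> Vector:
    """Naive baseline: predict that the next cell equals the most recent cell."""
    return list(context[-1])


def mean_context_forecast(context: List[Vector]) -> Vector:
    """Naive baseline: predict the per-gene mean of all cells seen so far."""
    n = len(context)
    return [sum(cell[g] for cell in context) / n for g in range(len(context[0]))]


def score(forecaster: Callable[[List[Vector]], Vector], sentence: List[Vector]) -> float:
    """Average next-cell MSE over every prefix of an ordered sentence."""
    errors = [mse(forecaster(sentence[:i]), sentence[i]) for i in range(1, len(sentence))]
    return sum(errors) / len(errors)


# Toy sentence with two genes per cell; the first gene rises steadily.
sentence = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
persistence_error = score(persistence_forecast, sentence)
mean_error = score(mean_context_forecast, sentence)
```

On this toy trajectory the persistence baseline scores 0.5 and the mean-of-context baseline 0.8125, so even the choice of naive comparator changes how impressive a model's error looks.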

Citation

evoCancerGPT: Generating Zero-Shot Single-Cell and Single-Sample Cancer Progression Through Transfer Learning

Wang, X., et al. (2026) evoCancerGPT: Generating Zero-Shot Single-Cell and Single-Sample Cancer Progression Through Transfer Learning. bioRxiv.

DOI: 10.64898/2026.02.12.705621

Citations

Total citations: 0

Influential: 0

References: 25

Openness

bio.rodeo openness: Closed · low usability and reproducibility

11 · Closed

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 13

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper
