SCALE

Virtual cell foundation model predicting single-cell responses to genetic, chemical, and cytokine perturbations with conditional flow matching.

Released: March 2026

SCALE (Scalable Conditional Atlas-Level Endpoint transport) is a virtual cell foundation model for predicting how single cells respond to genetic, chemical, and cytokine perturbations directly from single-cell measurements. Released as a March 2026 bioRxiv preprint by researchers at the Shanghai Artificial Intelligence Laboratory and collaborators, it targets the goal of in silico experimentation: simulating perturbation outcomes that would otherwise require costly wet-lab screens.

The model addresses three persistent obstacles in virtual cell modeling: inefficient training and inference pipelines, unstable behavior when modeling sparse high-dimensional single-cell space, and evaluation protocols that reward reconstruction fidelity over biological accuracy. SCALE reframes perturbation prediction as an endpoint-oriented optimal-transport problem, jointly learning set-level cell-population representations and perturbation-conditioned state transitions rather than modeling individual cells in isolation.

By combining a LLaMA-style set encoder with conditional flow matching and a BioNeMo-based systems backbone, SCALE positions itself among recent large perturbation models such as STATE and Tahoe-trained foundation models, emphasizing both predictive accuracy and the engineering efficiency needed to train on atlas-scale data.

Key Features

Set-aware population modeling: Rather than predicting per-cell responses independently, SCALE learns set-level representations of cell populations, capturing perturbation-induced shifts in sparse, high-dimensional single-cell space.
Endpoint-oriented flow matching: A conditional flow-matching objective models perturbation as transport between control and perturbed cell-state endpoints, improving stability over reconstruction-centric approaches.
LLaMA-based cellular encoding: A LLaMA-style encoder provides the representational backbone for cellular state, adapting a proven language-model architecture to transcriptomic data.
Efficient systems backbone: A BioNeMo-based training and inference framework delivers a 12.51x pretraining speedup and 1.29x inference speedup over the prior state-of-the-art pipeline under matched system settings.
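
The endpoint-oriented flow-matching idea in the list above can be illustrated with a minimal sketch. It assumes the common straight-line interpolation path between a control state x0 and a perturbed state x1; this summary does not say which path or pairing scheme the preprint uses, so the path and every name below (cfm_training_pair, cfm_loss, euler_integrate, oracle_velocity) are hypothetical and only show the general technique.

```python
# Generic conditional flow matching on toy vectors (stdlib only).
# ASSUMPTION: straight-line path x_t = (1 - t) * x0 + t * x1; not confirmed as SCALE's choice.

def cfm_training_pair(x0, x1, t):
    """Return the interpolated state x_t and the target velocity for one endpoint pair."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Along a straight-line path the velocity is constant and equals x1 - x0.
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity

def cfm_loss(velocity_fn, x0, x1, condition, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t, target = cfm_training_pair(x0, x1, t)
    pred = velocity_fn(x_t, t, condition)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_integrate(velocity_fn, x0, condition, steps=10):
    """Transport a control state toward a predicted perturbed state with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x_t, t, condition):
    """Toy stand-in for a learned network: the perturbation is a fixed shift vector."""
    return list(condition)

control = [0.0, 1.0]          # toy control cell state
shift = [1.0, -1.0]           # toy perturbation condition
perturbed = [c + s for c, s in zip(control, shift)]

loss = cfm_loss(oracle_velocity, control, perturbed, shift, t=0.5)
predicted = euler_integrate(oracle_velocity, control, shift)
```

In a real model the velocity function would be a neural network conditioned on the perturbation and on a set-level population embedding, and the loss would be averaged over sampled times and endpoint pairs.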

Technical Details

SCALE instantiates an end-to-end formulation that jointly learns set-level representations and perturbation-conditioned state transitions, pairing a LLaMA-style set encoder with a conditional flow-matching architecture for stable transport-based prediction. Training and inference run on a BioNeMo-based framework that improves data throughput, distributed scalability, and deployment efficiency. The model is evaluated on the Tahoe-100M giga-scale single-cell perturbation atlas using a cell-level protocol centered on biologically meaningful metrics, where it improves perturbation-discrimination correlation (PDCorr) by 12.02% and differential-expression overlap by 10.66% over STATE, alongside the reported 12.51x pretraining and 1.29x inference speedups. Parameter count is not disclosed in the preprint.
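
To make the differential-expression overlap metric concrete, the sketch below computes one simple variant: the fraction of the top-k genes ranked by absolute mean shift from control that are shared between predicted and observed perturbed populations. This is an assumption for illustration; the summary does not give the preprint's exact definition, and real pipelines usually rank genes with a statistical test rather than a raw mean difference. All function names are hypothetical.

```python
# Toy differential-expression (DE) overlap, stdlib only.
# ASSUMPTION: genes are ranked by |mean(perturbed) - mean(control)|; not SCALE's confirmed protocol.

def mean_expression(cells):
    """Per-gene mean over a population given as a list of per-cell expression lists."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]

def top_k_de_genes(control, perturbed, gene_names, k):
    """Return the k genes with the largest absolute mean shift from control."""
    mean_control = mean_expression(control)
    mean_perturbed = mean_expression(perturbed)
    scores = {g: abs(p - c) for g, c, p in zip(gene_names, mean_control, mean_perturbed)}
    ranked = sorted(gene_names, key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])

def de_overlap(control, observed, predicted, gene_names, k):
    """Fraction of the observed top-k DE genes that the prediction also ranks in its top k."""
    true_set = top_k_de_genes(control, observed, gene_names, k)
    pred_set = top_k_de_genes(control, predicted, gene_names, k)
    return len(true_set & pred_set) / k

genes = ["A", "B", "C", "D"]
control_cells = [[1, 1, 1, 1], [1, 1, 1, 1]]
observed_cells = [[5, 1, 3, 1], [5, 1, 3, 1]]    # true top-2 DE genes: A, C
predicted_cells = [[4, 3, 1, 1], [4, 3, 1, 1]]   # predicted top-2 DE genes: A, B

overlap = de_overlap(control_cells, observed_cells, predicted_cells, genes, k=2)
```

Here the prediction recovers one of the two truly shifted genes, so the overlap is 0.5. A metric of this kind rewards getting the identity of the responding genes right, which is the sort of biologically meaningful signal the evaluation protocol is described as emphasizing.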

Applications

SCALE is aimed at computational and experimental biologists who use virtual cell models to prioritize perturbations before committing to laboratory screens. By predicting population-level responses to genetic, chemical, or cytokine perturbations, it can support target discovery, drug-response forecasting, and hypothesis generation in single-cell pharmacology, while its efficient training pipeline makes atlas-scale modeling more accessible to groups with constrained compute.

Impact

SCALE contributes to the rapidly growing class of perturbation-trained virtual cell models by coupling a transport-based formulation with a production-grade systems backbone, demonstrating measurable gains on the Tahoe-100M benchmark over STATE. Its emphasis on biologically meaningful evaluation and large training-throughput speedups highlights a broader shift toward models that are both accurate and practical at atlas scale. As a recent preprint without released code or weights, its downstream adoption remains to be established.

Citation

SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction

Chen, S., et al. (2026) SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction. bioRxiv.

DOI: 10.64898/2026.03.17.712536

Citations

Total citations: 0

Influential: 0

References: 39

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Overall score: 19 (Closed)

Usability (can I run it?): 14

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
