Arc Institute / Stanford University
Single-cell foundation model using tabular attention over context cells to enable zero-shot representation and in-context prediction of arbitrary perturbations.
STACK is a single-cell foundation model from the Arc Institute and Stanford University that learns transcriptomic representations by attending across context cells rather than treating each cell in isolation. Most single-cell foundation models embed a cell purely from its own expression vector; STACK instead uses tabular attention so that a target cell's representation is informed by a surrounding set of reference cells, analogous to in-context learning in large language models. Released as a bioRxiv preprint in January 2026, the model was trained on 149 million uniformly preprocessed human single cells.
The central problem STACK addresses is the brittleness of perturbation modeling. Predicting how a cell responds to a chemical compound, a genetic edit, or a donor background usually requires task-specific fine-tuning on labeled response data. STACK instead makes these predictions in context, conditioning on a few example cells at inference time, so that arbitrary, previously unseen perturbations can be handled without updating model weights. The authors report that this zero-shot, in-context behavior matches or exceeds baselines that were explicitly fine-tuned for the task.
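The idea of in-context prediction can be illustrated with a deliberately naive stand-in. The sketch below is not STACK's method: it replaces the learned model with a fixed mean-shift rule, and the function name `predict_in_context` and all arrays are hypothetical. What it shows is the interface only: the perturbation is specified by example cells passed at inference time, and nothing is fitted or updated.

```python
import numpy as np

def predict_in_context(query_ctrl, example_ctrl, example_pert):
    """Naive stand-in for in-context prediction (NOT the STACK model).

    The perturbation is defined only by the example cells supplied at call
    time; there are no parameters to update. The rule here is a plain mean
    shift: the average expression change observed in the examples is added
    to every query cell.
    """
    delta = example_pert.mean(axis=0) - example_ctrl.mean(axis=0)
    return query_ctrl + delta

# Hypothetical toy data: rows are cells, columns are genes.
example_ctrl = np.array([[1.0, 2.0], [3.0, 4.0]])   # unperturbed examples
example_pert = np.array([[2.0, 2.0], [4.0, 6.0]])   # same condition, perturbed
query_ctrl = np.array([[0.0, 0.0], [10.0, 10.0]])   # cells to predict for

predicted = predict_in_context(query_ctrl, example_ctrl, example_pert)
```

Swapping in a different pair of example sets changes the prediction without any retraining, which is the property the paragraph above attributes to STACK; a learned model would replace the mean-shift rule with attention over the examples.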
Beyond the model itself, STACK was used to construct Perturb Sapiens, described as the first human whole-organism perturbed cell atlas, spanning 28 tissues, 40 cell classes, and 201 perturbations, with predictions checked against in-vitro experiments.
STACK is a transformer-based foundation model trained self-supervised on 149 million human single cells that were uniformly preprocessed to reduce platform and pipeline heterogeneity. Its defining architectural choice is tabular attention over context cells: rather than embedding a cell from its own features alone, the model attends across a tabular set of cells so that representations and downstream predictions are informed by neighboring observations. This design is what enables in-context learning, allowing the model to generalize to perturbation types absent from its training objective without parameter updates. The authors benchmark zero-shot performance against fine-tuned baselines and report comparable or superior results, and demonstrate the approach at scale by generating the Perturb Sapiens atlas (28 tissues, 40 cell classes, 201 perturbations) with in-vitro experimental confirmation.
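Attention across cells can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the STACK architecture: the table layout (cells by tokens by features), the single attention head, the random projection matrices, and the name `cross_cell_attention` are all hypothetical. For each token position, every cell attends to every other cell in the table, so a cell's output depends on its context.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_cell_attention(table, w_q, w_k, w_v):
    """Single-head attention across the cell axis (illustrative only).

    table: array of shape (n_cells, n_tokens, d), one row of tokens per cell.
    w_q, w_k, w_v: projection matrices of shape (d, d_k).
    Returns the attended table and the attention weights.
    """
    q = table @ w_q  # (n_cells, n_tokens, d_k)
    k = table @ w_k
    v = table @ w_v
    # For each token position t, score every ordered pair of cells (i, j).
    scores = np.einsum("itd,jtd->tij", q, k) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # (n_tokens, n_cells, n_cells)
    # Each cell's output is a weighted mix of all cells' values at position t.
    out = np.einsum("tij,jtd->itd", weights, v)
    return out, weights

rng = np.random.default_rng(0)
n_cells, n_tokens, d = 5, 4, 8
table = rng.normal(size=(n_cells, n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = cross_cell_attention(table, w_q, w_k, w_v)

# Changing a context cell (row 1) changes the target cell's output (row 0),
# which would not happen if each cell were embedded from its own features alone.
table_alt = table.copy()
table_alt[1] += 1.0
out_alt, _ = cross_cell_attention(table_alt, w_q, w_k, w_v)
context_matters = not np.allclose(out[0], out_alt[0])
```

The final check captures the design choice described above: the representation of a target cell is a function of the whole context set, so supplying different reference cells yields a different representation with the same weights.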
STACK is aimed at computational biologists and experimentalists who need to anticipate cellular responses to interventions before running costly screens, or in place of them. Because perturbation effects are predicted in context from a few examples, researchers can explore chemical and genetic perturbation hypotheses across many cell types and tissues, prioritize candidates for validation, and build perturbation atlases such as Perturb Sapiens. The context-aware embeddings are also useful for standard single-cell tasks such as cell-type representation and dataset integration.
STACK contributes a distinct architectural direction to the single-cell foundation
model landscape: in-context learning via attention over reference cells, transferred
from the language-model paradigm to tabular transcriptomics. Its construction of
Perturb Sapiens, positioned as the first human whole-organism perturbed cell atlas with
in-vitro support, is a notable demonstration of generative perturbation modeling at
organism scale. The work shares an author (Yusuf Roohani) with GEARS, a separate
graph-based perturbation-prediction model already cataloged on bio.rodeo.
Code is released by the Arc Institute (ArcInstitute/stack) and
model weights are available on Hugging Face (arcinstitute/Stack-Large), though both
carry non-commercial terms (code under CC BY-NC-SA, weights under a custom Arc
non-commercial license) that constrain commercial reuse and redistribution.