Vitaura / Institute of Zoology, Chinese Academy of Sciences / University of Chinese Academy of Sciences
A 12B-parameter multi-view single-cell foundation model that aligns expression and perception views via a joint-embedding (LLM-JEPA) objective.
CellOS is a multi-view single-cell foundation model developed by Vitaura (Beijing) Technology Co., Ltd. together with the Institute of Zoology, Chinese Academy of Sciences (State Key Laboratory of Organ Regeneration and Reconstruction) and collaborators, released as a bioRxiv preprint in June 2026. It targets a recurring weakness of single-cell foundation models: representations learned purely by reconstructing gene expression tend to capture surface-level transcriptional patterns rather than the latent biological state that drives them. CellOS reframes the problem as learning a "world model" of cellular state.
The model's central idea is to learn from two complementary views of a cell. An expression view encodes the measured transcriptome as a cell sentence, while a perception view provides a higher-level summary of cellular context. Rather than forcing the model to reconstruct raw expression for the second view, CellOS aligns the two in latent space using a joint-embedding predictive objective adapted from the LLM-JEPA framework. Predicting in representation space rather than input space encourages the model to encode abstract, function-relevant features of cell identity and state.
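To make the idea of predicting in representation space concrete, here is a minimal sketch of a joint-embedding predictive loss. It assumes that each view has already been encoded into a fixed-length vector, that the predictor is a single linear map, and that the distance is cosine distance. None of these details are given in the summary above, and the names `linear_predictor` and `jepa_style_loss` are hypothetical, not part of CellOS.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def linear_predictor(z, weights):
    """Hypothetical predictor: one linear map applied to an embedding."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]

def jepa_style_loss(z_expression, z_perception, predictor_weights):
    """Distance between the predicted and the actual perception embedding.

    The loss is computed between embeddings only; no gene-expression
    values are reconstructed.
    """
    z_predicted = linear_predictor(z_expression, predictor_weights)
    return cosine_distance(z_predicted, z_perception)

# Toy embeddings (made-up numbers, not model outputs).
z_expr = [1.0, 0.0, 2.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Perception embedding pointing in the same direction: loss close to 0.
loss_aligned = jepa_style_loss(z_expr, [2.0, 0.0, 4.0], identity)

# Perception embedding orthogonal to the prediction: loss of 1.
loss_misaligned = jepa_style_loss(z_expr, [0.0, 1.0, 0.0], identity)
```

In a real training setup this term would be minimized with respect to the encoder and predictor parameters, typically alongside other losses. The sketch only shows what is being compared.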
CellOS sits alongside single-cell foundation models such as Geneformer, scGPT, scFoundation, and UCE, but is distinguished by its joint-embedding predictive training, its multi-view formulation, and its scale: a 12-billion-parameter mixture-of-experts (MoE) model trained on 390.5 million single-cell transcriptomes. The authors report that it outperforms prior state-of-the-art single-cell foundation models on cell-state annotation, batch integration, and perturbation-response prediction.
CellOS is a transformer-based mixture-of-experts model trained in three stages. First, it learns through causal cell-sentence language modeling, treating each cell's ranked or tokenized expression profile as a sentence. Second, a function-preserving expansion converts the dense backbone into a sparse MoE model, growing the parameter count without discarding learned representations. Finally, the expression and perception views are aligned in latent space via an LLM-JEPA-style joint-embedding predictive objective, so the model predicts representations rather than raw inputs. The released configuration is a 12-billion-parameter MoE model pretrained on 390.5 million single-cell transcriptomes. In evaluation, the authors report that CellOS surpasses prior state-of-the-art single-cell foundation models on cell-state annotation, batch integration, and perturbation-response prediction.
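The function-preserving expansion step can be illustrated with a small sketch. The summary does not describe how CellOS performs the expansion, so this example assumes one simple scheme in the style of sparse upcycling: the dense feed-forward weights are copied into every expert, and the gates of the selected experts are renormalized to sum to one. Under those assumptions the MoE layer returns the same output as the dense layer at initialization, whatever the router weights are. All function names and numbers below are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def dense_ffn(x, w_in, w_out):
    """Two-layer feed-forward block with a ReLU in between."""
    hidden = [max(0.0, h) for h in matvec(w_in, x)]
    return matvec(w_out, hidden)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expand_to_moe(w_in, w_out, num_experts):
    """Assumed expansion scheme: every expert starts as a copy of the dense FFN."""
    return [([row[:] for row in w_in], [row[:] for row in w_out])
            for _ in range(num_experts)]

def moe_ffn(x, experts, router, top_k):
    """Route x to the top_k experts and mix their outputs with renormalized gates."""
    logits = matvec(router, x)
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # gates sum to 1
    output = [0.0] * len(experts[0][1])
    for gate, index in zip(gates, chosen):
        expert_out = dense_ffn(x, *experts[index])
        output = [acc + gate * val for acc, val in zip(output, expert_out)]
    return output

# Toy dense layer: 2 inputs -> 3 hidden units -> 2 outputs (made-up weights).
w_in = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
w_out = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
router = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2], [0.0, 0.0]]  # 4 experts
x = [1.0, 2.0]

dense_out = dense_ffn(x, w_in, w_out)
moe_out = moe_ffn(x, expand_to_moe(w_in, w_out, num_experts=4), router, top_k=2)
```

Because every expert computes the same function and the gates sum to one, `moe_out` matches `dense_out` up to floating-point error. The experts only become distinct once further training updates them separately.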
CellOS is aimed at computational biologists and single-cell researchers who need robust, transferable cellular representations. Its cell-state annotation performance supports automated labeling of cell types and states across atlases, while strong batch integration helps merge datasets generated across labs, platforms, and conditions into a shared embedding. Perturbation-response prediction makes the model relevant to in silico screening, that is, anticipating how cells respond to genetic or chemical interventions. This is valuable for target discovery and mechanistic studies in regeneration and disease biology, areas central to the institutions that developed the model.
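One common way to use frozen cell embeddings for annotation is nearest-neighbour label transfer from a labeled reference atlas. The sketch below shows that general technique; it is not a description of how CellOS is evaluated, which the summary does not specify. The embeddings, the labels, and the function `knn_annotate` are made up for illustration.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_annotate(query, reference_embeddings, reference_labels, k=3):
    """Label a query cell by majority vote among its k most similar reference cells."""
    ranked = sorted(
        range(len(reference_embeddings)),
        key=lambda i: cosine_similarity(query, reference_embeddings[i]),
        reverse=True,
    )
    votes = Counter(reference_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy reference atlas: 2-dimensional embeddings with placeholder labels.
reference_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]

# A query cell whose embedding lies near the first two reference cells.
predicted = knn_annotate([0.8, 0.3], reference_embeddings, reference_labels, k=3)
```

With well-integrated embeddings, the same lookup can draw on reference cells from several batches at once, which is why annotation quality and batch integration are often assessed together.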
CellOS is an early demonstration that joint-embedding predictive learning, widely explored in vision and language, can be adapted to single-cell transcriptomics at scale. By predicting in latent space across complementary views and combining this with a function-preserving dense-to-MoE expansion, it offers a recipe for building large single-cell models that emphasize abstract cell-state representations over input reconstruction. As a June 2026 preprint, its reported state-of-the-art results await peer review and independent benchmarking, and at the time of writing no public code or model weights have been released, which currently limits external reproduction and downstream adoption.
Zhou, Q., et al. (2026) CellOS: Learning a World Model of Cellular State through Joint Embedding Prediction. bioRxiv.
DOI: 10.64898/2026.06.18.733163