CellOS is a multi-view single-cell foundation model developed by Vitaura (Beijing) Technology Co., Ltd. together with the Institute of Zoology, Chinese Academy of Sciences (State Key Laboratory of Organ Regeneration and Reconstruction) and collaborators, released as a bioRxiv preprint in June 2026. It targets a recurring weakness of single-cell foundation models: representations learned purely by reconstructing gene expression tend to capture surface-level transcriptional patterns rather than the latent biological state that drives them. CellOS reframes the problem as learning a "world model" of cellular state.

The model's central idea is to learn from two complementary views of a cell. An expression view encodes the measured transcriptome as a cell sentence, while a perception view provides a higher-level summary of cellular context. Rather than forcing the model to reconstruct raw expression for the second view, CellOS aligns the two in latent space using a joint-embedding predictive objective adapted from the LLM-JEPA framework. Predicting in representation space — instead of input space — encourages the model to encode abstract, function-relevant features of cell identity and state.
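To make the idea of predicting in representation space concrete, the following is a minimal sketch of a joint-embedding predictive loss. It is not the CellOS or LLM-JEPA implementation: the linear predictor, the cosine-distance loss, and the names `latent_prediction_loss`, `z_expression`, and `z_perception` are all assumptions chosen for illustration, since the source does not specify the exact loss.

```python
import math

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def latent_prediction_loss(z_expression, z_perception, predictor_W):
    """Hypothetical JEPA-style loss: predict the perception-view embedding
    from the expression-view embedding and penalize the cosine distance.
    No raw expression values are reconstructed anywhere in this loss."""
    predicted = matvec(predictor_W, z_expression)
    return 1.0 - cosine_similarity(predicted, z_perception)

# Toy embeddings (made-up numbers, not model outputs).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z_expr = [1.0, 0.0, 2.0]

# Same direction as z_expr, so the loss is approximately zero.
loss_aligned = latent_prediction_loss(z_expr, [0.5, 0.0, 1.0], identity)

# Orthogonal to z_expr, so the loss is one.
loss_orthogonal = latent_prediction_loss(z_expr, [0.0, 3.0, 0.0], identity)
```

In this sketch the loss depends only on the direction of the two embeddings, which is one simple way to compare views in latent space without tying the objective to input-level counts.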

CellOS sits alongside single-cell foundation models such as Geneformer, scGPT, scFoundation, and UCE, but is distinguished by its joint-embedding predictive training, its multi-view formulation, and its scale: a 12-billion-parameter mixture-of-experts (MoE) model trained on 390.5 million single-cell transcriptomes. The authors report that it outperforms prior state-of-the-art single-cell foundation models on cell-state annotation, batch integration, and perturbation-response prediction.

Key Features

Multi-view representation learning: CellOS jointly models an expression view (the transcriptome as a cell sentence) and a perception view, learning cellular representations that capture more than reconstructed gene counts.
LLM-JEPA latent alignment: Instead of reconstructing inputs, the two views are aligned in latent space with a joint-embedding predictive objective, pushing the model toward abstract, function-preserving features of cell state.
Three-stage training strategy: Pretraining proceeds through causal cell-sentence language modeling, a function-preserving dense-to-MoE expansion, and latent-space alignment, scaling capacity while preserving learned function.
Mixture-of-experts at 12B parameters: A sparse MoE architecture expands model capacity efficiently, scaling to 12 billion parameters trained on 390.5 million single-cell transcriptomes.
Broad benchmark coverage: The authors report state-of-the-art results across three distinct task families — cell-state annotation, batch integration, and perturbation-response prediction.
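The "cell sentence" used for the expression view can be illustrated with a small sketch. It assumes a rank-based encoding in which expressed genes are ordered from highest to lowest count; the function name `cell_sentence`, the tie-breaking rule, and the `max_genes` cutoff are hypothetical, and the actual CellOS tokenization may differ.

```python
def cell_sentence(expression, max_genes=None):
    """Turn a {gene: count} profile into a space-separated gene 'sentence'.

    Assumption: genes with zero counts are dropped, the rest are ordered by
    descending count, and ties are broken alphabetically by gene name.
    """
    expressed = [(gene, count) for gene, count in expression.items() if count > 0]
    ranked = sorted(expressed, key=lambda gc: (-gc[1], gc[0]))
    genes = [gene for gene, _ in ranked]
    if max_genes is not None:
        genes = genes[:max_genes]
    return " ".join(genes)

# Toy profile with made-up counts.
cell = {"CD3E": 12, "GAPDH": 40, "MS4A1": 0, "IL7R": 7, "ACTB": 40}
sentence = cell_sentence(cell, max_genes=4)
print(sentence)  # ACTB GAPDH CD3E IL7R
```

A sequence like this can then be fed to a causal language model in the same way as a sentence of word tokens.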

Technical Details

CellOS is a transformer-based mixture-of-experts model trained in three stages. First, it is trained with causal language modeling over cell sentences, treating each cell's ranked or tokenized expression profile as a sentence. Second, a function-preserving expansion converts the dense backbone into a sparse MoE model, increasing the parameter count without discarding learned representations. Third, the expression and perception views are aligned in latent space via an LLM-JEPA-style joint-embedding predictive objective, so the model predicts representations rather than raw inputs. The reported configuration is a 12-billion-parameter MoE model pretrained on 390.5 million single-cell transcriptomes. In their evaluations, the authors report that CellOS surpasses prior state-of-the-art single-cell foundation models on cell-state annotation, batch integration, and perturbation-response prediction.
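One way a dense-to-MoE expansion can be function-preserving is sketched below. This is an assumption for illustration, not the method described by the authors: every expert starts as a copy of the dense feed-forward block and the router's top-k gate weights are renormalized to sum to one, so the expanded layer initially computes the same output as the dense layer. All names (`expand_to_moe`, `moe_ffn`, `router_W`) and all numbers are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dense_ffn(x, W_in, W_out):
    """Two-layer feed-forward block with a ReLU in between."""
    hidden = [max(0.0, h) for h in matvec(W_in, x)]
    return matvec(W_out, hidden)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expand_to_moe(W_in, W_out, num_experts):
    """Copy the dense weights into every expert (separate copies, so the
    experts are free to diverge during later training)."""
    return [([r[:] for r in W_in], [r[:] for r in W_out]) for _ in range(num_experts)]

def moe_ffn(x, experts, router_W, top_k):
    """Route x to the top_k experts and mix their outputs with gate
    weights renormalized over the selected experts (they sum to 1)."""
    logits = matvec(router_W, x)
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(experts[0][1])
    for gate, i in zip(gates, top):
        y = dense_ffn(x, *experts[i])
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Toy dense block and an arbitrary router (made-up numbers).
W_in = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
W_out = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
router_W = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2], [0.0, 0.0]]
x = [1.0, 2.0]

dense_out = dense_ffn(x, W_in, W_out)
moe_out = moe_ffn(x, expand_to_moe(W_in, W_out, num_experts=4), router_W, top_k=2)
```

Because every expert is identical at initialization and the gates sum to one, `moe_out` matches `dense_out` up to floating-point error regardless of which experts the router selects; only subsequent training makes the experts specialize.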

Applications

CellOS is aimed at computational biologists and single-cell researchers who need robust, transferable cellular representations. Its cell-state annotation performance supports automated labeling of cell types and states across atlases, while strong batch integration helps merge datasets generated across labs, platforms, and conditions into a shared embedding. Perturbation-response prediction makes the model relevant to in silico screening, that is, anticipating how cells respond to genetic or chemical interventions. This capability is valuable for target discovery and mechanistic studies in regeneration and disease biology, areas central to the institutions that developed the model.

Impact

CellOS is an early demonstration that joint-embedding predictive learning, widely explored in vision and language, can be adapted to single-cell transcriptomics at scale. By predicting in latent space across complementary views and combining this with a function-preserving dense-to-MoE expansion, it offers a recipe for building large single-cell models that emphasize abstract cell-state representations over input reconstruction. Because the work is a June 2026 preprint, its reported state-of-the-art results await peer review and independent benchmarking. At the time of writing, no public code or model weights have been released, which limits external reproduction and downstream adoption.

Citation

CellOS: Learning a World Model of Cellular State through Joint Embedding Prediction
