A foundation model jointly pretrained on 67M single-cell and spatial transcriptomic profiles to model intra-cellular expression and inter-cellular spatial dependencies.
OmniCell is a transcriptomic foundation model developed by BGI Research (Shenzhen) and released as a preprint in December 2025. It addresses a structural gap in the single-cell foundation-model landscape: most existing models treat each cell in isolation, learning representations from gene-expression vectors while discarding the spatial context in which cells actually reside. Yet tissue function emerges from how cells are arranged and how they communicate with their neighbors. OmniCell is presented as the first foundation model to jointly model intra-cellular gene expression and inter-cellular spatial dependencies within a single unified architecture.
The model is pretrained on a corpus of 67 million combined single-cell and spatial transcriptomic profiles, spanning dissociated single-cell RNA sequencing data and spatially resolved transcriptomics. By learning from both data types together, OmniCell aims to capture not only the regulatory and co-expression structure inside individual cells but also the organizational logic of how cells of different types are positioned relative to one another in tissue. This dual objective is intended to produce representations that transfer across both dissociated and spatial assays.
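Modeling inter-cellular dependencies presupposes some representation of which cells lie near which in tissue. As a minimal illustration, assuming 2D cell coordinates and a brute-force k-nearest-neighbor rule, the sketch below builds a spatial neighbor graph. The function name `knn_spatial_graph`, the toy coordinates, and the value of `k` are hypothetical. This is not OmniCell's neighborhood construction, which may differ.

```python
import math


def knn_spatial_graph(coords, k):
    """Map each cell index to the indices of its k nearest spatial neighbors.

    coords: list of (x, y) positions, one per cell (hypothetical input format).
    k: number of neighbors to keep per cell.
    """
    if k < 1 or k >= len(coords):
        raise ValueError("k must satisfy 1 <= k < number of cells")
    neighbors = {}
    for i, ci in enumerate(coords):
        # Brute-force distances from cell i to every other cell (O(n^2) overall).
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()  # ties are broken by the lower cell index
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


# Toy tissue: two well-separated groups of three cells, arbitrary units.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
graph = knn_spatial_graph(coords, k=2)
print(graph)
```

In this toy layout each cell's neighbors come from its own group, which is the kind of local context a spatially aware model would condition on. For real datasets a spatial index (for example a k-d tree) would replace the quadratic loop.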
OmniCell sits between two existing lines of work: expression-focused single-cell foundation models such as scGPT, Geneformer, and UCE, and spatial-transcriptomics methods that operate at the tissue level. It distinguishes itself by unifying the two regimes rather than specializing in either. It targets zero-shot deployment across several downstream tasks, meaning use without task-specific retraining.
OmniCell is a transcriptomic foundation model pretrained in a self-supervised fashion on 67 million single-cell and spatial transcriptomic profiles. Its defining design choice is the joint treatment of intra-cellular expression and inter-cellular spatial relationships, which allows it to serve as a shared backbone for both dissociated single-cell and spatially resolved data. The preprint reports zero-shot performance on three tasks: cell-type deconvolution, spatial domain delineation, and gene co-expression reconstruction. This positions OmniCell as a general-purpose representation learner for transcriptomics rather than a single-task model. Architecture specifications, parameter counts, and full benchmark tables are given in the preprint; precise figures should be checked against the published version once the work has been peer reviewed.
OmniCell is intended for researchers working across single-cell and spatial transcriptomics who need a single pretrained backbone that operates in both regimes. Practical use cases include deconvolving spatial spots into cell-type composition, mapping spatial tissue domains in development and disease, and reconstructing co-expression networks from sparse data. Because the reported tasks are zero-shot, the model could lower the barrier for groups that lack the labeled data or compute needed to train task-specific models, particularly in spatial-omics settings where annotated references are scarce.
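To make the deconvolution task concrete, the sketch below estimates cell-type proportions for a single spatial spot by non-negative least squares against reference signatures, solved with projected gradient descent. This is a generic baseline formulation of the task, not OmniCell's method, since no code has been released. The function name `deconvolve_spot`, the toy signature matrix, and the iteration count are all hypothetical.

```python
import numpy as np


def deconvolve_spot(signatures, spot, n_iter=500):
    """Estimate non-negative cell-type proportions for one spot.

    signatures: (n_genes, n_types) reference expression per cell type.
    spot: (n_genes,) observed expression of the mixed spot.
    Minimizes 0.5 * ||A w - b||^2 subject to w >= 0, then normalizes w.
    """
    A = np.asarray(signatures, dtype=float)
    b = np.asarray(spot, dtype=float)
    # Step size 1/L, where L = (largest singular value of A)^2.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w = np.maximum(w - step * grad, 0.0)  # project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


# Toy reference: 4 genes x 2 cell types (made-up values).
signatures = np.array([[10.0, 0.0],
                       [0.0, 10.0],
                       [5.0, 5.0],
                       [1.0, 2.0]])
# Simulated spot: a noiseless 70/30 mixture of the two types.
spot = 0.7 * signatures[:, 0] + 0.3 * signatures[:, 1]
proportions = deconvolve_spot(signatures, spot)
print(proportions)
```

Because the toy spot is an exact mixture, the recovered proportions match the mixing weights. Real spots are noisy and sparse, which is where learned representations are expected to help over a raw-expression baseline like this one.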
OmniCell is notable as a foundation model that unifies dissociated and spatially resolved transcriptomics under one pretraining objective, a direction the field has been moving toward as spatial assays proliferate. Its real influence will depend on independent validation and broader adoption. A significant caveat is availability: at the time of writing, no public code or model weights accompany the preprint, and the preprint itself is released under an all-rights-reserved license. Together these constrain reproducibility and reuse until the authors release artifacts or relax the terms.