An 860M-parameter generative single-cell foundation model that jointly represents and generates across epigenomic, transcriptomic, and proteomic modalities.
HoloCell is a generative single-cell foundation model developed by researchers at the Zhongguancun Academy in Beijing and released as a bioRxiv preprint in June 2026. It targets a central gap in single-cell biology: while multi-omics technologies can now profile the epigenome, transcriptome, and proteome within individual cells, most computational methods are built for a single modality or a fixed pair of modalities and depend on dataset-specific training or carefully paired measurements. HoloCell instead aims to model all three major omics layers within one unified framework.
The authors describe HoloCell as, to their knowledge, the first generative foundation model for joint representation learning and generative modeling across epigenomics, transcriptomics, and proteomics simultaneously. Rather than learning a separate embedding for each assay, it produces a shared representation of cellular state and can generate one modality conditioned on others, which lets it remain usable even when some modalities are missing — a common situation in real single-cell datasets.
Conceptually, HoloCell positions itself within the broader "virtual cell" research direction: the goal of building computational systems that both characterize a cell's state across molecular layers and simulate how information flows between them. It is a frozen, pretrained checkpoint evaluated zero-shot across diverse tasks rather than a task-specific model.
HoloCell contains over 860 million parameters and is pretrained on the Human-Multi-Omics-Corpus, which the authors assembled from roughly 468 million single-cell profiles spanning epigenomic, transcriptomic, and proteomic layers, corresponding to over 425 billion tokens. A hierarchical tokenization scheme maps cis-regulatory elements, genes, and proteins onto structured tokens so that features from different assays share one modeling framework. Generation uses an iterative diffusion and remasking process in which masked tokens are progressively resolved, an approach suited to the set-like, order-free structure of molecular feature lists.
The authors report evaluating HoloCell on single-omics representation learning, paired multi-omics integration, unpaired multi-omics alignment, and cross-modal generation, and state that it shows superior performance and flexibility relative to existing methods across these tasks. Because this is a fresh preprint, detailed per-benchmark numbers should be read directly from the paper.
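The iterative remasking idea described above can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not HoloCell's implementation: no code has been released, so every name here (`toy_denoiser`, `iterative_remask_decode`, the token strings) is hypothetical, the "denoiser" proposes random tokens instead of running a trained network, and the linear unmasking schedule is an assumption borrowed from generic masked-diffusion decoders.

```python
import random

MASK = None  # placeholder for a token that has not been resolved yet


def toy_denoiser(tokens, vocab, rng):
    """Hypothetical stand-in for a trained network.

    Proposes a (token, confidence) pair for every masked position.
    A real model would condition on the observed tokens; here the
    proposals are random, purely to exercise the loop.
    """
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(tokens)
        if tok is MASK
    }


def iterative_remask_decode(tokens, vocab, steps=4, seed=0):
    """Resolve masked positions over at most `steps` rounds.

    Each round proposes a token for every masked position, commits the
    most confident proposals, and leaves the rest masked (equivalently,
    re-masks them). The number of masked positions shrinks linearly to
    zero (an assumed schedule). Observed input tokens are never changed.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    n_masked_start = sum(tok is MASK for tok in tokens)
    for step in range(1, steps + 1):
        proposals = toy_denoiser(tokens, vocab, rng)
        if not proposals:
            break  # nothing left to resolve
        # Number of positions allowed to stay masked after this round.
        keep_masked = n_masked_start * (steps - step) // steps
        # Rank masked positions from most to least confident proposal.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        n_commit = len(ranked) - keep_masked
        for i in ranked[:n_commit]:
            tokens[i] = proposals[i][0]
    return tokens


# Illustrative token names only; they are not HoloCell's vocabulary.
vocab = ["GENE_A", "GENE_B", "PROT_X"]
observed = ["GENE_A", MASK, MASK, "PROT_X", MASK]
generated = iterative_remask_decode(observed, vocab)
```

Because positions are resolved by confidence rather than left to right, the loop imposes no ordering on the features, which is the property the paragraph above attributes to this family of decoders. Conditioning on a measured modality corresponds here to leaving its tokens unmasked in the input.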
HoloCell is intended for computational and single-cell biologists working with multi-omics data, especially when modalities are incomplete or unpaired across cells. Its unified embedding can support cell-state characterization, cell-type annotation, and integration of datasets collected with different assays, while its generative side enables imputing or simulating an unmeasured modality — for example, predicting proteomic or epigenomic signal from transcriptomic input — to reason about how regulatory, transcriptional, and protein-level information relate within a cell.
By extending single-cell foundation models from primarily transcriptomic settings to a joint epigenomic-transcriptomic-proteomic framework with a generative component, HoloCell contributes to the emerging "virtual cell" agenda of systematically characterizing and simulating cellular systems. Its significance will depend on independent benchmarking and reproduction as the field evaluates the preprint. As of its June 2026 release, no public code repository, model weights, Hugging Face card, or API was available, and there is no separate model card or data card beyond the preprint itself, which currently limits external validation and reuse.
Jiang, Q., et al. (2026) HoloCell: A Generative Foundation Model for Holistic Cellular Modeling. bioRxiv.
DOI: 10.64898/2026.06.07.730684