bio.rodeo

HoloCell

Beijing Zhongguancun Academy

An 860M-parameter generative single-cell foundation model that jointly represents and generates across epigenomic, transcriptomic, and proteomic modalities.

Released: June 2026
Parameters: 860 million

HoloCell is a generative single-cell foundation model developed by researchers at the Zhongguancun Academy in Beijing and released as a bioRxiv preprint in June 2026. It targets a central gap in single-cell biology: while multi-omics technologies can now profile the epigenome, transcriptome, and proteome within individual cells, most computational methods are built for a single modality or a fixed pair of modalities and depend on dataset-specific training or carefully paired measurements. HoloCell instead aims to model all three major omics layers within one unified framework.

The authors describe HoloCell as, to their knowledge, the first generative foundation model to combine joint representation learning and generative modeling across epigenomics, transcriptomics, and proteomics. Rather than learning a separate embedding for each assay, it produces a shared representation of cellular state and can generate one modality conditioned on others, so it remains usable when some modalities are missing, a common situation in real single-cell datasets.

Conceptually, HoloCell positions itself within the broader "virtual cell" research direction: the goal of building computational systems that both characterize a cell's state across molecular layers and simulate how information flows between them. It is a frozen, pretrained checkpoint evaluated zero-shot across diverse tasks rather than a task-specific model.

#Key Features

  • Tri-omic coverage: Jointly models epigenomic, transcriptomic, and proteomic data in a single architecture, rather than being specialized to one modality or a single modality pair.
  • Hierarchical tokenization: Encodes cis-regulatory elements, genes, and proteins as structured tokens within a shared vocabulary, giving the model a biologically grounded way to represent features from each omics layer.
  • Diffusion-based cross-modal generation: Generates missing modalities through an iterative diffusion-and-remasking procedure that respects the inherently unordered nature of biological features, enabling in silico simulation of multi-omics information flow.
  • Robust to modality missingness: Supports paired multi-omics integration, unpaired multi-omics alignment, and single-omics representation without requiring complete paired measurements for every cell.
  • Frozen foundation model: Evaluated zero-shot from a fixed pretrained checkpoint across representation and generation tasks, reflecting a general-purpose rather than task-tuned design.
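The tokenization and missing-modality points above can be made concrete with a toy sketch. The preprint's actual vocabulary and token layout are not public, so everything here is an assumption made for illustration: the `CRE`/`GENE`/`PROT` prefixes, the `Token` fields, the feature names, and the simple value binning are all hypothetical. The sketch only shows how features from three omics layers could share one token list, and how a cell with a missing modality simply contributes fewer tokens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical modality prefixes; not taken from the HoloCell preprint.
MODALITY_PREFIX = {"epigenome": "CRE", "transcriptome": "GENE", "proteome": "PROT"}


@dataclass(frozen=True)
class Token:
    modality: str  # omics layer the feature comes from
    feature: str   # feature identifier, e.g. a gene symbol or a peak name
    level: int     # discretized measurement bin (an assumed encoding)

    def as_string(self) -> str:
        return f"{MODALITY_PREFIX[self.modality]}|{self.feature}|{self.level}"


def bin_value(value: float, n_bins: int = 4, max_value: float = 10.0) -> int:
    """Map a measurement onto one of n_bins integer levels, clipping to [0, max_value]."""
    clipped = min(max(value, 0.0), max_value)
    return min(int(clipped / max_value * n_bins), n_bins - 1)


def tokenize_cell(cell: Dict[str, Optional[Dict[str, float]]]) -> List[Token]:
    """Flatten whichever modalities a cell has into one list of structured tokens.

    A modality that is absent, None, or empty contributes no tokens, so
    incomplete cells need no special handling downstream.
    """
    tokens: List[Token] = []
    for modality in MODALITY_PREFIX:
        features = cell.get(modality)
        if not features:
            continue
        for name in sorted(features):
            tokens.append(Token(modality, name, bin_value(features[name])))
    return tokens


if __name__ == "__main__":
    # A cell profiled for RNA and protein but with no epigenomic measurement.
    cell = {
        "transcriptome": {"CD3E": 7.5, "MS4A1": 0.0},
        "proteome": {"CD3": 9.9},
        "epigenome": None,
    }
    print([t.as_string() for t in tokenize_cell(cell)])
    # prints ['GENE|CD3E|3', 'GENE|MS4A1|0', 'PROT|CD3|3']
```

The fixed iteration order over modalities and the sorted feature names are there only to make the output reproducible; the token list is meant to be read as a set, not a sequence.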

#Technical Details

HoloCell contains over 860 million parameters and is pretrained on the Human-Multi-Omics-Corpus, which the authors assembled from roughly 468 million single-cell profiles spanning epigenomic, transcriptomic, and proteomic layers, corresponding to over 425 billion tokens. The hierarchical tokenization scheme maps cis-regulatory elements, genes, and proteins onto structured tokens so that features from different assays share one modeling framework. Generation uses an iterative diffusion and remasking process in which masked tokens are progressively resolved, an approach suited to the set-like, order-free structure of molecular feature lists.

The authors report evaluating HoloCell on single-omics representation learning, paired multi-omics integration, unpaired multi-omics alignment, and cross-modal generation, and claim superior performance and flexibility relative to existing methods on these tasks. These results are self-reported, and because the preprint is recent, per-benchmark numbers should be read directly from the paper.
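As a rough illustration of the iterative predict-then-remask idea, the sketch below resolves masked features over several rounds, keeping only the most confident proposals each round and leaving the rest masked for the next pass. It is not HoloCell's procedure: the `toy_predictor`, the `LINKS` table, the feature names, and the linear unmasking schedule are all assumptions standing in for a trained model and whatever schedule the authors actually use. The state is a dictionary keyed by feature name, so no feature ordering is involved.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, Optional[int]]         # feature name -> level; None means masked
Proposal = Dict[str, Tuple[int, float]]  # feature name -> (proposed level, confidence)

# Hypothetical upstream links used only by the toy predictor below.
LINKS = {
    "GENE:CD3E": "CRE:peak_a",
    "GENE:MS4A1": "CRE:peak_b",
    "PROT:CD3": "GENE:CD3E",
    "PROT:CD20": "GENE:MS4A1",
}


def toy_predictor(state: State) -> Proposal:
    """Stand-in for a trained denoiser (an assumption, not the real model).

    Copies the level of a linked upstream feature when that feature is
    resolved (high confidence); otherwise guesses 0 with low confidence.
    """
    proposals: Proposal = {}
    for name, value in state.items():
        if value is not None:
            continue
        parent_level = state.get(LINKS.get(name, ""))
        if parent_level is not None:
            proposals[name] = (parent_level, 0.9)
        else:
            proposals[name] = (0, 0.1)
    return proposals


def generate(observed: Dict[str, int], targets: List[str],
             predictor: Callable[[State], Proposal], steps: int = 2) -> State:
    """Resolve masked targets over `steps` rounds of predict-then-remask."""
    state: State = dict(observed)
    state.update({name: None for name in targets})
    for step in range(1, steps + 1):
        proposals = predictor(state)
        masked = [n for n in targets if state[n] is None]
        # Linear schedule: how many targets should be resolved after this round.
        resolved_goal = (len(targets) * step) // steps
        n_keep = resolved_goal - (len(targets) - len(masked))
        # Keep the most confident proposals; the rest stay masked (are remasked).
        ranked = sorted(masked, key=lambda n: (-proposals[n][1], n))
        for name in ranked[:n_keep]:
            state[name] = proposals[name][0]
    return state


if __name__ == "__main__":
    observed = {"CRE:peak_a": 3, "CRE:peak_b": 0}
    targets = ["GENE:CD3E", "GENE:MS4A1", "PROT:CD3", "PROT:CD20"]
    print(generate(observed, targets, toy_predictor, steps=2))
```

With two rounds, the gene features are resolved first from the observed regulatory elements and the protein features are resolved afterwards from the genes. With `steps=1`, the proteins are committed before any gene context exists, which is the failure mode that iterative remasking is meant to avoid in this toy setting.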

#Applications

HoloCell is intended for computational and single-cell biologists working with multi-omics data, especially when modalities are incomplete or unpaired across cells. Its unified embedding can support cell-state characterization, cell-type annotation, and integration of datasets collected with different assays. Its generative side enables imputing or simulating an unmeasured modality, for example predicting proteomic or epigenomic signal from transcriptomic input, to reason about how regulatory, transcriptional, and protein-level information relate within a cell.
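One common way to use a frozen embedding for cell-type annotation without any fine-tuning is nearest-neighbour label transfer from an annotated reference. The sketch below shows that generic pattern only. Since no HoloCell weights or API are available, the embeddings are hand-written stand-ins, and `knn_annotate` and its parameters are hypothetical names, not part of any released interface.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_annotate(query: Sequence[float],
                 reference: List[Tuple[Sequence[float], str]],
                 k: int = 3) -> str:
    """Label a query cell by majority vote among its k most similar reference cells."""
    ranked = sorted(reference, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Stand-in embeddings; in practice these would come from a pretrained encoder.
    reference = [
        ([1.0, 0.1], "T cell"),
        ([0.9, 0.2], "T cell"),
        ([0.1, 1.0], "B cell"),
        ([0.2, 0.9], "B cell"),
    ]
    print(knn_annotate([0.8, 0.3], reference))  # prints T cell
    print(knn_annotate([0.1, 0.9], reference))  # prints B cell
```

Because the encoder stays fixed, the quality of this kind of annotation depends entirely on how well the pretrained embedding separates cell types, which is what zero-shot representation benchmarks are designed to measure.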

#Impact

By extending single-cell foundation models from primarily transcriptomic settings to a joint epigenomic-transcriptomic-proteomic framework with a generative component, HoloCell contributes to the emerging "virtual cell" agenda of systematically characterizing and simulating cellular systems. Its significance will depend on independent benchmarking and reproduction as the field evaluates the preprint. As of the June 2026 preprint, no public code repository, model weights, Hugging Face card, or API was available, and no model card or data card existed beyond the preprint itself, which currently limits external validation and reuse.

Citation

Jiang, Q., et al. (2026) HoloCell: A Generative Foundation Model for Holistic Cellular Modeling. bioRxiv.

DOI: 10.64898/2026.06.07.730684

Citations

Total citations: 0
Influential: 0
References: 59

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 21 (Closed)
Usability (can I run it?): 15
Reproducibility (can I retrain it?): 13
Model Openness Framework: Unclassified (missing required components)

Tags

multi_omics_integration, cross_modal_generation, representation_learning, transformer, diffusion, foundation_model, generative, self_supervised, epigenomics, proteomics
