Nanjing University / OpenBMB / Tsinghua University
A tri-modal foundation model unifying histology images, spatial transcriptomics, and biological language for zero-shot spatial biology and pathology reasoning.
SciCore-Omics is a tri-modal foundation model that unifies histology images, spatial transcriptomics, and biological language within a single architecture for spatial biology and pathology reasoning. Spatial biology sits at the intersection of two data types that have historically been modeled in isolation: hematoxylin-and-eosin (H&E) histology, which captures tissue morphology at high resolution, and spatial transcriptomics, which measures gene expression while preserving spatial context. SciCore-Omics treats these as complementary views of the same tissue and aligns both with natural language, so that a single model can reason across morphology, expression, and biomedical text.
Developed by researchers at Nanjing University together with the OpenBMB community and Zhiyuan Liu's THUNLP group at Tsinghua University, the model was posted as a bioRxiv preprint on May 30, 2026. Its central design goal is zero-shot generalization: from one fixed, openly licensed checkpoint, SciCore-Omics performs histopathology classification, gene-expression prediction, and spatial-domain recognition without task-specific fine-tuning. This places it alongside pathology foundation models and spatial-omics models, while its joint treatment of all three modalities sets it apart from models that pair only images with text or model expression alone.
The model and training code are released under Apache-2.0, with weights on Hugging Face and an interactive demo Space, making it one of the more openly available tri-modal spatial biology models to date.
SciCore-Omics is an 8-billion-parameter multimodal model built on a MiniCPM-V-style vision-language backbone. The gene-aware branch couples a NicheFormer encoder with a Gene Q-Former and a Gene Projector to map spatial transcriptomic profiles into the language model's token space, so that morphology and expression can be reasoned over jointly with text. Pretraining uses 151,182 spatially paired spots (locations where histology and transcriptomic measurements are co-registered) as the supervision signal for cross-modal alignment. Training follows a three-stage progressive pipeline: gene-bridge distillation, then Swift-based continued pretraining and supervised fine-tuning, then a GSPO/PPO-style reinforcement-learning refinement stage. The released checkpoint, training code, and inference utilities are Apache-2.0 licensed.
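To make the idea of mapping expression profiles into the language model's token space more concrete, here is a minimal, dependency-free sketch of a Q-Former-style bridge: a small set of query vectors attends over per-gene embeddings, and a linear projector maps the pooled result to the language model's width. This is an illustration of the general technique only, not the released implementation. The function names, the single attention head, the random weights, and all sizes are assumptions chosen for readability.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Stand-in for learned weights or encoder outputs (hypothetical values)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def gene_bridge(gene_embeddings, queries, projector):
    """Compress one spot's gene embeddings into len(queries) tokens.

    gene_embeddings: (num_genes x d) encoder outputs for one spot
    queries:         (num_queries x d) query vectors (learned in a real model)
    projector:       (d x d_lm) linear map into the language model's width
    """
    d = len(queries[0])
    # Scaled dot-product scores between each query and each gene embedding.
    keys_t = [list(col) for col in zip(*gene_embeddings)]  # (d x num_genes)
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each query becomes an attention-weighted average of gene embeddings.
    pooled = matmul(weights, gene_embeddings)  # (num_queries x d)
    # Project into the language model's hidden size.
    return matmul(pooled, projector), weights


# Illustrative sizes only; the real model's dimensions are not stated here.
NUM_GENES, D_GENE, NUM_QUERIES, D_LM = 50, 16, 4, 32
rng = random.Random(0)
spot = random_matrix(NUM_GENES, D_GENE, rng)
queries = random_matrix(NUM_QUERIES, D_GENE, rng)
projector = random_matrix(D_GENE, D_LM, rng)

gene_tokens, attn = gene_bridge(spot, queries, projector)
print(len(gene_tokens), len(gene_tokens[0]))  # 4 32
```

The point of the design is that a variable-length expression profile is reduced to a fixed, small number of vectors with the same width as text tokens, so they can be placed in the same input sequence as image and text tokens.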
SciCore-Omics targets researchers in computational pathology and spatial biology who need to interpret tissue across modalities. Because it operates zero-shot from a fixed checkpoint, it can classify histopathology images, predict gene expression from morphology, and recognize spatial domains without assembling labeled training sets for each task. Its natural-language interface supports interactive biomedical reasoning over H&E slides and spatial transcriptomics, useful for exploratory tissue analysis and hypothesis generation. The authors emphasize that it is released for research use only and should not serve as a standalone clinical diagnostic or treatment-recommendation system.
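One common way to run closed-set classification through a natural-language interface is to pose the task as a lettered multiple-choice question and parse the reply back to a label. The sketch below shows that pattern under stated assumptions: `generate` is a hypothetical callable standing in for whatever inference entry point the released utilities provide, `stub_generate` is a placeholder that returns a canned reply, and the prompt wording, file name, and label set are invented for illustration.

```python
import re
from typing import Callable, Optional, Sequence


def build_prompt(task: str, labels: Sequence[str]) -> str:
    """Format a closed-set question with lettered options (A, B, C, ...)."""
    options = "\n".join(
        f"{chr(ord('A') + i)}. {label}" for i, label in enumerate(labels)
    )
    return f"{task}\nAnswer with a single letter.\n{options}"


def parse_choice(reply: str, labels: Sequence[str]) -> Optional[str]:
    """Map a reply that starts with an option letter back to its label."""
    match = re.match(r"\s*\(?([A-Za-z])\)?(?:[.:\s]|$)", reply)
    if match is None:
        return None  # reply did not start with a standalone letter
    index = ord(match.group(1).upper()) - ord("A")
    return labels[index] if index < len(labels) else None


def classify_zero_shot(
    generate: Callable[[str, str], str],
    image_path: str,
    task: str,
    labels: Sequence[str],
) -> Optional[str]:
    """Ask the model one multiple-choice question and return the chosen label."""
    return parse_choice(generate(image_path, build_prompt(task, labels)), labels)


def stub_generate(image_path: str, prompt: str) -> str:
    """Placeholder for a real model call; always answers option B."""
    return "B."


LABELS = ["normal tissue", "tumor", "necrosis"]  # hypothetical label set
prediction = classify_zero_shot(
    stub_generate,
    "slide_patch.png",  # hypothetical file name
    "Which tissue type does this H&E patch show?",
    LABELS,
)
print(prediction)  # tumor
```

Returning `None` for replies that do not begin with an option letter keeps unparseable answers visible instead of silently assigning them to a class.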
SciCore-Omics contributes an openly licensed, fully released tri-modal model to the rapidly growing space of pathology and spatial-omics foundation models, where most prior work has unified at most two modalities. By coupling histology, spatial transcriptomics, and language under Apache-2.0 weights with public training code and a live demo, it offers a reusable base for spatial biology research and a template for bridging gene expression into vision-language models. Because it is a 2026 preprint, its benchmark standing is still emerging, but its open release and zero-shot, multi-task design make it a notable reference point for cross-modal spatial biology.
Xiao, X., et al. (2026) SciCore-Omics: a tri-modal foundation model unifying histology, spatial transcriptomics and language for spatial biology. bioRxiv.
DOI: 10.64898/2026.05.30.728937