A hierarchical multimodal foundation model integrating spatial transcriptomics and H&E histology for biological discovery and platform-agnostic clinical prediction.
STORM (Spatial Transcriptomics and histOlogy Representation Model) is a multimodal foundation model that integrates spatially resolved gene expression with H&E histology to learn joint molecular-morphological representations of tissue. By combining morphological features, gene expression, and spatial context in a single hierarchical model, STORM bridges imaging and omics, producing representations that transfer across tasks ranging from spatial domain discovery to clinical outcome prediction.
The model targets a central challenge in spatial biology: spatial transcriptomics platforms differ widely in resolution and chemistry, and matched molecular and morphological readouts are often analyzed separately. STORM is platform-agnostic, performing consistently across Visium, Xenium, Visium HD, and CosMx, and is designed to generalize to new cohorts without re-training. This places it among the emerging class of spatial-omics foundation models, distinguished by its explicit coupling of histology with spatially resolved transcriptomics at scale.
STORM was developed by the Ruijiang Li lab at Stanford University, with first author Jinxi Xiang, and released as an arXiv preprint in April 2026. It is a companion to MuPD, a generative diffusion-transformer model from the same group: where MuPD takes a generative approach to the spatial-transcriptomics-and-histology problem, STORM provides the representation-learning counterpart.
STORM is a hierarchical foundation model pretrained on approximately 1.2 million spatially resolved transcriptomic profiles with matched histology spanning 18 organs. The architecture jointly encodes morphological features from H&E imaging, gene expression, and spatial context to learn robust molecular-morphological representations. On spatial gene expression prediction from H&E, STORM outperforms existing methods across 11 tumor types, and its platform-agnostic design yields consistent performance across Visium, Xenium, Visium HD, and CosMx. For clinical evaluation, the model was applied to 23 independent cohorts comprising 7,245 patients without re-training, where it significantly improved immunotherapy response prediction and prognostication relative to established biomarkers.
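To make the spatial gene expression prediction task concrete, the sketch below scores predicted against measured expression one gene at a time. It is an illustration only: this summary does not state which metric the STORM authors report, so per-gene Pearson correlation across spots is an assumption (a common choice for H&E-to-expression benchmarks), and the gene names, the toy values, and the helper names `pearson` and `per_gene_pearson` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, since the
    correlation is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def per_gene_pearson(measured, predicted, genes):
    """Score each gene across spots.

    `measured` and `predicted` are lists of spots; each spot is a list
    of expression values ordered like `genes`.
    """
    scores = {}
    for j, gene in enumerate(genes):
        measured_col = [spot[j] for spot in measured]
        predicted_col = [spot[j] for spot in predicted]
        scores[gene] = pearson(measured_col, predicted_col)
    return scores


# Toy data (hypothetical): 4 spots x 2 genes.
genes = ["GENE_A", "GENE_B"]
measured = [[1.0, 5.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]]
# GENE_A is predicted up to a scale factor; GENE_B is predicted as a constant.
predicted = [[2.0, 2.0], [4.0, 2.0], [6.0, 2.0], [8.0, 2.0]]

scores = per_gene_pearson(measured, predicted, genes)
for gene, r in scores.items():
    print(f"{gene}: r = {r:.3f}")
```

Scoring per gene rather than pooling all values keeps highly expressed genes from dominating the result; a benchmark would typically then summarize the per-gene scores, for example by their mean or median.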
STORM supports spatial biology and computational pathology research as well as translational and clinical applications. Researchers can use it to predict spatial gene expression from routine H&E slides, discover spatial domains and tissue architecture, and generate molecular-morphological representations transferable across platforms and cohorts. Its demonstrated clinical utility (improved prediction of immunotherapy response and patient outcomes across thousands of patients without re-training) makes it relevant for precision oncology workflows where spatial transcriptomics is unavailable but archival histology exists.
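As a rough illustration of spatial domain discovery from learned representations, the sketch below clusters per-spot embedding vectors with a small k-means routine. This is not STORM's procedure, which this summary does not describe: k-means is a generic stand-in, the two-dimensional embeddings are toy values rather than model outputs, and `kmeans` and `sq_dist` are hypothetical helper names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Toy per-spot embeddings (hypothetical): two well-separated groups.
embeddings = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],
]

labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

In practice each label would be mapped back to its spot coordinates to draw a domain map on the tissue section, and the number of clusters would be chosen from prior anatomical knowledge or a cluster-quality criterion.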
STORM demonstrates that a single platform-agnostic foundation model can unify spatial transcriptomics and histology and generalize across spatial platforms and 23 independent clinical cohorts without re-training, addressing the fragmentation that has limited cross-study reuse in spatial biology. By improving spatial gene expression prediction and spatial domain discovery, and by improving clinical prediction relative to established biomarkers, it offers a scalable framework for spatially informed discovery and precision medicine. As a recently released arXiv preprint, its claims await peer review and independent validation, and code and model weights were not yet public at release. Even so, it represents a notable step toward foundation models that connect molecular and morphological readouts at clinical scale.