Multimodal foundation model pretrained on 1.76 billion paired histology-spatial transcriptomics spots, linking whole-slide images to spatial molecular programs.
SQUALL is a multimodal foundation model that bridges whole-slide histopathology images with spatial transcriptomics, learning a joint representation that connects tissue morphology to the underlying spatial molecular programs. Where conventional computational pathology models read only the visual content of a hematoxylin-and-eosin slide, SQUALL is pretrained to associate each region of tissue with its measured gene expression, allowing it to infer molecular state directly from histology and to reason about morphology and transcriptomics together.
The model was developed by Zongxu Zhang, Zexian Zeng, and collaborators at Peking University's Center for Quantitative Biology, with co-authors from the Peking-Tsinghua Center for Life Sciences, the National Cancer Center / Cancer Hospital of the Chinese Academy of Medical Sciences, and Tsinghua University. It was released as a bioRxiv preprint in June 2026. SQUALL belongs to a fast-moving wave of histology-plus-transcriptomics foundation models, alongside efforts such as STORM (Stanford) and SpatialFusion (MIT). It is distinguished by the scale and breadth of its paired pretraining corpus and by its emphasis on zero-shot generalization across tissues and platforms without per-dataset retraining.
By pretraining on paired data rather than images alone, SQUALL is positioned as a general-purpose backbone for both discovery (mapping where molecular programs are active in tissue) and clinical prediction (relating tissue appearance to patient outcomes) from routinely available slides.
SQUALL is a transformer-based multimodal foundation model pretrained with a self-supervised objective that aligns whole-slide image regions with their paired spatial transcriptomics (ST) measurements. Its pretraining corpus, histMol, aggregates roughly 1.76 billion paired histology-ST spots and bins drawn from 3,446 tissue sections, covering 33 tissue types and 12 distinct spatial transcriptomics platforms. This scale and platform diversity is intended to make the learned representation robust across assay chemistries and tissue contexts. After pretraining, the model supports transcriptome-wide virtual biomarker profiling, spatial niche discovery, and whole-slide outcome prediction. For clinical evaluation, the authors report outcome prediction on a cohort of 898 patients. They also benchmark SQUALL against existing computational pathology foundation models and report improved performance on these spatial and clinical tasks.
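The description above does not state which alignment objective SQUALL uses. As a hedged illustration only, the sketch below shows one common way to align paired modalities: a symmetric contrastive (InfoNCE-style) loss over a batch of image-region embeddings and expression embeddings. The function names, the temperature value, and the toy embeddings are all assumptions made for this example and are not taken from the preprint.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(image_emb, expr_emb, temperature=0.07):
    """Illustrative contrastive loss (an assumption, not SQUALL's stated objective).

    Row i of image_emb and row i of expr_emb are assumed to describe the
    same spot. The loss averages image-to-expression and
    expression-to-image cross-entropy over the batch.
    """
    img = [l2_normalize(v) for v in image_emb]
    expr = [l2_normalize(v) for v in expr_emb]
    n = len(img)
    # Cosine similarity between every image region and every expression profile.
    logits = [
        [sum(a * b for a, b in zip(img[i], expr[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    logits_t = [list(col) for col in zip(*logits)]
    loss_i2e = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    loss_e2i = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_i2e + loss_e2i)


# Toy batch of two spots: correctly paired embeddings versus swapped pairs.
image_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(image_emb, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_info_nce(image_emb, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings yield a much lower loss than the swapped ones, which is the behavior an alignment objective of this kind rewards.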
SQUALL is aimed at researchers and translational scientists working with digital pathology and spatial omics. From a standard whole-slide image it can predict spatially resolved gene expression, enabling "virtual" molecular biomarker profiling without running an expensive spatial assay on every sample; it can delineate spatial niches to study tissue architecture; and it can model disease progression, such as breast-cancer invasion trajectories. At the whole-slide level it supports patient outcome prediction, making it relevant to biomarker discovery, tumor microenvironment characterization, and prognostic modeling in oncology research.
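Virtual biomarker profiling of this kind is often checked by comparing predicted expression with measured expression gene by gene across spots. The sketch below assumes per-gene Pearson correlation as the metric. The metric choice, the function names, the placeholder gene labels, and the toy values are illustrative assumptions, not the evaluation protocol reported for SQUALL.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_gene_correlation(predicted, measured, genes):
    """Correlate predicted and measured expression for each gene across spots.

    predicted and measured are lists of spots; each spot is a list of
    expression values ordered the same way as genes.
    """
    scores = {}
    for g, name in enumerate(genes):
        pred_col = [spot[g] for spot in predicted]
        meas_col = [spot[g] for spot in measured]
        scores[name] = pearson(pred_col, meas_col)
    return scores


# Hypothetical data: three spots, two placeholder genes.
genes = ["GENE_A", "GENE_B"]
measured = [[1.0, 5.0], [2.0, 3.0], [3.0, 1.0]]
predicted = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
scores = per_gene_correlation(predicted, measured, genes)
```

Here the prediction for GENE_A tracks the measurement perfectly, while the prediction for GENE_B is inverted. A per-gene summary like this makes it visible which genes a histology-based predictor recovers and which it does not.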
By folding 1.76 billion paired histology-transcriptomics observations into a single pretrained backbone, SQUALL pushes computational pathology beyond image-only representations toward models that natively reason about spatial molecular programs. Its reported gains over existing pathology foundation models on virtual biomarker profiling, niche discovery, and outcome prediction suggest that paired pretraining at scale is a productive direction for the field. As a June 2026 preprint, its long-term influence remains to be established, and adoption is currently constrained. At the time of writing, no public code, model weights, or Hugging Face release had been located, so independent reproduction and benchmarking are not yet possible. The work is released under a CC BY-NC license.
Zhang, Z., et al. (2026) Integrating Histology with Spatial Molecular Programs Using a Multimodal Foundation Model. bioRxiv.
DOI: 10.64898/2026.06.01.729028