Helmholtz Munich / Wellcome Sanger Institute
Hierarchical graph-based multi-modal contrastive framework that aligns H&E histopathology images with spatial transcriptomics profiles across multiple scales.
SIGMMA (Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment) is a representation-learning framework that jointly models Hematoxylin and Eosin (H&E) histopathology images and spatial transcriptomics (ST) profiles. The core problem it addresses is the gap between tissue morphology, which is cheap to capture and widely available as stained slides, and spatially resolved gene expression, which is information-rich but costly to measure. By learning a shared representation across these two modalities, SIGMMA aims to predict molecular readouts directly from histology and to retrieve matching regions across modalities, for example the expression profile that corresponds to a given image region.
The key conceptual advance is that SIGMMA aligns the two modalities hierarchically, across multiple spatial scales rather than at a single fixed resolution. Tissue biology is organized across scales — from individual cells to local microenvironments to larger tissue domains — and a single-scale alignment discards this structure. SIGMMA builds graph representations of the tissue at multiple scales and applies contrastive learning to align the image and transcriptomic views within and across those scales, capturing both fine-grained cellular detail and broader spatial context.
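To make the idea of multi-scale tissue graphs concrete, the following is a minimal sketch, not SIGMMA's published construction. It assumes spots with 2D coordinates, a k-nearest-neighbour graph at the fine scale, and a coarser scale obtained by averaging spots that fall into the same square grid cell. The function names (knn_graph, coarsen), the grid-based pooling, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def knn_graph(coords, k):
    """Return sorted undirected edges (i, j), i < j, linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(coords):
        # Distances from point i to every other point; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def coarsen(coords, feats, cell_size):
    """Group points into square grid cells and average coordinates and features per cell.

    Grid binning is an illustrative assumption, not the method from the preprint.
    """
    groups = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        groups[(int(x // cell_size), int(y // cell_size))].append(idx)
    dim = len(feats[0])
    new_coords, new_feats = [], []
    for key in sorted(groups):
        members = groups[key]
        n = len(members)
        new_coords.append(tuple(sum(coords[m][d] for m in members) / n for d in range(2)))
        new_feats.append([sum(feats[m][d] for m in members) / n for d in range(dim)])
    return new_coords, new_feats


# Toy tissue: a 4 x 4 grid of spots with two made-up features per spot.
coords = [(float(x), float(y)) for x in range(4) for y in range(4)]
feats = [[x + y, x - y] for x, y in coords]

fine_edges = knn_graph(coords, k=2)                      # fine-scale graph
coarse_coords, coarse_feats = coarsen(coords, feats, cell_size=2.0)
coarse_edges = knn_graph(coarse_coords, k=2)             # coarse-scale graph

print(len(coords), "fine nodes,", len(coarse_coords), "coarse nodes")
print("coarse edges:", coarse_edges)
```

In this toy setup the 16 fine-scale spots collapse into 4 coarse regions, each with its own neighbourhood graph. A real pipeline would more plausibly build the hierarchy from cell segmentations or learned pooling rather than a fixed grid.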
The model was introduced in November 2025 by Dabin Jeong, Mohammad Lotfollahi, and colleagues at Helmholtz Munich and the Wellcome Sanger Institute, extending the Lotfollahi lab's line of work on multimodal and generative models for single-cell and spatial biology into the histopathology–transcriptomics setting.
SIGMMA is a multi-modal contrastive framework that couples graph neural network representations of tissue with image and transcriptomic encoders, aligning the two modalities through contrastive objectives applied across a hierarchy of spatial scales. Each modality is encoded into representations at multiple resolutions, and the contrastive loss pulls together matched image–expression pairs while separating mismatched pairs, both within a scale and across scales to enforce consistency through the hierarchy. The reported benchmarks — an average 9.78% gain in gene-expression prediction and an average 26.93% gain in cross-modal retrieval — indicate that the multi-scale, graph-based formulation captures spatial structure that single-scale vision-language approaches to histology–transcriptomics alignment miss. Full architecture and training details are described in the arXiv preprint (arXiv:2511.15464).
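The within-scale and cross-scale contrastive objectives described above can be sketched as follows. This is an illustrative assumption, not the loss from the preprint: it uses a standard symmetric InfoNCE term at each of two scales, plus a cross-scale term that mean-pools fine-scale embeddings into their parent region and contrasts them with the other modality's coarse embedding. The names two_scale_loss, mean_pool, parent, and cross_weight, and the temperature value, are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of selecting targets[i] for anchors[i] among all targets."""
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(a)


def symmetric_info_nce(x, y, temperature=0.1):
    """Average of the x-to-y and y-to-x InfoNCE terms."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def mean_pool(vectors, parent, n_parents):
    """Average fine-scale vectors per parent region (each parent needs >= 1 child)."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(n_parents)]
    counts = [0] * n_parents
    for v, p in zip(vectors, parent):
        counts[p] += 1
        for d in range(dim):
            sums[p][d] += v[d]
    return [[s / c for s in row] for row, c in zip(sums, counts)]


def two_scale_loss(img_fine, expr_fine, img_coarse, expr_coarse, parent, cross_weight=0.5):
    """Within-scale alignment at both scales plus a hypothetical cross-scale consistency term."""
    within = symmetric_info_nce(img_fine, expr_fine) + symmetric_info_nce(img_coarse, expr_coarse)
    pooled_img = mean_pool(img_fine, parent, len(img_coarse))
    pooled_expr = mean_pool(expr_fine, parent, len(expr_coarse))
    cross = symmetric_info_nce(pooled_img, expr_coarse) + symmetric_info_nce(pooled_expr, img_coarse)
    return within + cross_weight * cross


# Toy embeddings: 4 fine-scale nodes grouped into 2 coarse regions.
img_fine = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
expr_fine = [row[:] for row in img_fine]
img_coarse = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
expr_coarse = [row[:] for row in img_coarse]
parent = [0, 0, 1, 1]

aligned = two_scale_loss(img_fine, expr_fine, img_coarse, expr_coarse, parent)
shuffled = two_scale_loss(img_fine, expr_fine[::-1], img_coarse, expr_coarse[::-1], parent)
print(f"aligned loss {aligned:.4f} < shuffled loss {shuffled:.4f}")
```

Correctly paired embeddings give a loss near zero, while reversing the expression embeddings breaks both the within-scale and cross-scale pairings and drives the loss up. In a trained model the embeddings would come from the image, transcriptomic, and graph encoders rather than from hand-written vectors.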
SIGMMA targets computational pathology and spatial biology workflows where spatially resolved gene expression is desirable but expensive or unavailable. By predicting expression from routinely collected H&E slides, it can help extend molecular insight to large archives of stained tissue, and its cross-modal retrieval capability supports finding morphologically or transcriptomically similar tissue regions across datasets. Researchers studying tissue architecture, disease microenvironments, and the relationship between morphology and molecular state are the primary beneficiaries, particularly those integrating histology with spatial transcriptomics platforms.
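Cross-modal retrieval of this kind is typically scored by ranking candidates in a shared embedding space. The sketch below assumes cosine similarity and a recall@k metric, with image embeddings as queries and expression embeddings as the gallery. The metric choice, the function names (retrieve, recall_at_k), and the toy vectors are illustrative assumptions, not the evaluation protocol from the preprint.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, gallery, k):
    """Indices of the k gallery items most similar to the query, best first."""
    ranked = sorted(range(len(gallery)), key=lambda j: -cosine(query, gallery[j]))
    return ranked[:k]


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose true match (same index in the gallery) is in the top k."""
    hits = sum(1 for i, q in enumerate(queries) if i in retrieve(q, gallery, k))
    return hits / len(queries)


# Toy embeddings: row i of image_emb and row i of expr_emb describe the same region.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
expr_emb = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.0, 0.9, 0.2]]

print("top-1 for region 0:", retrieve(image_emb[0], expr_emb, k=1))
print("recall@1:", recall_at_k(image_emb, expr_emb, k=1))
print("recall@2:", recall_at_k(image_emb, expr_emb, k=2))
```

Swapping the roles of the two arguments gives expression-to-image retrieval with the same code.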
SIGMMA contributes to the rapidly growing area of image–transcriptomics alignment in computational pathology, where bridging cheap morphological data and costly molecular measurements is a central goal. Its main contribution is demonstrating that explicitly modeling tissue across multiple spatial scales with graph representations yields measurable improvements over single-scale alignment on both prediction and retrieval. As a recent preprint (November 2025), its downstream adoption is not yet established. A notable limitation is availability: at the time of review, no public code or trained weights were identified for SIGMMA, which constrains independent reproduction and reuse. (An unrelated catalog entry, flash-sigma-kg, should not be confused with this model.)
Jeong, D., et al. (2025) SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome. arXiv.org.
DOI: 10.48550/arXiv.2511.15464
Papers that recently cited this model.
Shravan Venkatraman, Muthu Subash Kavitha, Joe Dhanith, et al.
arXiv.org · Dec 2025