ETH Zurich / University Hospital Basel
A diffusion transformer that synthesizes H&E histopathology image patches conditioned jointly on spatial transcriptomics gene expression and morphological embeddings.
STMDiT (Spatial Transcriptomics and Morphology Diffusion Transformer) is a generative model that synthesizes realistic hematoxylin and eosin (H&E) histopathology image patches conditioned on the underlying molecular state of the tissue. Rather than treating histology and transcriptomics as separate readouts, STMDiT bridges them: given a spatial gene-expression profile and a morphological context, it generates the corresponding tissue appearance, framing virtual tissue synthesis as a conditional image-generation problem.
The model targets a persistent gap in computational pathology, where gene expression and tissue morphology are typically analyzed in isolation. By learning the mapping from molecular profiles to visual phenotype, STMDiT offers a route to interrogate how transcriptional programs manifest as observable tissue structure and to generate paired molecular–morphological data where experimental measurements are sparse or costly.
STMDiT was developed by Pantelis R. Vlachas, Kalin Nonchev, Viktor H. Koelzer, and Gunnar Rätsch, researchers at ETH Zurich (Institute for Machine Learning) and University Hospital Basel. It was released as a bioRxiv preprint in May 2026, and the work was presented at the ICML 2026 SD4H (Structured Data for Health) Workshop.
STMDiT is built on the PixCell diffusion transformer architecture for pathology image generation. The denoising network is conditioned through two pathways: adaptive layer-norm modulation and per-block cross-attention that attends to (1) gene-expression embeddings produced by a frozen CancerFoundation scRNA-seq encoder and (2) morphological embeddings capturing local tissue context. Dual classifier-free guidance scales the two conditioning streams independently during sampling. The model was trained on 10x Genomics Visium spatial transcriptomics paired with H&E imaging from the Tumor Profiler (TuPro) melanoma cohort. The authors release 30 exponential-moving-average (EMA) inference checkpoints. Out-of-distribution capability was demonstrated by zero-shot transfer to the TCGA SKCM melanoma cohort using H&E input only, with no retraining.
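To make the idea of dual classifier-free guidance concrete, the sketch below combines one unconditional and two single-condition noise predictions, each with its own guidance weight. This is one common way to guide two conditioning streams independently and is an assumption here: the exact guidance formula used by STMDiT is not stated above. The names `dual_cfg`, `toy_denoiser`, `w_gene`, and `w_morph` are hypothetical, and the toy denoiser is a stand-in, not the PixCell backbone.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser signature: (noisy input, gene embedding or None,
# morphology embedding or None) -> predicted noise. Passing None drops a condition.
Denoiser = Callable[[Vector, Optional[Vector], Optional[Vector]], Vector]


def dual_cfg(
    denoiser: Denoiser,
    x_t: Vector,
    gene_emb: Vector,
    morph_emb: Vector,
    w_gene: float,
    w_morph: float,
) -> Vector:
    """Combine noise predictions with one guidance weight per condition.

    Assumed formulation (not confirmed by the source):
        eps = eps_uncond
              + w_gene  * (eps_gene  - eps_uncond)
              + w_morph * (eps_morph - eps_uncond)
    """
    eps_uncond = denoiser(x_t, None, None)      # both conditions dropped
    eps_gene = denoiser(x_t, gene_emb, None)    # gene expression only
    eps_morph = denoiser(x_t, None, morph_emb)  # morphology only
    return [
        u + w_gene * (g - u) + w_morph * (m - u)
        for u, g, m in zip(eps_uncond, eps_gene, eps_morph)
    ]


def toy_denoiser(
    x_t: Vector, gene_emb: Optional[Vector], morph_emb: Optional[Vector]
) -> Vector:
    """Toy stand-in: shifts the input by simple functions of each condition."""
    gene_shift = sum(gene_emb) if gene_emb is not None else 0.0
    morph_shift = 10.0 * sum(morph_emb) if morph_emb is not None else 0.0
    return [x + gene_shift + morph_shift for x in x_t]


if __name__ == "__main__":
    guided = dual_cfg(toy_denoiser, [1.0, 2.0], [0.5], [0.25], w_gene=2.0, w_morph=1.0)
    print(guided)
```

Setting either weight to zero removes that stream's influence on the guided prediction, which is the sense in which the two streams can be scaled independently. In this form each sampling step needs three forward passes of the denoiser.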
STMDiT supports research at the interface of molecular and morphological pathology. It can be used to generate synthetic paired transcriptomics–histology data for augmentation, to visualize how specific gene-expression profiles correspond to tissue appearance, and to study the structure-function relationship in tumor tissue such as melanoma. For computational pathology and spatial biology groups, the demonstrated zero-shot transfer to an external H&E-only cohort suggests utility for analyzing archival histology where matched spatial transcriptomics is unavailable.
STMDiT is an early demonstration that diffusion transformers can synthesize tissue morphology conditioned on spatial gene expression, extending generative pathology models from unconditional or label-conditioned image generation toward molecularly grounded virtual tissue synthesis. By reusing a frozen single-cell foundation model and a pathology diffusion backbone, it illustrates how foundation models across modalities can be composed for cross-modal generation. It is a workshop-stage preprint, with checkpoints released from an anonymized review account, so its claims await peer review and broader validation. Even so, it points toward generative tools that connect molecular measurements with the morphological readouts pathologists rely on.
Vlachas, P. R., et al. (2026) Transcriptomics-Conditioned Virtual Tissue Synthesis via Diffusion Transformers. bioRxiv.
DOI: 10.64898/2026.05.26.727902