bio.rodeo

Pathology, Spatial omics

STMDiT

ETH Zurich / University Hospital Basel

A diffusion transformer that synthesizes H&E histopathology image patches conditioned jointly on spatial transcriptomics gene expression and morphological embeddings.

Released: May 2026

STMDiT (Spatial Transcriptomics and Morphology Diffusion Transformer) is a generative model that synthesizes realistic hematoxylin and eosin (H&E) histopathology image patches conditioned on the underlying molecular state of the tissue. Rather than treating histology and transcriptomics as separate readouts, STMDiT bridges them: given a spatial gene-expression profile and a morphological context, it generates the corresponding tissue appearance, framing virtual tissue synthesis as a conditional image-generation problem.

The model targets a persistent gap in computational pathology, where gene expression and tissue morphology are typically analyzed in isolation. By learning the mapping from molecular profiles to visual phenotype, STMDiT offers a route to interrogate how transcriptional programs manifest as observable tissue structure and to generate paired molecular–morphological data where experimental measurements are sparse or costly.

STMDiT was developed by Pantelis R. Vlachas, Kalin Nonchev, Viktor H. Koelzer, and Gunnar Rätsch at ETH Zurich (Institute for Machine Learning) and University Hospital Basel. It was released as a bioRxiv preprint in May 2026 and presented at the ICML 2026 SD4H (Structured Data for Health) Workshop.

Key Features

  • Transcriptomics-conditioned synthesis: Generates H&E patches conditioned on spatial gene-expression profiles, directly coupling molecular state to tissue morphology in a single generative model.
  • Dual conditioning via cross-attention: Combines adaptive layer normalization with per-block cross-attention to inject both gene-expression and morphological embeddings into the diffusion transformer backbone.
  • Frozen scRNA-seq encoder: Encodes expression profiles with a frozen CancerFoundation single-cell RNA-seq foundation model, leveraging pretrained molecular representations rather than learning them from scratch.
  • Dual classifier-free guidance: Applies independent classifier-free guidance to the transcriptomic and morphological conditioning signals, allowing the relative influence of each modality to be tuned at inference.
  • Zero-shot out-of-distribution transfer: Generalizes to an unseen cohort (TCGA SKCM) from H&E input alone, without any re-training on the new data.
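
The dual classifier-free guidance listed above can be illustrated with a minimal sketch. The exact combination rule used by STMDiT is not stated on this page, so the additive form below is an assumption: each modality contributes its own guidance direction relative to a fully unconditional prediction, scaled by its own weight. The function name `dual_cfg` and the parameters `w_gene` and `w_morph` are hypothetical, and the three noise predictions stand in for three forward passes of a denoiser.

```python
import numpy as np


def dual_cfg(eps_uncond, eps_gene, eps_morph, w_gene, w_morph):
    """Combine three noise predictions with one guidance weight per modality.

    ASSUMPTION: additive two-signal guidance; not confirmed as STMDiT's rule.
      eps_uncond: prediction with both conditions dropped
      eps_gene:   prediction with only the gene-expression condition
      eps_morph:  prediction with only the morphological condition
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    # Direction in which each condition moves the prediction.
    gene_dir = np.asarray(eps_gene, dtype=float) - eps_uncond
    morph_dir = np.asarray(eps_morph, dtype=float) - eps_uncond
    # Scale each direction independently and add both to the baseline.
    return eps_uncond + w_gene * gene_dir + w_morph * morph_dir


# Toy 2x2 "noise maps" standing in for real denoiser outputs.
eps_u = np.zeros((2, 2))
eps_g = np.ones((2, 2))
eps_m = np.full((2, 2), 2.0)
guided = dual_cfg(eps_u, eps_g, eps_m, w_gene=1.5, w_morph=0.5)
```

In this convention, setting both weights to 0 recovers the unconditional prediction, and setting one weight to 1 with the other at 0 recovers the corresponding single-condition prediction. Raising one weight strengthens that modality's influence without changing the other.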

Technical Details

STMDiT is built on the PixCell diffusion transformer architecture for pathology image generation. The denoising network is conditioned through two mechanisms, adaptive layer-norm modulation and per-block cross-attention, which together inject (1) gene-expression embeddings produced by a frozen CancerFoundation scRNA-seq encoder and (2) morphological embeddings capturing local tissue context. During sampling, dual classifier-free guidance scales the two conditioning streams independently. The model was trained on 10x Genomics Visium spatial transcriptomics paired with H&E imaging from the Tumor Profiler (TuPro) melanoma cohort. The authors release 30 exponential moving average (EMA) inference checkpoints. Out-of-distribution capability was demonstrated by zero-shot transfer to the TCGA SKCM melanoma cohort using H&E input only, with no re-training.
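
The two conditioning mechanisms named above, adaptive layer-norm modulation and per-block cross-attention, can be sketched in a few lines of NumPy. This is a toy illustration, not the PixCell or STMDiT implementation. Several details are assumptions: a pooled conditioning vector drives the shift and scale, the gene-expression and morphological embeddings are stacked as context tokens for cross-attention, attention is single-head, and the self-attention and MLP sublayers of a real transformer block are omitted. All function and parameter names (`conditioned_block`, `init_params`, `w_shift`, and so on) are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token over its feature dimension (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def adaln_modulate(x, cond_vec, w_shift, w_scale):
    """Adaptive layer norm: shift and scale are predicted from the condition."""
    shift = cond_vec @ w_shift  # shape (d,)
    scale = cond_vec @ w_scale  # shape (d,)
    return layer_norm(x) * (1.0 + scale) + shift


def cross_attention(x, context, w_q, w_k, w_v):
    """Image tokens (queries) attend to conditioning tokens (keys/values)."""
    q = x @ w_q          # (n_tokens, d)
    k = context @ w_k    # (n_ctx, d)
    v = context @ w_v    # (n_ctx, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v      # (n_tokens, d)


def conditioned_block(x, cond_vec, context, params):
    """AdaLN modulation followed by a residual cross-attention update."""
    h = adaln_modulate(x, cond_vec, params["w_shift"], params["w_scale"])
    return x + cross_attention(h, context, params["w_q"], params["w_k"], params["w_v"])


def init_params(rng, d, c, d_ctx):
    """Random toy weights; a trained model would learn these."""
    return {
        "w_shift": rng.normal(scale=0.02, size=(c, d)),
        "w_scale": rng.normal(scale=0.02, size=(c, d)),
        "w_q": rng.normal(scale=0.02, size=(d, d)),
        "w_k": rng.normal(scale=0.02, size=(d_ctx, d)),
        "w_v": rng.normal(scale=0.02, size=(d_ctx, d)),
    }


rng = np.random.default_rng(0)
d, c, d_ctx = 8, 4, 6
params = init_params(rng, d, c, d_ctx)
x = rng.normal(size=(5, d))                 # 5 image tokens
cond_vec = rng.normal(size=c)               # pooled conditioning vector
gene_emb = rng.normal(size=(1, d_ctx))      # stand-in for the frozen encoder output
morph_emb = rng.normal(size=(1, d_ctx))     # stand-in for the morphological embedding
context = np.concatenate([gene_emb, morph_emb], axis=0)
out = conditioned_block(x, cond_vec, context, params)
```

Because the cross-attention output is added residually, the block leaves its input unchanged when the value projection is zero, which is a convenient sanity check when experimenting with conditioning pathways.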

Applications

STMDiT supports research at the interface of molecular and morphological pathology. It can be used to generate synthetic paired transcriptomics–histology data for augmentation, to visualize how specific gene-expression profiles correspond to tissue appearance, and to study the structure-function relationship in tumor tissue such as melanoma. For computational pathology and spatial biology groups, the demonstrated zero-shot transfer to an external H&E-only cohort suggests utility for analyzing archival histology where matched spatial transcriptomics is unavailable.

Impact

STMDiT is an early demonstration that diffusion transformers can synthesize tissue morphology conditioned on spatial gene expression, extending generative pathology models from unconditional or label-conditioned image generation toward molecularly grounded virtual tissue synthesis. By reusing a frozen single-cell foundation model and a pathology diffusion backbone, it illustrates how foundation models across modalities can be composed for cross-modal generation. As a workshop-stage preprint released from an anonymized review checkpoint account, its claims await peer review and broader validation, but it points toward generative tools that connect molecular measurements with the morphological readouts pathologists rely on.

Citation

Transcriptomics-Conditioned Virtual Tissue Synthesis via Diffusion Transformers

Vlachas, P. R., et al. (2026) Transcriptomics-Conditioned Virtual Tissue Synthesis via Diffusion Transformers. bioRxiv.

DOI: 10.64898/2026.05.26.727902

Openness

Unclassified
Missing required components

Tags

diffusion_transformer, generative, histology, image_synthesis, multimodal, spatial_transcriptomics, virtual_tissue_synthesis, zero_shot

Resources

Research Paper, HuggingFace Model