STMDiT

Diffusion transformer for virtual tissue synthesis, generating H&E histopathology patches conditioned on spatial gene expression and morphology.

Released: May 2026

STMDiT (Spatial Transcriptomics and Morphology Diffusion Transformer) is a generative model that synthesizes realistic hematoxylin and eosin (H&E) histopathology image patches conditioned on the underlying molecular state of the tissue. Rather than treating histology and transcriptomics as separate readouts, STMDiT bridges them: given a spatial gene-expression profile and a morphological context, it generates the corresponding tissue appearance, framing virtual tissue synthesis as a conditional image-generation problem.

The model targets a persistent gap in computational pathology, where gene expression and tissue morphology are typically analyzed in isolation. By learning the mapping from molecular profiles to visual phenotype, STMDiT offers a route to interrogate how transcriptional programs manifest as observable tissue structure and to generate paired molecular–morphological data where experimental measurements are sparse or costly.

STMDiT was developed by Pantelis R. Vlachas, Kalin Nonchev, Viktor H. Koelzer, and Gunnar Rätsch at ETH Zurich (Institute for Machine Learning) and University Hospital Basel. It was released as a bioRxiv preprint in May 2026 and presented at the ICML 2026 SD4H (Structured Data for Health) Workshop.

Key Features

Transcriptomics-conditioned synthesis: Generates H&E patches conditioned on spatial gene-expression profiles, directly coupling molecular state to tissue morphology in a single generative model.
Dual conditioning via cross-attention: Combines adaptive layer normalization with per-block cross-attention to inject both gene-expression and morphological embeddings into the diffusion transformer backbone.
Frozen scRNA-seq encoder: Encodes expression profiles with a frozen CancerFoundation single-cell RNA-seq foundation model, leveraging pretrained molecular representations rather than learning them from scratch.
Dual classifier-free guidance: Applies independent classifier-free guidance to the transcriptomic and morphological conditioning signals, allowing the relative influence of each modality to be tuned at inference.
Zero-shot out-of-distribution transfer: Generalizes to an unseen cohort (TCGA SKCM) from H&E input alone, without any re-training on the new data.
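
The dual classifier-free guidance feature listed above can be illustrated with a minimal sketch. The page does not give the exact guidance formula, so the version below assumes one common compositional form: the unconditional noise prediction is pushed along two separate directions, one per modality, each with its own scale. The function name `dual_cfg` and the arguments `w_gene` and `w_morph` are hypothetical and not taken from the STMDiT release.

```python
from typing import List, Sequence


def dual_cfg(
    eps_uncond: Sequence[float],
    eps_gene: Sequence[float],
    eps_morph: Sequence[float],
    w_gene: float,
    w_morph: float,
) -> List[float]:
    """Combine three noise predictions using two independent guidance scales.

    eps_uncond: prediction with both conditions dropped
    eps_gene:   prediction with only the gene-expression condition
    eps_morph:  prediction with only the morphology condition

    Assumed form (not confirmed by the source):
        eps = eps_uncond + w_gene * (eps_gene - eps_uncond)
                         + w_morph * (eps_morph - eps_uncond)
    """
    if not (len(eps_uncond) == len(eps_gene) == len(eps_morph)):
        raise ValueError("all noise predictions must have the same length")
    return [
        u + w_gene * (g - u) + w_morph * (m - u)
        for u, g, m in zip(eps_uncond, eps_gene, eps_morph)
    ]


# Flattened toy predictions; a real model would produce image-shaped tensors.
guided = dual_cfg([0.0, 1.0], [1.0, 1.0], [0.0, 3.0], w_gene=2.0, w_morph=0.5)
```

Under this assumed form, setting both scales to zero recovers the unconditional prediction, and raising one scale strengthens only the corresponding modality.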

Technical Details

STMDiT is built on the PixCell diffusion transformer architecture for pathology image generation. The denoising network is conditioned through two pathways: adaptive layer-norm modulation, and per-block cross-attention that attends to (1) gene-expression embeddings produced by a frozen CancerFoundation scRNA-seq encoder and (2) morphological embeddings capturing local tissue context. During sampling, dual classifier-free guidance scales the two conditioning streams independently. The model was trained on 10x Genomics Visium spatial transcriptomics paired with H&E imaging from the Tumor Profiler (TuPro) melanoma cohort, and the authors release 30 EMA inference checkpoints. Out-of-distribution capability was demonstrated by zero-shot transfer to the TCGA SKCM melanoma cohort using H&E input only, with no re-training.
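
To make the two conditioning pathways concrete, here is a minimal NumPy sketch of a single simplified block, not the STMDiT or PixCell implementation. Several details are assumptions because the page does not state them: the adaLN shift and scale are regressed from a timestep embedding, the gene-expression and morphology tokens are concatenated into one cross-attention context, attention is single-head, and the self-attention and MLP sublayers of a real transformer block are omitted. All function and parameter names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Parameter-free layer norm over the feature axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adaln_modulate(x, cond, w_shift, w_scale):
    # adaLN: shift and scale are linear functions of the conditioning vector.
    shift = cond @ w_shift
    scale = cond @ w_scale
    return layer_norm(x) * (1.0 + scale) + shift


def cross_attention(x, context, w_q, w_k, w_v):
    # Single-head attention: image tokens query the conditioning tokens.
    q = x @ w_q
    k = context @ w_k
    v = context @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v


def conditioned_block(x, t_emb, gene_emb, morph_emb, p):
    # Pathway 1: adaLN modulation (assumed to be driven by the timestep).
    h = adaln_modulate(x, t_emb, p["w_shift"], p["w_scale"])
    # Pathway 2: cross-attention over gene and morphology tokens
    # (concatenating them into one context is an assumption).
    context = np.concatenate([gene_emb, morph_emb], axis=0)
    return x + cross_attention(h, context, p["w_q"], p["w_k"], p["w_v"])


rng = np.random.default_rng(0)
d, d_ctx, d_t = 8, 5, 6  # toy dimensions
params = {
    "w_shift": rng.normal(size=(d_t, d)),
    "w_scale": rng.normal(size=(d_t, d)),
    "w_q": rng.normal(size=(d, d)),
    "w_k": rng.normal(size=(d_ctx, d)),
    "w_v": rng.normal(size=(d_ctx, d)),
}
x = rng.normal(size=(4, d))              # 4 noisy image tokens
t_emb = rng.normal(size=(d_t,))          # timestep embedding
gene_emb = rng.normal(size=(3, d_ctx))   # stand-in for encoder output
morph_emb = rng.normal(size=(2, d_ctx))  # stand-in for morphology tokens
out = conditioned_block(x, t_emb, gene_emb, morph_emb, params)
```

The residual connection means the block leaves the image tokens unchanged when the value projection is zero, which is why conditioning can be added to a pretrained backbone without disturbing it at initialization.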

Applications

STMDiT supports research at the interface of molecular and morphological pathology. It can be used to generate synthetic paired transcriptomics–histology data for augmentation, to visualize how specific gene-expression profiles correspond to tissue appearance, and to study the structure-function relationship in tumor tissue such as melanoma. For computational pathology and spatial biology groups, the demonstrated zero-shot transfer to an external H&E-only cohort suggests utility for analyzing archival histology where matched spatial transcriptomics is unavailable.

Impact

STMDiT is an early demonstration that diffusion transformers can synthesize tissue morphology conditioned on spatial gene expression, extending generative pathology models from unconditional or label-conditioned image generation toward molecularly grounded virtual tissue synthesis. By reusing a frozen single-cell foundation model and a pathology diffusion backbone, it illustrates how foundation models across modalities can be composed for cross-modal generation. As a workshop-stage preprint released from an anonymized review checkpoint account, its claims await peer review and broader validation, but it points toward generative tools that connect molecular measurements with the morphological readouts pathologists rely on.

Citation

Transcriptomics-Conditioned Virtual Tissue Synthesis via Diffusion Transformers

Vlachas, P. R., et al. (2026) Transcriptomics-Conditioned Virtual Tissue Synthesis via Diffusion Transformers. bioRxiv.

DOI: 10.64898/2026.05.26.727902

Citations

Total Citations: 30
Influential: 4
References: 88

HuggingFace

Downloads: 0
Likes: 0
Last Modified: 2mo ago

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 44 (Partial)
Usability (can I run it?): 60
Reproducibility (can I retrain it?): 12
Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
HuggingFace Model
