MuPD

Diffusion-transformer pathology model embedding H&E histology, RNA profiles, and clinical text in a shared latent space for zero-shot cross-modal synthesis.

Released: April 2026

MuPD (Multimodal Pathology Diffusion) is a generative foundation model for computational pathology that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer. Rather than treating each modality in isolation, MuPD learns the joint distribution across them. This lets it generate one modality conditioned on any combination of the others, including cases where some measurements are missing or expensive to acquire.

The model addresses a persistent obstacle in multimodal medical data: real-world pathology datasets are frequently incomplete, with morphology, transcriptomics, and annotations rarely all available for the same sample. By unifying these modalities in a single generative framework, MuPD can synthesize realistic histology from text prompts or RNA profiles, perform virtual staining, and augment scarce datasets with biologically plausible samples. This positions it alongside generative pathology models such as STMDiT while extending conditioning to text and molecular signals at foundation-model scale.
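The page does not say how MuPD represents a missing modality internally. One common pattern for models that condition on any subset of inputs, shown here as a minimal sketch under that assumption, is to substitute a learned "null" embedding for each absent modality, keep a presence mask, and randomly hide observed modalities during training (similar in spirit to the condition dropout used for classifier-free guidance). The modality keys, `EMBED_DIM`, `NULL_EMBEDDINGS`, `build_condition`, and `drop_modalities` are all hypothetical.

```python
import numpy as np

# Hypothetical setup: names and sizes are illustrative, not MuPD's.
EMBED_DIM = 8
MODALITIES = ("he_image", "rna", "text")

_rng = np.random.default_rng(0)
# Stand-ins for learned "null" embeddings used when a modality is absent.
NULL_EMBEDDINGS = {m: _rng.normal(size=EMBED_DIM) for m in MODALITIES}


def build_condition(embeddings):
    """Stack per-modality embeddings into a (num_modalities, EMBED_DIM) array.

    Absent modalities (missing key or None) are replaced by their null
    embedding; the returned mask is 1.0 for observed and 0.0 for missing.
    """
    tokens, mask = [], []
    for m in MODALITIES:
        emb = embeddings.get(m)
        if emb is None:
            tokens.append(NULL_EMBEDDINGS[m])
            mask.append(0.0)
        else:
            tokens.append(np.asarray(emb, dtype=float))
            mask.append(1.0)
    return np.stack(tokens), np.array(mask)


def drop_modalities(embeddings, p_drop, rng):
    """Training-time modality dropout: hide each modality with probability p_drop."""
    return {m: (None if rng.random() < p_drop else e) for m, e in embeddings.items()}


# Example: condition generation on an RNA profile alone.
cond, mask = build_condition({"rna": np.ones(EMBED_DIM)})
print(cond.shape, mask)  # (3, 8) [0. 1. 0.]
```

Because every sample produces the same fixed-size condition array, one network can be trained on incomplete records and queried with whatever modalities are available at inference time.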

MuPD was developed by the Ruijiang Li lab at Stanford University, with first author Jinxi Xiang, and released as an arXiv preprint in April 2026. It is a companion to STORM, a representation-learning foundation model from the same group, with MuPD focusing on the generative side of the spatial-transcriptomics-and-histology problem.

Key Features

Shared multimodal latent space: Embeds H&E histology, RNA expression, and clinical text into a common representation, allowing any modality to condition the generation of another.
Diffusion-transformer backbone: Uses a diffusion transformer to model the joint distribution across modalities, supporting high-fidelity image synthesis at scale.
Zero-shot cross-modal synthesis: Generates histology from text or RNA profiles, and translates between modalities, without task-specific retraining.
Virtual staining: Acts as a virtual stainer, with a reported 37% improvement in marker correlation relative to specialized baselines.
Synthetic data augmentation: Reportedly improves few-shot classification by 47% by generating plausible synthetic training samples, addressing data scarcity in rare conditions.
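The features above describe a diffusion transformer trained over a shared latent space, but the page does not give the training objective or noise schedule. Purely as an assumption for illustration, the sketch below uses the standard DDPM forward process with a linear beta schedule and an epsilon-prediction loss. `denoiser` is a hypothetical placeholder for the transformer; in a conditional model it would also receive the modality conditioning `cond`.

```python
import numpy as np


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(z0, t, alpha_bars, noise):
    """Forward noising: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise


def denoising_loss(denoiser, z0, cond, alpha_bars, rng):
    """Epsilon-prediction MSE at one uniformly sampled timestep.

    `denoiser(z_t, t, cond)` stands in for the transformer and should
    return its estimate of the noise that was added to z0.
    """
    t = int(rng.integers(len(alpha_bars)))
    noise = rng.normal(size=z0.shape)
    z_t = q_sample(z0, t, alpha_bars, noise)
    return float(np.mean((denoiser(z_t, t, cond) - noise) ** 2))


# Example with a trivial placeholder denoiser that always predicts zero noise.
rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars()
z0 = rng.normal(size=(16, 64))  # hypothetical batch of shared-latent vectors
loss = denoising_loss(lambda z_t, t, c: np.zeros_like(z_t), z0, None, alpha_bars, rng)
```

A zero-predicting denoiser scores roughly 1.0 (the variance of unit Gaussian noise); training drives this toward 0 by teaching the network to recover the injected noise.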

Technical Details

MuPD is a diffusion transformer pretrained on a large multimodal corpus spanning 34 human organs: approximately 100 million H&E histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs. The three modalities are projected into a shared latent space, over which the diffusion process is learned, enabling flexible conditional generation. On generation benchmarks, the authors report a 50% reduction in Fréchet Inception Distance (FID) for text-conditioned and image-to-image generation versus specialized single-task models, and a 23% FID reduction for RNA-conditioned histology generation. Downstream, they report that synthetic augmentation improves few-shot classification by 47% and that virtual staining improves marker correlation by 37%.
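FID compares Gaussian fits to feature embeddings of real and generated images (the standard definition uses Inception-v3 pooled features), so lower is better, and a "50% reduction" presumably means the score roughly halved relative to the baseline. The sketch below implements the Fréchet distance formula in NumPy on arbitrary feature arrays. It is not the authors' evaluation code, and the random features stand in for real network embeddings.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature sets.

    d = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})
    Tr((S_a S_b)^{1/2}) is the sum of square roots of the eigenvalues of
    S_a @ S_b, which are real and non-negative when both covariances are PSD.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    s_a = np.cov(feats_a, rowvar=False)
    s_b = np.cov(feats_b, rowvar=False)
    eig = np.linalg.eigvals(s_a @ s_b)
    # Clip tiny negative/complex round-off before taking square roots.
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(s_a) + np.trace(s_b) - 2.0 * tr_sqrt)


def relative_reduction(baseline, new):
    """Fractional drop in a lower-is-better metric; 0.5 means it was halved."""
    return (baseline - new) / baseline


# Sanity checks on random stand-in features (not real Inception embeddings).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
fid_same = frechet_distance(real, real)           # ~0 up to floating-point error
fid_shifted = frechet_distance(real, real + 1.0)  # ~4: mean shift of 1 in each of 4 dims
print(relative_reduction(20.0, 10.0))             # 0.5
```

Common FID implementations compute the full matrix square root with `scipy.linalg.sqrtm`; only its trace enters the formula, so the eigenvalue route here gives the same quantity without the SciPy dependency.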

Applications

MuPD supports computational pathology and spatial biology workflows where multimodal data is incomplete. Researchers can generate synthetic paired histology-transcriptomics-text data for augmentation, perform virtual staining to predict molecular markers from morphology, and prototype models on rare disease cohorts where labeled examples are scarce. The text-conditioning pathway allows histology synthesis directly from descriptive prompts, useful for exploratory studies and benchmark construction, while RNA-conditioned generation links transcriptional state to tissue appearance.
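The virtual-staining result is reported as a marker-correlation improvement, but the metric is not defined on this page. Assuming it is a per-marker Pearson correlation between virtually predicted and measured marker intensities across tissue locations, it could be computed as in the sketch below; the data are synthetic and `per_marker_pearson` is a hypothetical helper.

```python
import numpy as np


def per_marker_pearson(predicted, measured):
    """Pearson r per marker between arrays of shape (n_locations, n_markers)."""
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)
    num = (p * m).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return num / den


# Hypothetical data: 200 tissue locations, 3 markers.
rng = np.random.default_rng(0)
measured = rng.normal(size=(200, 3))
predicted = measured + 0.5 * rng.normal(size=(200, 3))  # noisy "virtual stain"
r = per_marker_pearson(predicted, measured)
mean_r = r.mean()  # one possible summary across markers
```

Computing r per marker rather than pooling all markers keeps a well-predicted abundant marker from masking a poorly predicted rare one.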

Impact

MuPD shows, in the authors' evaluation, that a single diffusion-transformer foundation model can unify histology, transcriptomics, and clinical text for generative tasks across dozens of organs, extending generative pathology beyond unconditional or label-conditioned image synthesis toward fully multimodal, cross-modal generation. Reported improvements in FID, few-shot classification, and virtual staining suggest practical value for data augmentation and modality imputation in settings where complete multimodal measurements are unavailable. As a recently released arXiv preprint, its claims await peer review and independent validation, and its code and model weights were not yet public at release. It nonetheless points toward generative tools that fill gaps in real-world pathology datasets.

Citation

A Generative Foundation Model for Multimodal Histopathology

Preprint

Xiang, J., et al. (2026). A Generative Foundation Model for Multimodal Histopathology. arXiv preprint.

DOI: 10.48550/arXiv.2604.03635

Citations

Total citations: 279 · Influential: 21 · References: 75

Openness

bio.rodeo openness: Closed (score 15), with low usability and reproducibility.

Usability (can I run it?): 22
Reproducibility (can I retrain it?): 3

Model Openness Framework: Unclassified. Restrictive license on core components.

Resources

Research Paper · Demo