TUMSyn

ShanghaiTech University / Hainan University / United Imaging Intelligence

Text-guided MRI synthesis model that generates brain MR sequences and resolutions on demand from routine scans using imaging-metadata prompts.

Released: September 2024

Parameters: 114 Million

TUMSyn (Text-guided Universal MR image Synthesis) is a generalist model for cross-sequence brain MRI synthesis. Clinical MRI protocols routinely acquire only a subset of the modalities that downstream diagnosis or analysis might require, and re-scanning patients to obtain missing contrasts (for example, FLAIR, T2-weighted, or a higher spatial resolution) is costly, slow, and sometimes infeasible. TUMSyn addresses this by generating MR images with on-demand imaging characteristics from already-acquired scans, steered by natural-language descriptions of the desired output.

The model was developed by researchers at ShanghaiTech University, Hainan University, and United Imaging Intelligence, led by Dinggang Shen's medical imaging AI group, with an initial preprint in September 2024 and the peer-reviewed version published in Cell Reports Medicine in 2025. Its central novelty is using imaging metadata (modality, repetition time, echo time, voxel spacing, magnetic field strength, and similar acquisition parameters) as a text prompt, allowing a single unified model to flexibly target a continuous space of contrasts and resolutions rather than being locked to a fixed set of source-to-target sequence pairs.
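To make the idea of a metadata prompt concrete, the sketch below renders acquisition parameters as a single sentence. The `AcquisitionMetadata` class, the `build_prompt` function, the field names, and the sentence template are all hypothetical and chosen for illustration; the source does not specify the exact prompt wording TUMSyn expects.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AcquisitionMetadata:
    """Hypothetical container for the acquisition parameters named in the text."""
    modality: str                                   # e.g. "T1w", "T2w", "FLAIR"
    repetition_time_ms: Optional[float] = None
    echo_time_ms: Optional[float] = None
    voxel_spacing_mm: Optional[Tuple[float, float, float]] = None
    field_strength_t: Optional[float] = None


def build_prompt(meta: AcquisitionMetadata) -> str:
    """Render metadata as one sentence (assumed template, not TUMSyn's own)."""
    parts = [f"modality: {meta.modality}"]
    if meta.repetition_time_ms is not None:
        parts.append(f"repetition time: {meta.repetition_time_ms:g} ms")
    if meta.echo_time_ms is not None:
        parts.append(f"echo time: {meta.echo_time_ms:g} ms")
    if meta.voxel_spacing_mm is not None:
        spacing = " x ".join(f"{s:g}" for s in meta.voxel_spacing_mm)
        parts.append(f"voxel spacing: {spacing} mm")
    if meta.field_strength_t is not None:
        parts.append(f"magnetic field strength: {meta.field_strength_t:g} T")
    return "Brain MR image, " + ", ".join(parts) + "."


# Illustrative values only; they are not taken from the paper.
target = AcquisitionMetadata("FLAIR", 9000, 120, (1.0, 1.0, 1.0), 3.0)
prompt = build_prompt(target)
print(prompt)
```

Because unspecified fields are simply omitted, the same function covers both fully described targets and prompts that name only a modality.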

By framing synthesis as text-conditioned generation across 7 structural MRI modalities and 13 acquisition centers, TUMSyn positions itself as a foundation-style tool for medical image translation, applicable in both supervised settings and zero-shot scenarios where the requested acquisition parameters were never seen during training.

Key Features

Text-prompted contrast control: Imaging metadata is encoded as a text prompt, so users specify the desired modality and acquisition parameters in natural language rather than selecting from a fixed menu of synthesis tasks.
Unified cross-sequence model: A single network performs a wide array of source-to-target translations (e.g., T1w→FLAIR, FLAIR→T1w), replacing the collection of task-specific GANs typically needed.
Arbitrary-resolution output: A Local Implicit Image Function (LIIF) decoder enables continuous, arbitrary-scale upsampling, decoupling output resolution from the input grid.
Zero-shot generalization: The model synthesizes images for imaging parameters and contrasts outside its training distribution, supporting unseen centers and protocols.
Broad demographic coverage: Training data spans subjects from roughly 2 to over 100 years of age across 13 centers, improving robustness across populations.
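
The arbitrary-resolution feature rests on the LIIF idea of querying a decoder at continuous coordinates. The sketch below shows only the coordinate bookkeeping, in 2D for brevity: for each output pixel it finds the nearest latent cell and the relative offset that a decoder would receive alongside that cell's feature vector. The function names are hypothetical, and the learned MLP decoder, local ensemble, and cell decoding of the full LIIF method are omitted; this is not TUMSyn's implementation.

```python
def cell_centers(n: int) -> list:
    """Centers of n equal-width cells covering the interval [-1, 1]."""
    return [-1.0 + (2 * i + 1) / n for i in range(n)]


def nearest_index(coord: float, n: int) -> int:
    """Index of the cell (out of n) that contains coord, clamped to a valid range."""
    idx = int((coord + 1.0) / 2.0 * n)
    return min(max(idx, 0), n - 1)


def liif_queries(feat_h: int, feat_w: int, out_h: int, out_w: int) -> list:
    """For every output pixel, return (row, col, d_row, d_col).

    (row, col) indexes the nearest latent feature cell, and (d_row, d_col) is
    the query coordinate relative to that cell's center. A LIIF-style decoder
    would map (feature[row][col], d_row, d_col) to an intensity value.
    """
    feat_rows = cell_centers(feat_h)
    feat_cols = cell_centers(feat_w)
    queries = []
    for y in cell_centers(out_h):
        r = nearest_index(y, feat_h)
        for x in cell_centers(out_w):
            c = nearest_index(x, feat_w)
            queries.append((r, c, y - feat_rows[r], x - feat_cols[c]))
    return queries


# The output grid need not be an integer multiple of the feature grid:
# here a 2x2 feature map is queried on a 3x3 grid (1.5x scaling).
queries = liif_queries(2, 2, 3, 3)
```

Since the output size only changes which coordinates are queried, the same feature map can be decoded at any target grid, which is what decouples output resolution from the input.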

Technical Details

TUMSyn uses a two-stage design totaling roughly 114 million parameters. Stage one pre-trains an MRI-specific text encoder with CLIP-style contrastive learning: a ViT-B/16-based image encoder (adapted for MR images) and a Transformer text encoder with byte-pair-encoding tokenization are trained jointly so that imaging-metadata prompts and image features land in a shared embedding space.

In stage two, the text encoder is frozen and its prompt features condition image synthesis. A 24-layer ResNet-style CNN encoder (without downsampling) extracts image features, a multi-head cross-attention module fuses the text and image representations, and a LIIF decoder reconstructs the target image at arbitrary resolution.

Training used a brain MR database of 31,407 3D images covering 7 structural modalities from 13 centers. On held-out benchmarks the authors report 28.79 dB PSNR / 0.967 SSIM on T1w→FLAIR (ABCD) and 26.85 dB PSNR / 0.960 SSIM on FLAIR→T1w (ADNI-2), outperforming baselines such as SC-GAN by up to 2.86 dB PSNR.
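
The stage-one alignment of image and prompt embeddings can be illustrated with the symmetric contrastive loss popularized by CLIP. This is a minimal pure-Python sketch under stated assumptions: the function names are hypothetical, the fixed temperature of 0.07 is an assumed value (CLIP treats it as a learnable parameter), and the source does not confirm that TUMSyn uses exactly this formulation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched (image, text) pairs."""
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(im, t)) / temperature for t in txt]
              for im in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: matched pairs point the same way, mismatched pairs are orthogonal.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each image embedding is closest to its own prompt embedding and large when the pairing is swapped, which is the pressure that pulls metadata prompts and MR images into one embedding space.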

Applications

TUMSyn supports clinical and research imaging workflows where missing or low-resolution MR sequences would otherwise limit analysis. It can impute absent contrasts to complete multimodal protocols, super-resolve low-resolution acquisitions, and harmonize images across scanners and centers. The authors demonstrate clinical utility for brain disease screening in both supervised and zero-shot settings, and synthesized sequences can feed downstream segmentation, registration, and diagnostic pipelines, benefiting radiologists, neuroimaging researchers, and developers of medical image analysis tools.

Impact

By turning cross-sequence MRI synthesis into a single text-controllable model, TUMSyn moves medical image translation toward the prompt-driven, generalist paradigm increasingly common in foundation models. Pre-trained weights are publicly released via Zenodo alongside code, lowering the barrier for groups to apply or extend the approach. The work illustrates how acquisition metadata can serve as a flexible conditioning signal for imaging generative models, and its peer-reviewed publication in Cell Reports Medicine signals validation of synthetic-MRI utility for diagnostic tasks. As with all synthetic medical imaging, generated sequences carry the risk of hallucinated or smoothed pathology and require careful validation before any clinical use.

Citation

Wang, Y., et al. (2025) Toward general text-guided multimodal brain MRI synthesis for diagnosis and medical image analysis. Cell Reports Medicine.

DOI: 10.1016/j.xcrm.2025.102182

Recent citations

Papers that recently cited this model.

Toward brain magnetic resonance imaging analysis intelligence: A review of federated learning and visual foundation models
Zhen Yu, Yang Liu, Qingchao Chen
Engineering Applications of Artificial Intelligence · Aug 2026
Citations: 0

Metadata Supervised MRI Representations for Modelling and Controlling Acquisition Variability
Mehmet Yigit Avci, Pedro Borges, Virginia Fernandez, et al.
Jul 2026
Citations: 0 · Influential

ResViTM-Net: Where local features meet global context, guided by patient priors for medical vision
Shaohong He, Siqi Liao, Yuyan Wu, et al.
Biomedical Signal Processing and Control · Jul 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation
Xuanru Zhou, Cheng Li, Shuqiang Wang, et al.
Research · Jan 2025
Citations: 13

A Review on the Applications of GANs for 3D Medical Image Analysis
Zoha Usama, A. Alavi, Jeffrey Chan
Applied Sciences · Oct 2025
Citations: 6

Prompt mechanisms in medical imaging: A comprehensive survey
Hao Yang, Xinlong Liang, Zhang Li, et al.
Innovation (Cambridge (Mass.)) · Jun 2025
Citations: 5

Applications, image analysis, and interpretation of computer vision in medical imaging
Yasunari Matsuzaka, Masayuki Iyoda
Frontiers in Radiology · Jan 2026
Citations: 4

UniCAS: A foundation model for cervical cytology screening
Haotian Jiang, Jiangdong Cai, Zhenrong Shen, et al.
Cell Reports Medicine · Jan 2026
Citations: 3

Citations

Total Citations: 26

Influential: 2

References: 54

GitHub

Stars: 45

Forks: 4

Open Issues: 8

Contributors: 3

Last Push: 8mo ago

Language: Python

Fields of citing research

Computer Science: 100%
Medicine: 100%
Engineering: 54%
Physics: 8%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Score: 28 (Closed)

Usability (can I run it?): 41

Reproducibility (can I retrain it?): 9

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository, Research Paper, Dataset

