bio.rodeo

TUMSyn

ShanghaiTech University / Hainan University / United Imaging Intelligence

Text-guided universal MRI synthesis generalist that generates customized brain MR sequences and resolutions from routine scans using imaging-metadata text prompts.

Released: September 2024
Parameters: 114 million

TUMSyn (Text-guided Universal MR image Synthesis) is a generalist model for cross-sequence brain MRI synthesis. Clinical MRI protocols routinely acquire only a subset of the modalities that downstream diagnosis or analysis might require, and re-scanning patients to obtain missing contrasts (for example, FLAIR, T2-weighted, or a higher spatial resolution) is costly, slow, and sometimes infeasible. TUMSyn addresses this by generating MR images with on-demand imaging characteristics from already-acquired scans, steered by natural-language descriptions of the desired output.

The model was developed by researchers at ShanghaiTech University, Hainan University, and United Imaging Intelligence, led by Dinggang Shen's medical imaging AI group. An initial preprint appeared in September 2024, and the peer-reviewed version was published in Cell Reports Medicine in 2025. Its central novelty is using imaging metadata (modality, repetition time, echo time, voxel spacing, magnetic field strength, and similar acquisition parameters) as a text prompt. This lets a single unified model target a continuous space of contrasts and resolutions rather than being locked to a fixed set of source-to-target sequence pairs.
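As a rough illustration of metadata-as-prompt, the sketch below serializes the acquisition parameters listed above into a single text string. The `AcquisitionMeta` class, the `build_prompt` function, the template wording, and the example values are all hypothetical; the actual prompt format used by TUMSyn is not given here and may differ.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionMeta:
    """Hypothetical container for the acquisition parameters named in the text."""
    modality: str            # e.g. "FLAIR"
    tr_ms: float             # repetition time in milliseconds
    te_ms: float             # echo time in milliseconds
    voxel_mm: tuple          # voxel spacing (x, y, z) in millimetres
    field_strength_t: float  # magnetic field strength in tesla


def build_prompt(meta: AcquisitionMeta) -> str:
    """Serialize metadata into one prompt string (illustrative template only)."""
    spacing = " x ".join(f"{v:g}" for v in meta.voxel_mm)
    return (
        f"modality: {meta.modality}; "
        f"repetition time: {meta.tr_ms:g} ms; "
        f"echo time: {meta.te_ms:g} ms; "
        f"voxel size: {spacing} mm; "
        f"field strength: {meta.field_strength_t:g} T"
    )


# Example values are made up for illustration, not taken from the paper.
target = AcquisitionMeta("FLAIR", 9000.0, 90.0, (1.0, 1.0, 1.0), 3.0)
print(build_prompt(target))
```

Because every field is a free numeric or text value rather than a task ID, the same template can describe sequences and resolutions that no fixed menu of translation tasks would enumerate.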

By framing synthesis as text-conditioned generation across 7 structural MRI modalities and 13 acquisition centers, TUMSyn positions itself as a foundation-style tool for medical image translation, applicable in both supervised settings and zero-shot scenarios where the requested acquisition parameters were never seen during training.

#Key Features

  • Text-prompted contrast control: Imaging metadata is encoded as a text prompt, so users specify the desired modality and acquisition parameters in natural language rather than selecting from a fixed menu of synthesis tasks.
  • Unified cross-sequence model: A single network performs a wide array of source-to-target translations (e.g., T1w→FLAIR, FLAIR→T1w), replacing the collection of task-specific GANs typically needed.
  • Arbitrary-resolution output: A Local Implicit Image Function (LIIF) decoder enables continuous, arbitrary-scale upsampling, decoupling output resolution from the input grid.
  • Zero-shot generalization: The model synthesizes images for imaging parameters and contrasts outside its training distribution, supporting unseen centers and protocols.
  • Broad demographic coverage: Training data spans subjects from roughly 2 to over 100 years of age across 13 centers, improving robustness across populations.
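The arbitrary-resolution feature rests on querying a decoder at continuous coordinates instead of on a fixed pixel grid. The sketch below shows only the coordinate bookkeeping for a simplified 1D case: for each output sample it finds the latent cell that contains it and the relative offset to that cell's center. The function names are hypothetical, and the decoder MLP, the local ensembling used in LIIF, and the 3D case are all omitted.

```python
def cell_centers(n):
    """Centers of n equal cells tiling the normalized interval [-1, 1]."""
    return [-1.0 + (2 * i + 1) / n for i in range(n)]


def query_plan(n_latent, n_out):
    """For each output sample, find the containing latent cell and the offset to it.

    Returns (latent_index, relative_offset, output_cell_size) triples, the
    inputs a LIIF-style decoder MLP would consume alongside the latent feature.
    """
    latent = cell_centers(n_latent)
    out_cell = 2.0 / n_out  # width of one output cell in normalized units
    plan = []
    for q in cell_centers(n_out):
        # Map q from [-1, 1] to a cell index, clamped to the last cell.
        idx = min(int((q + 1.0) / 2.0 * n_latent), n_latent - 1)
        plan.append((idx, q - latent[idx], out_cell))
    return plan


# 4 latent cells queried at 6 output positions: a non-integer 1.5x scale.
plan = query_plan(n_latent=4, n_out=6)
for idx, offset, cell in plan:
    print(idx, round(offset, 4), round(cell, 4))
```

Since `n_out` is just an argument, the same latent features can be queried at any output size, which is what decouples the output resolution from the input grid.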

#Technical Details

TUMSyn uses a two-stage design totaling roughly 114 million parameters.

Stage one pre-trains an MRI-specific text encoder via CLIP-style contrastive learning. A ViT-B/16-based image encoder (adapted for MR images) and a Transformer text encoder with byte-pair-encoding tokenization are trained jointly so that imaging-metadata prompts and image features map into a shared embedding space.

In stage two, the frozen text encoder produces prompt features that condition image synthesis. A 24-layer ResNet-style CNN encoder (without downsampling) extracts image features, a multi-head cross-attention module fuses text and image representations, and a LIIF decoder reconstructs the target image at arbitrary resolution.

Training used a brain MR database of 31,407 3D images covering 7 structural modalities from 13 centers. On held-out benchmarks the authors report 28.79 dB PSNR / 0.967 SSIM on T1w→FLAIR (ABCD) and 26.85 dB PSNR / 0.960 SSIM on FLAIR→T1w (ADNI-2), outperforming baselines such as SC-GAN by up to 2.86 dB PSNR.
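The CLIP-style alignment in stage one can be illustrated with a generic symmetric contrastive loss over a batch of paired image and text embeddings. This is a minimal pure-Python sketch of the standard technique, not the authors' training code; the function names and the temperature of 0.07 are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is a matched pair."""
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Similarity of every image to every text, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy 2D embeddings: matched pairs should score a lower loss than swapped ones.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing this loss pulls each metadata prompt toward the embedding of its own image and away from the other images in the batch, which is what makes the frozen text encoder usable as a conditioning signal in stage two.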

#Applications

TUMSyn supports clinical and research imaging workflows where missing or low-resolution MR sequences would otherwise limit analysis. It can impute absent contrasts to complete multimodal protocols, super-resolve low-resolution acquisitions, and harmonize images across scanners and centers. The authors demonstrate clinical utility for brain disease screening in both supervised and zero-shot settings. Synthesized sequences can also feed downstream segmentation, registration, and diagnostic pipelines, which makes the model relevant to radiologists, neuroimaging researchers, and developers of medical image analysis tools.

#Impact

By turning cross-sequence MRI synthesis into a single text-controllable model, TUMSyn moves medical image translation toward the prompt-driven, generalist paradigm increasingly common in foundation models. Pre-trained weights are publicly released via Zenodo alongside code, lowering the barrier for groups to apply or extend the approach. The work illustrates how acquisition metadata can serve as a flexible conditioning signal for imaging generative models, and its publication in Cell Reports Medicine means the claims of diagnostic utility have passed peer review. As with all synthetic medical imaging, generated sequences carry the risk of hallucinated or smoothed pathology and require careful validation before any clinical use.

Citation

Wang, Y., et al. (2025) Toward general text-guided multimodal brain MRI synthesis for diagnosis and medical image analysis. Cell Reports Medicine.

DOI: 10.1016/j.xcrm.2025.102182

Citations

Total Citations: 22
Influential: 0
References: 54

GitHub

Stars: 43
Forks: 4
Open Issues: 8
Contributors: 3
Last Push: 7mo ago
Language: Python

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 28 (Closed)
Usability (can I run it?): 41
Reproducibility (can I retrain it?): 9
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

cnn, contrastive_learning, foundation_model, image_super_resolution, image_synthesis, implicit_neural_representation, mri, multimodal, neuroimaging, vision_transformer, zero_shot

Resources

GitHub Repository, Research Paper, Dataset