MINIM

Peking University / Macau University of Science and Technology / Sun Yat-sen University

Text-to-image diffusion model that generates synthetic medical images across imaging modalities and organs to augment scarce clinical training data.

Released: February 2025

MINIM (Medical Image-text geNeratIve Model) is a generative foundation model that synthesizes realistic medical images of multiple organs across several imaging modalities directly from free-text instructions. Rather than predicting structure or diagnosis from an existing scan, MINIM tackles the inverse problem: producing high-fidelity synthetic images on demand to expand scarce, privacy-constrained, or imbalanced medical imaging datasets. It was developed by Jinzhuo Wang and colleagues at Peking University, Macau University of Science and Technology, Sun Yat-sen University, and collaborating institutions, and published in Nature Medicine in February 2025.

Medical AI development is chronically bottlenecked by limited access to large, well-annotated, and demographically diverse image corpora. MINIM addresses this by acting as a single text-conditioned generator spanning optical coherence tomography (OCT), fundus photography, chest X-ray, chest CT, and brain MRI, with breast MRI added through transfer learning. A clinician can prompt it with a textual description of the desired anatomy and finding, and the model returns a corresponding synthetic image.

A defining feature is its self-improving training loop: after initial diffusion pretraining, the model is refined with reinforcement learning from radiologist feedback, progressively raising the realism and clinical plausibility of its outputs. The authors report that following this fine-tuning, 91% of MINIM-generated OCT images received the highest quality rating from clinicians.
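As a rough illustration of how expert ratings can be folded back into training, the sketch below runs one feedback round in the style of rejection sampling: generate a candidate per prompt, collect a rating, and keep only the top-rated pairs as fine-tuning data. This is a simplified stand-in, not the paper's reinforcement-learning procedure. The `generate` and `rate` callables, the 1-to-5 rating scale, and the threshold are all hypothetical.

```python
from typing import Callable, List, Tuple

def feedback_round(
    prompts: List[str],
    generate: Callable[[str], str],    # hypothetical: prompt -> image identifier
    rate: Callable[[str, str], int],   # hypothetical: (prompt, image) -> rating 1..5
    min_rating: int = 5,
) -> Tuple[List[Tuple[str, str]], float]:
    """Return the (prompt, image) pairs rated >= min_rating and the accepted fraction."""
    accepted = []
    for prompt in prompts:
        image = generate(prompt)
        if rate(prompt, image) >= min_rating:
            accepted.append((prompt, image))
    fraction = len(accepted) / len(prompts) if prompts else 0.0
    return accepted, fraction

# Toy stand-ins so the sketch runs end to end; a real setup would call
# the generator and record clinician ratings instead.
def fake_generate(prompt: str) -> str:
    return f"img::{prompt}"

def fake_rate(prompt: str, image: str) -> int:
    return 5 if "OCT" in prompt else 3

pairs, top_share = feedback_round(
    ["OCT: drusen", "OCT: macular hole", "Chest X-ray: effusion", "OCT: normal"],
    fake_generate,
    fake_rate,
)
```

The accepted pairs would then serve as the fine-tuning set for the next round, and the accepted fraction is the same kind of statistic as the share of top-rated images quoted above.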

Key Features

Text-to-image generation across modalities: A single model synthesizes OCT, fundus, chest X-ray, chest CT, and brain MRI images conditioned on natural-language prompts describing organ and pathology.
Self-improving via reinforcement learning: Radiologist preference feedback is used to fine-tune the generator, measurably increasing the rated quality and realism of synthetic images over the base diffusion model.
Dataset augmentation that boosts downstream tasks: Adding synthetic images improves diagnostics, report generation, and self-supervised pretraining, with average gains of 12% (ophthalmic), 15% (chest), 13% (brain), and 17% (breast) reported across tasks.
Generalization to unseen domains: MINIM extends to previously unseen data domains (e.g., breast MRI) through transfer learning, indicating generalist medical-AI behavior beyond its original training modalities.
Clinically meaningful applications: Synthetic augmentation supports prediction of HER2-positive breast cancer from MRI and identification of targeted-therapy-sensitive EGFR mutations from lung cancer CT.

Technical Details

MINIM is a latent text-to-image diffusion model built on a Stable Diffusion-style framework, using a U-Net denoiser with cross-attention to condition image generation on text. Modality labels and textual descriptions are concatenated, tokenized with a BERT tokenizer, and encoded to form the conditioning signal, and images are produced by iteratively reversing a learned Gaussian noising process. The training corpus pairs medical images with textual descriptions spanning the supported modalities and organs. After supervised diffusion training, a two-stage reinforcement-learning procedure incorporates radiologist feedback to align generations with expert judgments of clinical quality.
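The Gaussian noising process mentioned above can be sketched in a few lines of plain Python. The example follows the standard DDPM closed form for sampling a noisy latent, and shows that the clean latent is recoverable once the noise is known, which is the quantity the denoiser is trained to estimate. The linear schedule, the 1000 steps, and the four-element toy latent are assumptions for illustration, not MINIM's documented settings.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, noise)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, noise)]
    return xt, noise

def predict_x0(xt, t, alpha_bars, predicted_noise):
    """Invert the closed form given a noise estimate (the denoiser's job)."""
    a = alpha_bars[t]
    return [(v - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for v, e in zip(xt, predicted_noise)]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

latent = [0.5, -1.0, 0.25, 2.0]                       # toy stand-in for an image latent
noisy, true_noise = add_noise(latent, 500, alpha_bars, rng)
recovered = predict_x0(noisy, 500, alpha_bars, true_noise)
```

In a real model the true noise is unavailable at generation time; the text-conditioned U-Net supplies the estimate at each step, and sampling walks from pure noise back toward a clean latent.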

Image quality and utility were evaluated with both objective metrics (Fréchet Inception Distance (FID), Inception Score (IS), multi-scale structural similarity (MS-SSIM), classification accuracy score, and image-image / image-text retrieval) and blinded clinician review. On downstream classification, augmenting real data with MINIM-generated images raised the accuracy of EGFR-mutation prediction on lung CT from 81.5% to 95.4% (at a 5:1 synthetic-to-real ratio) and of HER2-status prediction on breast MRI from 79.2% to 94.0%.
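Accuracy gains like these can be read in more than one way, so a short worked calculation may help. The sketch below takes the two reported before/after accuracies and derives the absolute gain in percentage points, the relative gain over the baseline, and the share of baseline errors removed. The function name and the choice of these three readings are illustrative, not taken from the paper.

```python
def gains(baseline_pct, augmented_pct):
    """Return (absolute points, relative gain %, error reduction %), rounded to 0.1."""
    abs_points = augmented_pct - baseline_pct
    relative_pct = 100.0 * abs_points / baseline_pct
    error_reduction_pct = 100.0 * abs_points / (100.0 - baseline_pct)
    return (round(abs_points, 1),
            round(relative_pct, 1),
            round(error_reduction_pct, 1))

egfr = gains(81.5, 95.4)   # EGFR-mutation prediction, lung CT
her2 = gains(79.2, 94.0)   # HER2-status prediction, breast MRI

print(egfr)  # (13.9, 17.1, 75.1)
print(her2)  # (14.8, 18.7, 71.2)
```

Read this way, both tasks gain roughly 14 to 15 percentage points, which corresponds to removing about three quarters of the baseline classification errors.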

Applications

MINIM is intended for medical-AI researchers and clinical informaticians who need to enlarge or rebalance training datasets without collecting and de-identifying additional patient scans. Synthetic images can augment diagnostic classifiers, seed self-supervised pretraining, and support automated radiology report generation. The reported HER2 and EGFR use cases illustrate how synthetic augmentation can sharpen biomarker and mutation prediction from routine imaging, which is relevant to precision-oncology workflows where labeled cases are scarce. The released code allows researchers to reproduce results and adapt the generator to new modalities via transfer learning.
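Enlarging or rebalancing a dataset comes down to deciding how many synthetic images to request per class. The sketch below shows two simple policies: topping up every class to the size of the largest one, or requesting a fixed number of synthetic images per real image. The class names and counts are hypothetical, and the helper is not part of the released code.

```python
def synthetic_needed(real_counts, ratio=None):
    """Number of synthetic images to request per class.

    ratio=None: top up each class to the size of the largest class.
    ratio=k:    request k synthetic images per real image in every class.
    """
    if ratio is not None:
        return {label: int(round(n * ratio)) for label, n in real_counts.items()}
    target = max(real_counts.values())
    return {label: target - n for label, n in real_counts.items()}

# Hypothetical real-image counts for a two-class mutation dataset.
real = {"EGFR-mutant": 120, "EGFR-wild-type": 480}

balance_plan = synthetic_needed(real)          # equalize class sizes
ratio_plan = synthetic_needed(real, ratio=5)   # 5:1 synthetic-to-real

print(balance_plan)  # {'EGFR-mutant': 360, 'EGFR-wild-type': 0}
print(ratio_plan)    # {'EGFR-mutant': 600, 'EGFR-wild-type': 2400}
```

Whichever policy is used, the evaluation split should contain only real held-out images so that reported gains reflect performance on actual patient data.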

Impact

MINIM demonstrates that a single text-conditioned generative model, refined with expert reinforcement signals, can produce synthetic medical images useful enough to materially improve a range of downstream clinical tasks. By framing data scarcity as a generation problem and showing consistent double-digit performance gains across organs and modalities, it strengthens the case for synthetic data as a practical lever in medical AI. Limitations remain: generated images can encode artifacts or biases from the training distribution, synthetic augmentation must be validated against real held-out data before clinical use, and the public release distributes weights via a third-party file host rather than a versioned model hub, with no formal model card or datasheet accompanying the code. As with all generative medical imaging, outputs require expert oversight and the model is not intended for clinical decision-making without further validation.

Citation

Self-improving generative foundation model for synthetic medical image generation and clinical applications

Wang, J., et al. (2024) Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nature Medicine.

DOI: 10.1038/s41591-024-03359-y

Recent citations

Papers that recently cited this model.

Language-Guided Segmentation of Medical Images: A Review of Foundation Models
Saqib Qamar
Bioengineering · Jul 2026
Citations: 0

A Systematic Review on Synthetic Medical Images Generation—Recent Trends and Future Opportunities
Yumna Waheed, Muhammad Nouman Noor, Imran Ashraf
Diagnostics · Jul 2026
Citations: 0

Ethical and Governance Challenges of AI in Medical Imaging and Diagnostics: A Systematic Survey and Policy Framework Recommendations
D. Athukorala, K. Ahmed, Raza Nowrozy
Healthcare · Jul 2026
Citations: 0 · Influential

Top citations

The most-cited papers that cite this model.

Artificial intelligence in cancer: applications, challenges, and future perspectives
Cillian H. Cheng, Su-sheng Shi
Molecular Cancer · Oct 2025
Citations: 28

Artificial intelligence for medicine 2025: Navigating the endless frontier
Ji Dai, Huiyu Xu, Tao Chen, et al.
The Innovation Medicine · 2025
Citations: 25

AI-enabled molecular phenotyping and prognostic predictions in lung cancer through multimodal clinical information integration
Yuxing Lu, Fei Liu, Yunfang Yu, et al.
Cell Reports Medicine · Jun 2025
Citations: 15

Uncovering ethical biases in publicly available fetal ultrasound datasets
M. C. Fiorentino, Sara Moccia, M. D. Cosmo, et al.
npj Digital Medicine · Jun 2025
Citations: 14

The potential of large language models to advance precision oncology
S. Liang, Jiangjiang Zhang, Xingting Liu, et al.
EBioMedicine · Apr 2025
Citations: 13

Citations

Total Citations: 138
Influential: 8
References: 48

GitHub

Stars: 158
Forks: 9
Open Issues: 2
Contributors: 1
Last Push: 1y ago
Language: Python

Fields of citing research

Computer Science: 88%
Medicine: 87%
Engineering: 31%
Biology: 7%
Materials Science: 2%
Chemistry: 1%
Mathematics: 1%
Environmental Science: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

41 · Partial

Usability (can I run it?): 56

Reproducibility (can I retrain it?): 32

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper
