bio.rodeo
The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

MINIM

Peking University / Macau University of Science and Technology / Sun Yat-sen University

A self-improving text-to-image diffusion foundation model that generates synthetic medical images across multiple modalities and organs to augment downstream clinical AI tasks.

Released: February 2025

MINIM (Medical Image-text geNeratIve Model) is a generative foundation model that synthesizes realistic medical images of multiple organs across several imaging modalities directly from free-text instructions. Rather than predicting structure or diagnosis from an existing scan, MINIM tackles the inverse problem: producing high-fidelity synthetic images on demand to expand scarce, privacy-constrained, or imbalanced medical imaging datasets. It was developed by Jinzhuo Wang and colleagues at Peking University, Macau University of Science and Technology, Sun Yat-sen University, and collaborating institutions, and published in Nature Medicine in February 2025.

Medical AI development is chronically bottlenecked by limited access to large, well-annotated, and demographically diverse image corpora. MINIM addresses this by acting as a single text-conditioned generator spanning optical coherence tomography (OCT), fundus photography, chest X-ray, chest CT, and brain MRI, with breast MRI added through transfer learning. A clinician can prompt it with a textual description of the desired anatomy and finding, and the model returns a corresponding synthetic image.

A defining feature is its self-improving training loop: after initial diffusion pretraining, the model is refined with reinforcement learning from radiologist feedback, progressively raising the realism and clinical plausibility of its outputs. The authors report that following this fine-tuning, 91% of MINIM-generated OCT images received the highest quality rating from clinicians.

Key Features

  • Text-to-image generation across modalities: A single model synthesizes OCT, fundus, chest X-ray, chest CT, and brain MRI images conditioned on natural-language prompts describing organ and pathology.
  • Self-improving via reinforcement learning: Radiologist preference feedback is used to fine-tune the generator, measurably increasing the rated quality and realism of synthetic images over the base diffusion model.
  • Dataset augmentation that boosts downstream tasks: Adding synthetic images improves diagnostics, report generation, and self-supervised pretraining, with average gains of 12% (ophthalmic), 15% (chest), 13% (brain), and 17% (breast) reported across tasks.
  • Generalization to unseen domains: MINIM extends to previously unseen data domains (e.g., breast MRI) through transfer learning, suggesting the generator can be adapted beyond its original training modalities.
  • Clinically meaningful applications: Synthetic augmentation supports prediction of HER2-positive breast cancer from MRI and identification of targeted-therapy-sensitive EGFR mutations from lung cancer CT.

Technical Details

MINIM is a latent text-to-image diffusion model built on a Stable Diffusion-style framework, using a U-Net denoiser with cross-attention to condition image generation on text. Modality labels and textual descriptions are concatenated, tokenized with a BERT tokenizer, and embedded to form the conditioning signal. Images are produced by iterative denoising, that is, by running a learned reversal of a fixed Gaussian noising process. The training corpus pairs medical images with textual descriptions spanning the supported modalities and organs. After supervised diffusion training, a two-stage reinforcement-learning procedure incorporates radiologist feedback to align generations with expert judgments of clinical quality.
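
The Gaussian noising process that the denoiser learns to reverse can be illustrated with a minimal sketch. It assumes a standard DDPM-style linear variance schedule and is not taken from the MINIM code: the schedule values, the function names, and the `build_prompt` format are hypothetical, and a plain Python list stands in for an image latent.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed, common DDPM default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): share of signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def build_prompt(modality, description):
    """Hypothetical conditioning string: modality label concatenated with the text."""
    return f"{modality}: {description}"


def add_noise(latent, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (noisy latent, noise used)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal * x + sigma * e for x, e in zip(latent, noise)]
    return noisy, noise


def predict_x0(noisy, predicted_noise, t, alpha_bars):
    """Invert the closed form: estimate x_0 from x_t given a noise estimate."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(x - sigma * e) / signal for x, e in zip(noisy, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    latent = [0.5, -1.0, 0.25, 2.0]  # stand-in for an encoded image latent
    noisy, noise = add_noise(latent, 500, alpha_bars, rng)
    # Passing the true noise (instead of a U-Net prediction) recovers x_0.
    recovered = predict_x0(noisy, noise, 500, alpha_bars)
    print(build_prompt("chest X-ray", "right lower lobe consolidation"))
    print([round(v, 6) for v in recovered])
```

In a trained model, the text-conditioned U-Net would supply `predicted_noise`; the sketch substitutes the true noise only to show that the forward and inverse formulas are consistent.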

Image quality and utility were evaluated with blinded clinician review and with objective metrics: Fréchet Inception Distance (FID), Inception Score (IS), multi-scale structural similarity (MS-SSIM), classification accuracy score, and image-image / image-text retrieval. On downstream classification, augmenting real data with MINIM-generated images raised accuracy for EGFR-mutation prediction on lung CT from 81.5% to 95.4% (at a 5:1 synthetic-to-real ratio) and for HER2-status prediction on breast MRI from 79.2% to 94.0%.
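
To make the FID metric more concrete, the following sketch computes a Fréchet distance between two sets of feature vectors under a loudly simplifying assumption: feature dimensions are treated as independent (diagonal covariances). Standard FID instead uses Inception-v3 features with full covariance matrices, so this illustrates the idea rather than the evaluation pipeline used for MINIM. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def feature_stats(features):
    """Per-dimension mean and population variance of equal-length feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diagonal(real_features, synthetic_features):
    """Fréchet distance between two Gaussians with diagonal covariances.

    Assumption: dimensions are independent, so the trace term
    Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to sum_i (sqrt(v1_i) - sqrt(v2_i))^2.
    """
    mu_r, var_r = feature_stats(real_features)
    mu_s, var_s = feature_stats(synthetic_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_s))
    return mean_term + cov_term


if __name__ == "__main__":
    real = [[0.0, 0.0], [2.0, 2.0]]
    synthetic = [[1.0, 1.0], [3.0, 3.0]]  # same spread, mean shifted by 1 per dimension
    print(frechet_distance_diagonal(real, synthetic))
```

Lower values mean the synthetic feature distribution sits closer to the real one; identical feature sets give a distance of zero.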

Applications

MINIM is intended for medical-AI researchers and clinical informaticians who need to enlarge or rebalance training datasets without collecting and de-identifying additional patient scans. Synthetic images can augment diagnostic classifiers, seed self-supervised pretraining, and support automated radiology report generation. The reported HER2 and EGFR use cases illustrate how synthetic augmentation can sharpen biomarker and mutation prediction from routine imaging, which is relevant to precision-oncology workflows where labeled cases are scarce. The released code allows researchers to reproduce results and adapt the generator to new modalities via transfer learning.
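
The augmentation workflow described here can be sketched as a small helper that mixes synthetic samples into a real training set at a chosen synthetic-to-real ratio (the paper's reported EGFR result used 5:1). This is an illustrative sketch, not the authors' pipeline: `augment_with_synthetic` and the `(image_id, label)` pairs are hypothetical stand-ins for whatever dataset objects a project uses.

```python
import random


def augment_with_synthetic(real, synthetic, ratio, seed=0):
    """Return a shuffled training set with up to `ratio` synthetic samples per real one.

    Each returned element is (sample, source) with source "real" or "synthetic",
    so validation and test splits can later be restricted to real data only.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    # Cap the request at the number of synthetic samples actually available.
    wanted = min(len(synthetic), int(round(ratio * len(real))))
    chosen = rng.sample(synthetic, wanted)
    tagged = [(item, "real") for item in real]
    tagged += [(item, "synthetic") for item in chosen]
    rng.shuffle(tagged)
    return tagged


if __name__ == "__main__":
    real = [(f"real_{i}", "EGFR+") for i in range(10)]
    synthetic = [(f"synth_{i}", "EGFR+") for i in range(100)]
    mixed = augment_with_synthetic(real, synthetic, ratio=5)
    n_synth = sum(1 for _, source in mixed if source == "synthetic")
    print(len(mixed), n_synth)
```

Keeping the source tag on every sample makes it straightforward to evaluate only on real held-out data, which matters because gains measured on synthetic images alone would say little about clinical performance.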

Impact

MINIM demonstrates that a single text-conditioned generative model, refined with expert reinforcement signals, can produce synthetic medical images useful enough to materially improve a range of downstream clinical tasks. By framing data scarcity as a generation problem and showing consistent double-digit performance gains across organs and modalities, it strengthens the case for synthetic data as a practical lever in medical AI. Limitations remain: generated images can encode artifacts or biases from the training distribution, synthetic augmentation must be validated against real held-out data before clinical use, and the public release distributes weights via a third-party file host rather than a versioned model hub, with no formal model card or datasheet accompanying the code. As with all generative medical imaging, outputs require expert oversight and the model is not intended for clinical decision-making without further validation.

Citation

Self-improving generative foundation model for synthetic medical image generation and clinical applications

Wang, J., et al. (2024) Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nature Medicine.

DOI: 10.1038/s41591-024-03359-y

Citations

Total Citations: 127
Influential: 7
References: 48

GitHub

Stars: 158
Forks: 9
Open Issues: 2
Contributors: 1
Last Push: 1y ago
Language: Python

Openness

bio.rodeo openness: Open weights, closed recipe
Score: 41 (Partial)
Usability (can I run it?): 56
Reproducibility (can I retrain it?): 32
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

data_augmentation, diffusion, foundation_model, generative, image_generation, medical_imaging, ophthalmology, radiology, reinforcement_learning, report_generation, text_guided, u_net

Resources

GitHub Repository, Research Paper