West China Hospital of Sichuan University / NVIDIA
Self-supervised vision transformer autoencoder pretrained on ~57,000 multi-contrast brain MRIs via masked image modeling for downstream brain tumor diagnosis.
LaMIM (Large Medical Image foundation Model) is a self-supervised foundation model for multi-contrast brain MRI, developed by researchers at West China Hospital of Sichuan University (with a collaborator from NVIDIA) and published in European Radiology in 2024. It addresses a persistent bottleneck in medical imaging AI: high-quality labels are scarce and expensive, while unlabeled scans are abundant. By pretraining on a large pool of unlabeled head MRIs, LaMIM learns general-purpose volumetric representations that can be transferred to specific diagnostic tasks with comparatively little labeled data.
The work is framed as a pilot study demonstrating that the masked-modeling paradigm, which reshaped language and natural-image pretraining, carries over to 3D whole-brain MRI. Rather than training a task-specific network from scratch, the authors pretrain a vision transformer autoencoder to reconstruct deliberately corrupted MRI volumes, then attach lightweight classifiers for downstream brain tumor applications. This positions LaMIM within the broader wave of medical imaging foundation models that seek to amortize the cost of annotation across many clinical tasks.
The model is notable for operating directly on multi-contrast 3D volumes (T1w, T1c, T2w, and FLAIR) and for releasing pretrained weights, making it a practical starting point for neuroimaging researchers building tumor-related classifiers.
LaMIM is built on a vision transformer autoencoder (ViTAutoEnc) that ingests multi-contrast 3D brain MRI volumes. Pretraining follows a masked-image-modeling objective: input volumes are corrupted with content dropout, and the model is trained to restore the missing regions using cross-contrast context, which forces it to learn anatomical and tissue-contrast priors. The authors release two variants that differ in masking granularity: coarse 16×16×16 blocks versus fine 4×4×4 blocks. For downstream evaluation, classifiers were attached to the pretrained encoder and fine-tuned on labeled tumor data. On independent test sets, the pretrained models reached 94.9% accuracy (AUC 0.981) for brain tumor detection, 92.3% accuracy (AUC 0.972) for tumor discrimination, and 80.4% accuracy (AUC 0.852) for molecular status prediction, consistently surpassing convolutional baselines trained from scratch.
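The block-wise corruption described above can be illustrated with a small NumPy sketch. It is not the authors' code: the function names, the `drop_ratio` parameter, zero filling, the 64³ toy volume, scoring the loss only on dropped voxels, and the choice to sample an independent mask for each contrast (so a region hidden in one contrast may stay visible in another) are all assumptions made for illustration. Only the four contrasts and the two block sizes, 16 (coarse) and 4 (fine), come from the description.

```python
import numpy as np

CONTRASTS = ("T1w", "T1c", "T2w", "FLAIR")  # channel order is an assumption


def block_dropout_mask(spatial_shape, block, drop_ratio, rng):
    """Boolean mask (True = dropped) built from cubic blocks of edge `block`.

    Each spatial dimension must be divisible by `block`. `drop_ratio` is a
    hypothetical parameter: the fraction of grid blocks that are dropped.
    """
    if any(s % block for s in spatial_shape):
        raise ValueError("spatial size must be divisible by the block size")
    grid = tuple(s // block for s in spatial_shape)
    n_blocks = int(np.prod(grid))
    n_drop = int(round(drop_ratio * n_blocks))
    flat = np.zeros(n_blocks, dtype=bool)
    flat[rng.choice(n_blocks, size=n_drop, replace=False)] = True
    # Expand every grid cell into a block x block x block cube of voxels.
    return flat.reshape(grid).repeat(block, 0).repeat(block, 1).repeat(block, 2)


def corrupt(volume, block, drop_ratio, rng, fill=0.0):
    """Apply an independent block mask to each contrast of a (C, D, H, W) volume.

    Returns the corrupted copy and the (C, D, H, W) boolean mask.
    Per-contrast masks are an illustrative assumption, not the paper's spec.
    """
    mask = np.stack([
        block_dropout_mask(volume.shape[1:], block, drop_ratio, rng)
        for _ in range(volume.shape[0])
    ])
    corrupted = volume.copy()
    corrupted[mask] = fill  # dropped voxels are overwritten with `fill`
    return corrupted, mask


def masked_l1(pred, target, mask):
    """Mean absolute reconstruction error over dropped voxels only (assumed loss)."""
    return float(np.abs(pred - target)[mask].mean())


rng = np.random.default_rng(0)
vol = rng.standard_normal((len(CONTRASTS), 64, 64, 64)).astype(np.float32)
x_coarse, m_coarse = corrupt(vol, block=16, drop_ratio=0.5, rng=rng)
x_fine, m_fine = corrupt(vol, block=4, drop_ratio=0.5, rng=rng)
# A network would map x_coarse to an estimate of vol; here the corrupted
# input itself stands in for a (poor) prediction.
print(masked_l1(x_coarse, vol, m_coarse), masked_l1(vol, vol, m_coarse))
```

With `block=16`, a 64³ volume holds only 4×4×4 = 64 candidate blocks, so each drop removes a large anatomical chunk; with `block=4`, the same ratio is spread over 4,096 small cubes, which leaves nearby context visible and makes the reconstruction task more local.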
LaMIM targets neuro-oncology imaging workflows, where it can serve as a pretrained backbone for brain tumor detection, tumor-type discrimination, and molecular status prediction from multi-contrast MRI. By transferring features learned from tens of thousands of unlabeled scans, it lets radiology and neuroimaging researchers build accurate classifiers from modest labeled cohorts, lowering the annotation burden for new diagnostic tasks. The released checkpoints provide a practical initialization for groups developing volumetric MRI classifiers without access to large in-house labeled datasets.
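The backbone-plus-classifier workflow can be sketched with a frozen-feature linear probe in NumPy. This swaps in linear probing, a lighter alternative to the full fine-tuning the authors describe, and it does not load LaMIM itself: the synthetic tokens, the token count (216, as if a hypothetical 96³ volume were cut into 16³ patches), the 32-dim embeddings, mean pooling, and all function names are assumptions standing in for real encoder outputs.

```python
import numpy as np


def pool_tokens(tokens):
    """Mean-pool encoder output tokens (B, N, H) into one embedding per scan (B, H)."""
    return tokens.mean(axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(feats, labels, lr=0.1, epochs=500):
    """Binary logistic regression on frozen embeddings, full-batch gradient descent."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(feats @ w + b) - labels  # gradient of mean BCE w.r.t. logits
        w -= lr * feats.T @ err / len(labels)
        b -= lr * err.mean()
    return w, b


def auc(scores, labels):
    """ROC AUC as the probability a positive outranks a negative (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


# Synthetic stand-in for encoder tokens (hypothetical): 200 scans, 216 tokens
# of 32 dims each; label-1 scans get a +1 shift in embedding dimension 0.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
tokens = rng.standard_normal((200, 216, 32))
tokens[:, :, 0] += 1.0 * labels[:, None]
feats = pool_tokens(tokens)

train, test = slice(0, 150), slice(150, 200)
w, b = train_linear_probe(feats[train], labels[train])
probs = sigmoid(feats[test] @ w + b)
acc = float(((probs > 0.5) == labels[test]).mean())
print(f"accuracy={acc:.3f} AUC={auc(probs, labels[test]):.3f}")
```

Accuracy and AUC are the two metrics the study reports; the rank-based `auc` here equals the area under the ROC curve. Freezing the encoder leaves only 33 trainable parameters (32 weights and a bias), which illustrates why pretrained features suit small labeled cohorts; full fine-tuning, as in the study, also updates the encoder weights.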
As an early demonstration that self-supervised masked image modeling can be applied to large-scale 3D multi-contrast brain MRI, LaMIM contributes to the growing evidence that foundation-model pretraining improves label efficiency and performance in neuroimaging. Its consistent ~10% gains over from-scratch baselines underscore the value of unlabeled clinical archives for building diagnostic models. As a pilot study with openly released weights, LaMIM is best viewed as a proof of concept and reusable starting point rather than a clinically validated tool; downstream results were obtained on the authors' tumor cohorts, and broader external validation across sites and scanners remains future work.
Chen, M., et al. (2024) Medical image foundation models in assisting diagnosis of brain tumors: a pilot study. European Radiology.
DOI: 10.1007/s00330-024-10728-1