University of Florida / NVIDIA
A 3D vision-transformer foundation model for multimodal neuroimage segmentation, pretrained self-supervised on brain MRI from 41,400 participants.
BrainSegFounder is one of the first 3D foundation models built specifically for neuroimage segmentation. It addresses a long-standing bottleneck in medical imaging: voxel-level annotation of brain MRI is expensive, requires expert radiologists, and yields datasets far too small to train large 3D models from scratch. Rather than treating each segmentation task in isolation, BrainSegFounder learns transferable anatomical representations from a large corpus of unlabeled, generally healthy brains and then adapts them to downstream tasks such as tumor and stroke-lesion delineation.
Developed by Ruogu Fang's SMILE lab at the University of Florida (with a collaborator at NVIDIA) and published in Medical Image Analysis in 2024, the model introduces a two-stage self-supervised pretraining recipe. The first stage encodes normal brain anatomy from multimodal MRI of 41,400 participants drawn from the UK Biobank; the second stage refines these representations toward disease-specific cues — the geometry and spatial placement of tumors and lesions — before any task-specific fine-tuning.
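The page does not spell out the self-supervised objectives used in either stage. As an illustration only, the sketch below assumes a masked-volume-inpainting proxy task of the kind commonly used to pretrain Swin UNETR encoders. Random cubes of an unlabeled volume are hidden, a network learns to reconstruct them, and the reconstruction is scored with an L1 loss over the hidden voxels. The function names, cube size, and nested-list `volume[z][y][x]` layout are hypothetical. The "prediction" in the usage example is a placeholder that stands in for a network's output.

```python
import random


def mask_random_cubes(volume, cube=2, n_cubes=3, seed=0):
    """Hide n_cubes randomly placed cube^3 regions of a 3D volume (hypothetical helper).

    volume: nested list indexed as volume[z][y][x].
    Returns (masked_copy, mask), where mask[z][y][x] is True for hidden voxels.
    The input volume is not modified.
    """
    rng = random.Random(seed)
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    masked = [[row[:] for row in plane] for plane in volume]  # copy all three levels
    mask = [[[False] * w for _ in range(h)] for _ in range(d)]
    for _ in range(n_cubes):
        # Pick a corner so the cube fits inside the volume (cubes may overlap).
        z0 = rng.randrange(0, d - cube + 1)
        y0 = rng.randrange(0, h - cube + 1)
        x0 = rng.randrange(0, w - cube + 1)
        for z in range(z0, z0 + cube):
            for y in range(y0, y0 + cube):
                for x in range(x0, x0 + cube):
                    masked[z][y][x] = 0.0
                    mask[z][y][x] = True
    return masked, mask


def inpainting_loss(prediction, target, mask):
    """Mean absolute (L1) reconstruction error over hidden voxels only."""
    total, count = 0.0, 0
    for z, plane in enumerate(mask):
        for y, row in enumerate(plane):
            for x, hidden in enumerate(row):
                if hidden:
                    total += abs(prediction[z][y][x] - target[z][y][x])
                    count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    vol = [[[float(z + y + x) for x in range(6)] for y in range(6)] for z in range(6)]
    masked, mask = mask_random_cubes(vol, cube=2, n_cubes=3, seed=42)
    # Placeholder "model": return the masked input unchanged, which scores poorly.
    print(inpainting_loss(masked, vol, mask))
```

Because the loss only looks at hidden voxels, a network cannot lower it by copying visible intensities. It has to infer the missing anatomy from its 3D context, which is the kind of representation a segmentation decoder can later reuse.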
The work demonstrates that the foundation-model paradigm that transformed natural-language and protein modeling can be extended to volumetric medical imaging, where 3D context and multiple MRI contrasts (T1, T1ce, T2, FLAIR) are essential. It provides a reusable pretrained backbone that downstream groups can fine-tune on small labeled clinical datasets.
BrainSegFounder uses a Swin UNETR encoder-decoder, with the Small variant (~64M parameters) performing best across experiments. Self-supervised pretraining draws on multimodal structural MRI from 41,400 UK Biobank participants, learning anatomical structure before disease-aware refinement. On the BraTS brain tumor benchmark under 5-fold cross-validation, BrainSegFounder-Small reached a mean Dice coefficient of 0.9115, surpassing a Swin UNETR baseline trained from scratch (0.8971). On the ATLAS v2.0 stroke-lesion dataset it achieved a Dice score of 0.712 and a lesion-wise F1 of 0.711, placing within the top three of the challenge leaderboard. These gains, especially when labeled training data are scarce, illustrate the value of large-scale anatomical pretraining as a starting point for clinical segmentation tasks.
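The two metrics quoted above can be sketched in plain Python. Dice measures voxel overlap, 2|P & T| / (|P| + |T|). Lesion-wise F1 instead counts whole lesions, treating each connected component of the reference mask as one lesion. The sketch makes several assumptions: masks are sets of (z, y, x) foreground coordinates, connectivity is 26-neighbor, and any overlap counts as a detection. The official BraTS and ATLAS scoring code may differ in these details and in how the "mean" Dice is averaged. Treat it as an illustration, not a reproduction of the reported numbers.

```python
from collections import deque


def dice(pred, truth):
    """Dice = 2|P & T| / (|P| + |T|) for sets of foreground voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def mean_dice(pred_by_region, truth_by_region):
    """Average Dice over labeled sub-regions (e.g. BraTS tumor sub-regions)."""
    scores = [dice(pred_by_region[k], truth_by_region[k]) for k in truth_by_region]
    return sum(scores) / len(scores)


def components(voxels):
    """Split a voxel set into 26-connected components with breadth-first search."""
    remaining, comps = set(voxels), []
    offsets = [(dz, dy, dx) for dz in (-1, 0, 1) for dy in (-1, 0, 1)
               for dx in (-1, 0, 1) if (dz, dy, dx) != (0, 0, 0)]
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:  # unvisited foreground neighbor
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def lesion_f1(pred, truth):
    """Lesion-wise F1 under an assumed 'any overlap' matching rule.

    A reference lesion is a true positive if any predicted voxel touches it;
    a predicted component overlapping no reference voxel is a false positive.
    """
    true_lesions, pred_lesions = components(truth), components(pred)
    tp = sum(1 for les in true_lesions if les & pred)
    fn = len(true_lesions) - tp
    fp = sum(1 for comp in pred_lesions if not comp & truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    truth = {(0, 0, 0), (0, 0, 1), (5, 5, 5)}  # two reference lesions
    pred = {(0, 0, 1), (9, 9, 9)}  # hits lesion 1, misses lesion 2, adds one false positive
    print(dice(pred, truth))  # 2*1/(2+3) = 0.4
    print(lesion_f1(pred, truth))  # 2*1/(2*1+1+1) = 0.5
```

The two numbers differ because Dice is dominated by large lesions. Lesion-wise F1 weights a small missed lesion as heavily as a large one, which is why stroke-lesion benchmarks such as ATLAS report both.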
BrainSegFounder is aimed at researchers and clinical-imaging groups building automated segmentation pipelines for brain pathology — quantifying tumor volumes for neuro-oncology, delineating stroke lesions for outcome studies, and supporting longitudinal monitoring in neurodegenerative research. Because the pretrained backbone already encodes healthy-brain anatomy, labs with only modest annotated datasets can fine-tune it for their specific MRI protocol or disease of interest, reducing the annotation burden that typically limits deep-learning adoption in neuroimaging.
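Once a fine-tuned model produces a binary mask, the tumor or lesion volumes used in neuro-oncology and longitudinal studies follow from simple arithmetic: foreground voxel count times voxel volume. The minimal sketch below assumes a flattened 0/1 mask and voxel spacing in millimetres. The function names are hypothetical. In practice, the spacing would come from the image header (for NIfTI files, nibabel's `header.get_zooms()`).

```python
def lesion_volume_ml(mask, spacing_mm):
    """Volume of a binary segmentation in millilitres (1 mL = 1000 mm^3).

    mask: iterable of 0/1 (or bool) voxel labels, e.g. a flattened 3D mask.
    spacing_mm: (sx, sy, sz) voxel spacing in millimetres.
    """
    sx, sy, sz = spacing_mm
    voxel_mm3 = sx * sy * sz  # volume of a single voxel
    n_foreground = sum(1 for v in mask if v)  # count segmented voxels
    return n_foreground * voxel_mm3 / 1000.0


def percent_change(baseline_ml, followup_ml):
    """Relative volume change between two scans, as a percentage of baseline."""
    if baseline_ml == 0:
        raise ValueError("baseline volume is zero; percent change is undefined")
    return 100.0 * (followup_ml - baseline_ml) / baseline_ml


if __name__ == "__main__":
    # Toy masks at 1 mm isotropic spacing: 10,000 then 12,000 tumor voxels.
    baseline = lesion_volume_ml([1] * 10_000 + [0] * 500, (1.0, 1.0, 1.0))
    followup = lesion_volume_ml([1] * 12_000 + [0] * 500, (1.0, 1.0, 1.0))
    print(baseline, followup)  # 10.0 12.0
    print(percent_change(baseline, followup))  # 20.0
```

Spacing is passed explicitly rather than assumed to be 1 mm. Clinical protocols often use anisotropic voxels (for example, thick axial slices), and ignoring the header spacing would bias volume comparisons across scanners or timepoints.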
By showing that a single self-supervised backbone pretrained on tens of thousands of brains transfers across distinct segmentation tasks, BrainSegFounder helped establish the foundation-model approach for 3D neuroimaging and offered a concrete, reproducible recipe for it. Its release of code and weights lowers the barrier for clinical groups to leverage large-scale pretraining, and its benchmark results on BraTS and ATLAS provide a reference point for subsequent 3D medical foundation models. Limitations include reliance on the demographically narrow, generally healthy UK Biobank cohort for pretraining and a focus on structural MRI, leaving generalization to other scanners, populations, and modalities as open questions.
Cox, J., et al. (2024). BrainSegFounder: Towards 3D foundation models for neuroimage segmentation. Medical Image Analysis.
DOI: 10.1016/j.media.2024.103301