A 3D MRI organ segmentation foundation model based on Swin-UNETR, trained on the UKBOB dataset of 1.37 billion labeled masks across 72 anatomical structures.
Swin-BOB is a 3D medical image segmentation foundation model for whole-body magnetic resonance imaging (MRI), developed by the Visual Geometry Group at the University of Oxford and presented at ICCV 2025. It addresses a persistent bottleneck in medical imaging AI: the scarcity of large, densely labeled 3D training data. While natural-image models benefit from web-scale supervision, volumetric organ segmentation has traditionally relied on a few hundred hand-annotated scans, limiting both accuracy and generalization across scanners, protocols, and populations.
The model is trained on UKBOB (UK Biobank Organs and Bones), which its authors describe as the largest labeled medical imaging dataset assembled to date. UKBOB contains 51,761 3D MRI samples (roughly 17.9 million 2D slices) and more than 1.37 billion 2D segmentation masks spanning 72 organs and skeletal structures, all derived from the UK Biobank cohort. Because masks at this scale cannot be drawn by hand, the authors built an automated labeling pipeline with organ-aware filtering and a "Specialized Organ Labels Filtering" cleaning step, then validated label quality against a manually annotated subset (UKBOB-manual, 300 MRIs with 11 abdominal classes).
Swin-BOB demonstrates that pretraining at this scale produces representations that transfer well beyond the source distribution. The released model serves both as a ready-to-use whole-body MRI organ segmenter and as a pretrained backbone that can be fine-tuned or adapted at test time to new anatomies, modalities, and clinical benchmarks.
Swin-BOB uses a Swin-UNETR architecture with a hierarchical encoder (3, 6, 12, and 24 attention heads across its stages; 7×7×7 windows with a shift of 3; a base feature width of 48 that scales to 768 at the bottleneck) and approximately 62 million trainable parameters. Training combines large-scale semi-supervised learning on the filtered UKBOB labels with deep supervision. On external benchmarks the authors report roughly 85.3% Dice on the BTCV abdominal benchmark (13 organs) and 91.2% Dice on BraTS23 brain tumor segmentation (3 classes); UKBOB pretraining improves over strong baselines by about 1.3% on BTCV and 0.4% on the BraTS brain MRI challenge. Code and pretrained weights are released under an MIT license, and the filtered labels are made available through the UK Biobank.
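For readers unfamiliar with the metric, the Dice scores quoted above measure the overlap between a predicted mask P and a reference mask G for one class as 2|P∩G| / (|P| + |G|), usually averaged over classes. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names `dice_per_class` and `mean_dice`, the flattened-label input format, and the choice to skip classes absent from both masks are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Sequence


def dice_per_class(pred: Sequence[int], ref: Sequence[int],
                   classes: Iterable[int]) -> Dict[int, float]:
    """Per-class Dice overlap for two flattened label volumes (hypothetical helper)."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must contain the same number of voxels")
    scores: Dict[int, float] = {}
    for c in classes:
        pred_count = sum(1 for p in pred if p == c)
        ref_count = sum(1 for r in ref if r == c)
        overlap = sum(1 for p, r in zip(pred, ref) if p == c and r == c)
        if pred_count + ref_count == 0:
            # Assumption: a class absent from both masks is skipped, not scored.
            continue
        scores[c] = 2.0 * overlap / (pred_count + ref_count)
    return scores


def mean_dice(scores: Dict[int, float]) -> float:
    """Unweighted mean of the per-class Dice scores."""
    if not scores:
        raise ValueError("no classes were scored")
    return sum(scores.values()) / len(scores)


# Toy example: 8 voxels, background 0 and two foreground classes.
pred = [0, 1, 1, 2, 2, 2, 0, 0]
ref = [0, 1, 1, 1, 2, 2, 0, 2]
scores = dice_per_class(pred, ref, classes=[1, 2])
print(scores, mean_dice(scores))
```

On real volumes an array library would replace the Python loops; the arithmetic stays the same.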
Swin-BOB targets researchers and clinicians working with volumetric MRI who need robust, automated organ and tissue delineation. Direct uses include whole-body organ quantification for population-scale studies (e.g., body composition, organ volumetry, and biomarker extraction from biobank-scale cohorts). The model can also serve as a pretrained backbone for downstream tasks such as tumor segmentation, abdominal CT adaptation, and disease-specific analyses. Test-time adaptation makes it practical to deploy on data from new scanners or institutions without collecting large new labeled sets, lowering the barrier for groups that lack in-house annotation capacity.
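Organ volumetry, one of the uses named above, reduces to counting the voxels assigned to each label and multiplying by the physical volume of one voxel. The sketch below is an illustration under stated assumptions rather than part of the Swin-BOB release: the function `organ_volumes_ml`, the flattened label list, and the example voxel spacing are hypothetical, and label 0 is assumed to be background.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def organ_volumes_ml(labels: Sequence[int],
                     spacing_mm: Tuple[float, float, float],
                     background: int = 0) -> Dict[int, float]:
    """Volume in millilitres per label from a flattened segmentation (hypothetical helper)."""
    # Physical volume of one voxel in cubic millimetres.
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    counts = Counter(labels)
    counts.pop(background, None)  # assumption: the background label is not an organ
    # 1 mL equals 1000 cubic millimetres.
    return {label: n * voxel_mm3 / 1000.0 for label, n in sorted(counts.items())}


# Toy example: 8 voxels at an assumed spacing of 2 x 2 x 5 mm.
labels = [0, 0, 1, 1, 1, 2, 2, 0]
volumes = organ_volumes_ml(labels, spacing_mm=(2.0, 2.0, 5.0))
print(volumes)
```

In practice the spacing would be read from the image header, not hard-coded, and the label-to-organ mapping would come from the released model's class list.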
By pairing a billion-scale labeled dataset with an openly released foundation model, Swin-BOB and UKBOB push 3D medical image segmentation toward the data-and-scale paradigm that has driven progress in natural-image and language models. The work establishes a reusable benchmark and pretrained backbone for the medical imaging community and demonstrates that automated labeling plus careful filtering can substitute for expensive manual annotation at scale. Its main limitations stem from its source cohort: the UK Biobank population skews toward older, predominantly European-ancestry adults, so generalization to pediatric, pathological, or demographically distinct populations should be validated before clinical use.
Bourigault, E., et al. (2025) UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation. IEEE International Conference on Computer Vision.
DOI: 10.48550/arXiv.2504.06908
DOI: 10.1109/ICCV51701.2025.02006