Swin-BOB

3D MRI organ segmentation foundation model built on Swin-UNETR and trained on the UKBOB whole-body dataset covering 72 organs and skeletal structures.

Released: April 2025

Parameters: 62 Million

Swin-BOB is a 3D medical image segmentation foundation model for whole-body magnetic resonance imaging (MRI), developed by the Visual Geometry Group at the University of Oxford and presented at ICCV 2025. It addresses a persistent bottleneck in medical imaging AI: the scarcity of large, densely labeled 3D training data. While natural-image models benefit from web-scale supervision, volumetric organ segmentation has traditionally relied on a few hundred hand-annotated scans, limiting both accuracy and generalization across scanners, protocols, and populations.

The model is trained on UKBOB (UK Biobank Organs and Bones), the largest labeled medical imaging dataset assembled to date. Derived from the UK Biobank cohort, UKBOB contains 51,761 3D MRI samples (roughly 17.9 million 2D slices) and more than 1.37 billion 2D segmentation masks spanning 72 organs and skeletal structures. Because masks at this scale cannot be drawn by hand, the authors built an automated labeling pipeline whose organ-aware cleaning step, "Specialized Organ Labels Filtering", removes noisy masks. They then validated label quality against a manually annotated subset (UKBOB-manual, 300 MRIs with 11 abdominal classes).
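Agreement between automated and manual masks is commonly scored with the Dice coefficient, the same metric quoted for the benchmarks on this page. The sketch below is a minimal, standard-library illustration of per-class Dice on flattened label maps. It assumes label 0 is background, and the function name and toy labels are hypothetical; the source does not state exactly how the authors scored the UKBOB-manual subset.

```python
def dice_per_class(pred, target, num_classes):
    """Return {class_id: Dice} for two equally sized, flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")
    scores = {}
    for c in range(1, num_classes + 1):  # label 0 is treated as background (assumption)
        pred_c = sum(1 for p in pred if p == c)
        target_c = sum(1 for t in target if t == c)
        overlap = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        denom = pred_c + target_c
        if denom > 0:  # skip classes absent from both masks
            scores[c] = 2.0 * overlap / denom
    return scores


# Toy example: 8 voxels, 2 foreground classes (hypothetical labels).
auto_labels = [0, 1, 1, 1, 2, 2, 0, 0]
manual_labels = [0, 1, 1, 0, 2, 2, 2, 0]
scores = dice_per_class(auto_labels, manual_labels, num_classes=2)
print(scores)
```

On real volumes the same computation would normally be vectorized over arrays; the loop form is kept here only to make the definition explicit.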

Swin-BOB demonstrates that pretraining at this scale produces representations that transfer well beyond the source distribution. The released model serves both as a ready-to-use whole-body MRI organ segmenter and as a pretrained backbone that can be fine-tuned or adapted at test time to new anatomies, modalities, and clinical benchmarks.

Key Features

Billion-mask pretraining: Trained on UKBOB's 1.37 billion segmentation masks across 72 organs and bones, providing dense supervision orders of magnitude larger than prior 3D segmentation corpora.
Swin-UNETR backbone: Built on the hierarchical shifted-window Swin transformer encoder paired with a UNETR-style decoder, taking 96×96×96 voxel inputs and producing 72-channel organ predictions (~62M parameters).
Automated label cleaning: A Specialized Organ Labels Filtering pipeline removes noisy automated annotations, with manual validation on 300 MRIs confirming label fidelity.
Test-time adaptation: An entropy-based test-time adaptation (ETTA) module with deep supervision lets the model adapt to unseen domains without retraining, supporting zero-shot generalization to new scanners and tasks.
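Entropy-based test-time adaptation methods generally work by lowering the Shannon entropy of the model's own predictions on unlabeled target data. The sketch below only computes that quantity (mean per-voxel entropy of softmax outputs) in plain Python. It is not the authors' ETTA implementation: the function names and toy logits are hypothetical, and the source does not specify which parameters ETTA updates or how deep supervision enters the objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -sum(p * math.log(p + eps) for p in probs)


def mean_prediction_entropy(voxel_logits):
    """Average entropy over voxels: the quantity an entropy-based objective would minimize."""
    entropies = [shannon_entropy(softmax(logits)) for logits in voxel_logits]
    return sum(entropies) / len(entropies)


# Toy logits for two voxels and three classes (hypothetical values).
confident = [[8.0, 0.0, 0.0], [0.0, 9.0, 0.0]]
uncertain = [[0.1, 0.0, 0.1], [0.0, 0.0, 0.0]]
print(mean_prediction_entropy(confident), mean_prediction_entropy(uncertain))
```

Confident predictions give a mean entropy near zero, while near-uniform predictions approach log(3) for three classes, which is why entropy can act as a label-free signal of domain mismatch.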

Technical Details

Swin-BOB uses a Swin-UNETR architecture with a hierarchical encoder (attention heads of 3, 6, 12, and 24 across stages; 7×7×7 windows with a shift of 3; base feature width of 48 scaling to 768 at the bottleneck) and approximately 62 million trainable parameters. Training combines large-scale semi-supervised learning on filtered UKBOB labels with deep supervision. On held-out benchmarks the authors report competitive whole-body and cross-domain results, including roughly 85.3% Dice on the BTCV abdominal benchmark (13 organs) and 91.2% Dice on BraTS23 brain tumor segmentation (3 classes). Relative to strong baselines, the UKBOB-pretrained model improves by about 1.3% on BTCV and 0.4% on BraTS23. Code and pretrained weights are released under an MIT license, and the filtered labels are made available through the UK Biobank.
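The encoder figures above can be tabulated stage by stage. The sketch below is a back-of-the-envelope layout, not the released code: it assumes a patch size of 2 and that each stage ends by halving the spatial resolution and doubling the channel width, neither of which is stated on this page. The function name is hypothetical.

```python
def swin_encoder_layout(input_size=96, patch_size=2, base_width=48,
                        num_heads=(3, 6, 12, 24)):
    """Tabulate per-stage token resolution, channel width, and per-head size."""
    resolution = input_size // patch_size  # tokens per axis after patch embedding (assumed patch size)
    width = base_width
    stages = []
    for index, heads in enumerate(num_heads, start=1):
        stages.append({
            "stage": index,
            "tokens_per_axis": resolution,
            "width": width,
            "heads": heads,
            "head_dim": width // heads,
        })
        # Assumed: each stage ends by halving resolution and doubling width.
        resolution //= 2
        width *= 2
    bottleneck = {"tokens_per_axis": resolution, "width": width}
    return stages, bottleneck


stages, bottleneck = swin_encoder_layout()
for row in stages:
    print(row)
print("bottleneck:", bottleneck)
```

Under these assumptions each stage keeps 16 channels per attention head, and four doublings take the base width of 48 to the 768 quoted for the bottleneck.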

Applications

Swin-BOB targets researchers and clinicians working with volumetric MRI who need robust, automated organ and tissue delineation. Direct uses include whole-body organ quantification for population-scale studies, such as body composition, organ volumetry, and biomarker extraction from biobank-scale cohorts. The model can also serve as a pretrained backbone for downstream tasks such as tumor segmentation, abdominal CT adaptation, and disease-specific analyses. The test-time adaptation module makes it practical to deploy on data from new scanners or institutions without collecting large new labeled sets, lowering the barrier for groups that lack in-house annotation capacity.
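Organ volumetry from a predicted label map reduces to counting voxels per label and multiplying by the physical voxel volume. The sketch below shows that arithmetic with the standard library only. The label IDs, organ names, and voxel spacing are hypothetical placeholders, not the model's actual label scheme or the UK Biobank acquisition geometry.

```python
from collections import Counter


def organ_volumes_ml(label_voxels, spacing_mm, label_names):
    """Convert a flattened label map into per-organ volumes in millilitres."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    counts = Counter(label_voxels)
    # 1 mL = 1000 mm^3
    return {name: counts.get(label_id, 0) * voxel_mm3 / 1000.0
            for label_id, name in label_names.items()}


# Hypothetical label scheme and spacing, for illustration only.
label_names = {1: "liver", 2: "spleen"}
predicted = [0] * 10 + [1] * 500 + [2] * 250
volumes = organ_volumes_ml(predicted, spacing_mm=(2.0, 2.0, 3.0), label_names=label_names)
print(volumes)
```

In practice the spacing would be read from the image header of each scan, since anisotropic voxels are common in MRI and a fixed value would bias the volumes.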

Impact

By pairing a billion-scale labeled dataset with an openly released foundation model, Swin-BOB and UKBOB push 3D medical image segmentation toward the data-and-scale paradigm that has driven progress in natural-image and language models. The work establishes a reusable benchmark and pretrained backbone for the medical imaging community and demonstrates that automated labeling plus careful filtering can substitute for expensive manual annotation at scale. Its main limitations stem from its source cohort: the UK Biobank population skews toward older, predominantly European-ancestry adults, so generalization to pediatric, pathological, or demographically distinct populations should be validated before clinical use.

Citations

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

Preprint

Bourigault, E., et al. (2025) UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation. arXiv preprint arXiv:2504.06908.

DOI: 10.48550/arXiv.2504.06908

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

Bourigault, E., et al. (2025) UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation. IEEE/CVF International Conference on Computer Vision (ICCV).

DOI: 10.1109/ICCV51701.2025.02006

Recent citations

Papers that recently cited this model.

SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation
Hasan Faraz Khan, Noor Fatima, Muzammil Behzad
arXiv.org · Dec 2025
Citations: 0

MedNeXt-v2: Scaling 3D ConvNeXts for Large-Scale Supervised Representation Learning in Medical Image Segmentation
Saikat Roy, Yannick Kirchhoff, Constantin Ulrich, et al.
arXiv.org · Dec 2025
Citations: 1

Artificial intelligence in cardiovascular imaging: risks, mitigations and the path to safe implementation
J. P. Howard, Qiang Zhang, Ahmed M. Salih, et al.
Heart · Jun 2025
Citations: 3

Top citations

The most-cited papers that cite this model.

X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
Emmanuelle Bourigault, A. Hamdi, A. Jamaludin
arXiv.org · Apr 2024
Citations: 9

Artificial intelligence in cardiovascular imaging: risks, mitigations and the path to safe implementation
J. P. Howard, Qiang Zhang, Ahmed M. Salih, et al.
Heart · Jun 2025
Citations: 3

MedNeXt-v2: Scaling 3D ConvNeXts for Large-Scale Supervised Representation Learning in Medical Image Segmentation
Saikat Roy, Yannick Kirchhoff, Constantin Ulrich, et al.
arXiv.org · Dec 2025
Citations: 1

SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation
Hasan Faraz Khan, Noor Fatima, Muzammil Behzad
arXiv.org · Dec 2025
Citations: 0

Citations

Total Citations: 4
Influential: 0
References: 89

GitHub

Stars: 50
Forks: 2
Open Issues: 4
Contributors: 2
Last Push: 3mo ago
Language: Python
License: MIT

Fields of citing research

Computer Science: 100%
Medicine: 100%
Engineering: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Openness score: 64 (Partial)
Usability (can I run it?): 77
Reproducibility (can I retrain it?): 49
Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository
Research Paper
Official Website
