Spark3D (S3D)

German Cancer Research Center (DKFZ) / Heidelberg University / Helmholtz Imaging / National Center for Tumor Diseases (NCT) Heidelberg / FLOY / Humanitas University

Masked-autoencoder foundation model that pre-trains a 3D Residual Encoder U-Net on roughly 39,000 brain MRIs for volumetric image segmentation.

Released: October 2024

Spark3D (S3D) is a self-supervised foundation model for 3D medical image segmentation, introduced in the CVPR 2025 paper "Revisiting MAE pre-training for 3D medical image segmentation" by Tassilo Wald and colleagues at the German Cancer Research Center (DKFZ) and collaborating institutions. While masked autoencoder (MAE) pre-training transformed 2D natural-image vision, attempts to carry it into volumetric medical imaging had repeatedly failed to beat the strong, dataset-adaptive nnU-Net baseline. S3D revisits that question with careful design and large-scale data, becoming the first MAE approach to consistently outperform nnU-Net on 3D segmentation.

The model adapts the MAE objective to 3D convolutional networks: a Residual Encoder U-Net is pre-trained to reconstruct heavily masked brain MRI volumes, learning transferable anatomical representations that are then fine-tuned for downstream segmentation tasks. Pre-training draws on roughly 39,000 3D brain MRI volumes, and the authors build a rigorous evaluation framework spanning five development and eight held-out testing segmentation datasets to avoid the overfitting-to-benchmark pitfalls common in prior self-supervised work.
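The masked-reconstruction idea described above can be illustrated with a small NumPy sketch. It hides a fraction of cubic blocks in a 3D volume and scores a reconstruction only on the hidden voxels, following the usual MAE convention. The function names, the block size of 16, and the 75% mask ratio are assumptions chosen for illustration, not the paper's settings, and a real pipeline would replace the zero "prediction" with the output of the Residual Encoder U-Net.

```python
import numpy as np


def random_block_mask(shape, block=16, mask_ratio=0.75, rng=None):
    """Return a boolean mask (True = hidden voxel) built from cubic blocks.

    `block` and `mask_ratio` are illustrative values, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    if any(s % block for s in shape):
        raise ValueError("each volume dimension must be divisible by block")
    grid = tuple(s // block for s in shape)
    n_blocks = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_blocks))
    # Choose which blocks to hide, sampling without replacement.
    flat = np.zeros(n_blocks, dtype=bool)
    flat[rng.choice(n_blocks, size=n_masked, replace=False)] = True
    mask = flat.reshape(grid)
    # Upsample the coarse block grid to voxel resolution.
    for axis in range(len(shape)):
        mask = np.repeat(mask, block, axis=axis)
    return mask


def masked_mse(reconstruction, target, mask):
    """Mean squared error computed only over the hidden voxels."""
    diff = (reconstruction - target)[mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(64, 64, 64)).astype(np.float32)  # stand-in for an MRI crop
    mask = random_block_mask(volume.shape, block=16, mask_ratio=0.75, rng=rng)
    network_input = np.where(mask, 0.0, volume)  # hidden voxels are zeroed out
    # Placeholder prediction: a real model would map network_input to a volume.
    prediction = np.zeros_like(volume)
    print("masked fraction:", mask.mean())
    print("masked MSE:", masked_mse(prediction, volume, mask))
```

Restricting the loss to hidden voxels keeps the network from being rewarded for copying visible input, which is the property that makes reconstruction a useful pretext task.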

S3D sits in the lineage of medical-imaging foundation models that pre-train once on large unlabeled corpora and adapt to many tasks, but it is distinctive in targeting CNN-based dense prediction rather than transformer feature extraction, and in being benchmarked against nnU-Net rather than weaker baselines.

Key Features

MAE adapted for 3D CNNs: Reformulates masked-image reconstruction for volumetric Residual Encoder U-Nets, the first such configuration to consistently beat the nnU-Net segmentation baseline.
Large-scale brain MRI pre-training: Trained on 39,168 3D brain MRI volumes (T1, T2, T1-FLAIR, T2-FLAIR) drawn from 44+ centers and 9k+ patients.
Base and Large variants: Ships as S3D-B (Base, the recommended default) and S3D-L (Large), trading compute for modest additional accuracy.
Rigorous, leakage-aware evaluation: Assessed across 5 development and 8 testing datasets, separating model selection from final reporting.
Strong low-data transfer: With as few as 40 labeled training images, S3D-B nearly matches from-scratch nnU-Net trained on full datasets.

Technical Details

S3D pre-trains a ResEnc-L (large Residual Encoder) U-Net within the nnssl/nnU-Net framework using a Spark-style sparse masked-reconstruction objective adapted to 3D convolutions. The pre-training corpus comprises 39,168 brain MR images restricted to T1, T2, T1-FLAIR, and T2-FLAIR sequences, filtered from a proprietary collection of 44k volumes spanning more than 44 centers, 9k+ patients, and 10+ scanner types; this clinical data is not publicly released due to patient-privacy constraints. After fine-tuning, S3D-B improves over a fixed nnU-Net configuration by roughly 2.0 Dice similarity coefficient (DSC) points averaged across 11 test datasets, achieves the best average rank among the seven methods compared, which include the prior SSL approaches VoCo, VolumeFusion, and Models Genesis, and shows strong sample efficiency in low-data regimes. The code is released through the MIC-DKFZ nnssl framework (CC-BY-SA-4.0), with pre-trained checkpoints distributed via HuggingFace and downloaded automatically by the downstream fine-tuning pipeline.
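For readers less familiar with the two summary statistics quoted above, the following sketch shows how a Dice score and an average rank across datasets are commonly computed. It is a generic illustration rather than the paper's evaluation code: the function names are hypothetical, the method names and scores in the demo are made up, and the tie-handling and empty-mask conventions are assumptions.

```python
import numpy as np


def dice_score(pred, target):
    """Dice similarity coefficient for two binary masks, in [0, 1]."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * intersection / denom)


def average_ranks(scores):
    """Average rank per method across datasets (1 = best, higher score wins).

    `scores` maps method name -> list of per-dataset scores of equal length.
    Tied methods share the mean of the ranks they occupy.
    """
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for d in range(n_datasets):
        column = {m: scores[m][d] for m in methods}
        for m in methods:
            better = sum(v > column[m] for v in column.values())
            equal = sum(v == column[m] for v in column.values())  # includes m itself
            totals[m] += better + (equal + 1) / 2.0
    return {m: totals[m] / n_datasets for m in methods}


if __name__ == "__main__":
    pred = np.array([[1, 1], [0, 0]])
    target = np.array([[1, 0], [0, 0]])
    print("Dice:", dice_score(pred, target))

    # Made-up per-dataset Dice values for three hypothetical methods.
    demo = {
        "method_a": [0.90, 0.80],
        "method_b": [0.80, 0.80],
        "method_c": [0.70, 0.90],
    }
    print("Average ranks:", average_ranks(demo))
```

Average rank complements the mean Dice difference because it is insensitive to a single dataset with unusually large gains or losses.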

Applications

S3D is designed for radiologists, neuroimaging researchers, and medical-imaging ML practitioners who need accurate volumetric segmentation of brain MRI, for example when delineating tumors, lesions, or anatomical structures. Because pre-training yields transferable weights, teams can fine-tune S3D on their own labeled datasets and obtain segmentation gains over training from scratch, which is especially valuable when labeled data is scarce. The released checkpoints integrate directly into nnU-Net-style adaptation workflows, lowering the barrier to applying foundation-model pre-training in clinical research pipelines.

Impact

S3D is significant as the first work to demonstrate that properly configured MAE pre-training can consistently surpass the notoriously strong nnU-Net baseline in 3D medical image segmentation, providing strong evidence on the long-standing question of whether self-supervised pre-training helps in this domain. Its careful, leakage-aware evaluation protocol and public nnssl codebase have influenced subsequent benchmarking efforts such as the OpenMind study, which extends the same framework to compare eight SSL methods across architectures. The principal limitation is that the largest gains depend on a proprietary clinical pre-training corpus that cannot be shared, so externally reproducible pre-training must rely on smaller public datasets.

Citations

Revisiting MAE pre-training for 3D medical image segmentation

Wald, T., et al. (2025) Revisiting MAE pre-training for 3D medical image segmentation. Computer Vision and Pattern Recognition.

DOI: 10.1109/CVPR52734.2025.00489

Revisiting MAE pre-training for 3D medical image segmentation

Preprint

Wald, T., et al. (2024) Revisiting MAE pre-training for 3D medical image segmentation. arXiv preprint arXiv:2410.23132.

DOI: 10.48550/arXiv.2410.23132

Recent citations

Papers that recently cited this model.

Toward brain magnetic resonance imaging analysis intelligence: A review of federated learning and visual foundation models
Zhen Yu, Yang Liu, Qingchao Chen
Engineering applications of artificial intelligence · Aug 2026 · 0 citations

MAE-UNETR++: Masked Autoencoder Pretraining for 3-D Lung Nodule Segmentation
Vinayak Savant, Yue Wang, J. Xuan
bioRxiv · Jun 2026 · 0 citations

Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks
Jue Jiang, H. Veeraraghavan
May 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

MedDINOv3: How to adapt vision foundation models for medical image segmentation?
Yuheng Li, Yizhou Wu, Yuxiang Lai, et al.
arXiv.org · Sep 2025 · 22 citations

An OpenMind for 3D Medical Vision Self-supervised Learning
Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi, et al.
IEEE International Conference on Computer Vision · Dec 2024 · 20 citations · Influential

General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound
Jakob Ambsdorf, Asbjørn Munk, S. Llambias, et al.
International Conference on Medical Image Computing and Computer-Assisted Intervention · Jun 2025 · 9 citations

Comprehensive language-image pre-training for 3D medical image understanding
Tassilo Wald, I. Hamamci, Yuan Gao, et al.
arXiv.org · Oct 2025 · 8 citations

Visual Instruction Pretraining for Domain-Specific Foundation Models
Yuxuan Li, Yicheng Zhang, Wenhao Tang, et al.
arXiv.org · Sep 2025 · 8 citations

Citations

Total Citations: 39
Influential: 5
References: 63

GitHub

Stars: 157
Forks: 23
Open Issues: 12
Contributors: 61
Last Push: 9mo ago
Language: Python
License: CC-BY-SA-4.0

HuggingFace

Downloads: 36
Likes: 1
Last Modified: 1y ago
Pipeline: image-feature-extraction

Fields of citing research

Computer Science: 97%
Medicine: 89%
Engineering: 40%
Biology: 11%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 45 · Partial
Usability (can I run it?): 49
Reproducibility (can I retrain it?): 46

Model Openness Framework: Unclassified
Restrictive license on core components

Resources

GitHub Repository · Research Paper · HuggingFace Model
