bio.rodeo

Spark3D (S3D)

German Cancer Research Center (DKFZ) / Heidelberg University / Helmholtz Imaging / National Center for Tumor Diseases (NCT) Heidelberg / FLOY / Humanitas University

A masked-autoencoder foundation model that pre-trains a 3D Residual Encoder U-Net on ~39k brain MRIs to improve volumetric medical image segmentation.

Released: October 2024

Spark3D (S3D) is a self-supervised foundation model for 3D medical image segmentation, introduced in the CVPR 2025 paper "Revisiting MAE pre-training for 3D medical image segmentation" by Tassilo Wald and colleagues at the German Cancer Research Center (DKFZ) and collaborating institutions. While masked autoencoder (MAE) pre-training transformed 2D natural-image vision, attempts to carry it into volumetric medical imaging had repeatedly failed to beat the strong, dataset-adaptive nnU-Net baseline. S3D revisits that question with careful design and large-scale data, becoming the first MAE approach to consistently outperform nnU-Net on 3D segmentation.

The model adapts the MAE objective to 3D convolutional networks: a Residual Encoder U-Net is pre-trained to reconstruct heavily masked brain MRI volumes, learning transferable anatomical representations that are then fine-tuned for downstream segmentation tasks. Pre-training draws on roughly 39,000 3D brain MRI volumes, and the authors build a rigorous evaluation framework spanning five development and eight held-out testing segmentation datasets to avoid the overfitting-to-benchmark pitfalls common in prior self-supervised work.
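
The masked-reconstruction idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the block size, the 75% mask ratio, the zero-filling of hidden voxels, and the helper names `random_block_mask` and `masked_mse` are all assumptions chosen for illustration, and the all-zeros "reconstruction" stands in for what a real network would predict.

```python
import numpy as np

def random_block_mask(volume_shape, block=16, mask_ratio=0.75, rng=None):
    """Return a boolean voxel mask (True = hidden) made of cubic blocks.

    Block size and mask ratio are illustrative assumptions, not values
    taken from the S3D paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    if any(s % block for s in volume_shape):
        raise ValueError("each volume dimension must be divisible by the block size")
    grid = tuple(s // block for s in volume_shape)
    n_blocks = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_blocks))
    # Choose which blocks to hide, without replacement.
    flat = np.zeros(n_blocks, dtype=bool)
    flat[rng.choice(n_blocks, size=n_masked, replace=False)] = True
    mask = flat.reshape(grid)
    # Expand the coarse block grid to full voxel resolution.
    for axis in range(len(volume_shape)):
        mask = np.repeat(mask, block, axis=axis)
    return mask

def masked_mse(reconstruction, target, mask):
    """Mean squared error computed only over the hidden voxels."""
    squared_error = (reconstruction - target) ** 2
    return float(squared_error[mask].mean())

rng = np.random.default_rng(0)
volume = rng.normal(size=(64, 64, 64)).astype(np.float32)  # stand-in for an MRI volume
mask = random_block_mask(volume.shape, block=16, mask_ratio=0.75, rng=rng)
masked_input = np.where(mask, 0.0, volume)  # what the encoder would see
reconstruction = np.zeros_like(volume)      # placeholder for a model's output
loss = masked_mse(reconstruction, volume, mask)
```

Scoring the loss only on hidden voxels is the usual MAE convention; whether S3D follows it exactly is not stated in this summary.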

S3D sits in the lineage of medical-imaging foundation models that pre-train once on large unlabeled corpora and adapt to many tasks, but it is distinctive in targeting CNN-based dense prediction rather than transformer feature extraction, and in being benchmarked against nnU-Net rather than weaker baselines.

#Key Features

  • MAE adapted for 3D CNNs: Reformulates masked-image reconstruction for volumetric Residual Encoder U-Nets, the first such configuration to consistently beat the nnU-Net segmentation baseline.
  • Large-scale brain MRI pre-training: Trained on 39,168 3D brain MRI volumes (T1, T2, T1-FLAIR, T2-FLAIR) drawn from 44+ centers and 9k+ patients.
  • Base and Large variants: Ships as S3D-B (Base, the recommended default) and S3D-L (Large), trading compute for modest additional accuracy.
  • Rigorous, leakage-aware evaluation: Assessed across 5 development and 8 testing datasets, separating model selection from final reporting.
  • Strong low-data transfer: With as few as 40 labeled training images, S3D-B nearly matches from-scratch nnU-Net trained on full datasets.
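
The separation of model selection from final reporting can be sketched in a few lines of Python. This is a hedged illustration of the general protocol, not the paper's evaluation code: the configuration names, dataset names, scores, and the `select_then_report` helper are all hypothetical.

```python
from statistics import mean

def select_then_report(dev_scores, test_scores):
    """Pick the configuration with the best mean development score,
    then report the mean test score of that configuration only.

    Test scores never influence which configuration is chosen.
    """
    best = max(dev_scores, key=lambda name: mean(dev_scores[name].values()))
    return best, mean(test_scores[best].values())

# Hypothetical Dice scores (0-100 scale) for two pre-training configurations.
dev_scores = {
    "config_a": {"dev1": 70.0, "dev2": 80.0},
    "config_b": {"dev1": 72.0, "dev2": 82.0},
}
test_scores = {
    "config_a": {"test1": 75.0, "test2": 65.0},
    "config_b": {"test1": 74.0, "test2": 68.0},
}

best, test_mean = select_then_report(dev_scores, test_scores)
```

Because the choice is made on the development datasets alone, the reported test number is not inflated by tuning against the benchmark it is reported on.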

#Technical Details

S3D pre-trains a ResEnc-L (large Residual Encoder) U-Net within the nnssl/nnU-Net framework using a Spark-style sparse masked-reconstruction objective adapted to 3D convolutions. The pre-training corpus comprises 39,168 brain MR images restricted to T1, T2, T1-FLAIR, and T2-FLAIR sequences. These were filtered from a proprietary collection of 44k volumes spanning more than 44 centers, 9k+ patients, and 10+ scanner types; the clinical data is not publicly released because of patient-privacy constraints.

After fine-tuning, S3D-B improves over a fixed nnU-Net configuration by roughly 2.0 Dice (DSC) points averaged across 11 test datasets. It also achieves the best average rank among the seven methods compared, which include prior SSL approaches such as VoCo, VolumeFusion, and Models Genesis, and it shows strong sample efficiency in low-data regimes.

The code is released through the MIC-DKFZ nnssl framework (CC-BY-SA-4.0), with pre-trained checkpoints distributed via HuggingFace and auto-downloaded by the downstream fine-tuning pipeline.
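
For readers unfamiliar with the metric behind the Dice figures in this section, the following is a minimal sketch of the Dice similarity coefficient for binary 3D masks. It is a generic textbook definition, not code from nnssl or nnU-Net; the function name `dice_score`, the toy volumes, and the convention of returning 1.0 when both masks are empty are assumptions.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

# Toy 4x4x4 volumes: each mask covers 32 voxels, and they overlap on 16.
target = np.zeros((4, 4, 4), dtype=bool)
target[:2] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3] = True

dice = dice_score(pred, target)  # 2 * 16 / (32 + 32) = 0.5
dice_points = 100.0 * dice       # "Dice points" are on a 0-100 scale
```

On this scale, a gain of about 2 Dice points corresponds to an increase of about 0.02 in the raw coefficient.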

#Applications

S3D is designed for radiologists, neuroimaging researchers, and medical-imaging ML practitioners who need accurate volumetric segmentation of brain MRI, for example to delineate tumors, lesions, or anatomical structures. Because pre-training yields transferable weights, teams can fine-tune S3D on their own labeled datasets and obtain segmentation gains over training from scratch, which is especially valuable when labeled data is scarce. The released checkpoints integrate directly into nnU-Net-style adaptation workflows, lowering the barrier to applying foundation-model pre-training in clinical research pipelines.

#Impact

S3D is significant as the first work to demonstrate that properly configured MAE pre-training can consistently surpass the notoriously strong nnU-Net baseline in 3D medical image segmentation, settling a long-standing open question about whether self-supervised pre-training helps in this domain. Its careful, leakage-aware evaluation protocol and public nnssl codebase have influenced subsequent benchmarking efforts such as the OpenMind study, which extends the same framework to compare eight SSL methods across architectures. The principal limitation is that the largest gains depend on a proprietary clinical pre-training corpus that cannot be shared, so externally reproducible pre-training relies on smaller public datasets.

Citations

Revisiting MAE pre-training for 3D medical image segmentation

Wald, T., et al. (2025) Revisiting MAE pre-training for 3D medical image segmentation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

DOI: 10.1109/CVPR52734.2025.00489

Revisiting MAE pre-training for 3D medical image segmentation

Preprint

Wald, T., et al. (2024) Revisiting MAE pre-training for 3D medical image segmentation. arXiv preprint.

DOI: 10.48550/arXiv.2410.23132

Citations

Total Citations: 34
Influential: 4
References: 63

GitHub

Stars: 154
Forks: 24
Open Issues: 11
Contributors: 62
Last Push: 8mo ago
Language: Python
License: CC-BY-SA-4.0

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 45 (Partial)
Usability (can I run it?): 49
Reproducibility (can I retrain it?): 46
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

brain_mri, cnn, foundation_model, masked_autoencoder, neuroimaging, representation_learning, segmentation, self_supervised, transfer_learning, u_net

Resources

GitHub Repository, Research Paper, HuggingFace Model