bio.rodeo

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Imaging

STU-Net

Shanghai AI Laboratory

Scalable and transferable U-Net family (14M–1.4B parameters) for 3D medical image segmentation, supervised-pretrained on TotalSegmentator.

Released: April 2023
Parameters: 1.46 billion (largest variant, STU-Net-H)

STU-Net is a family of scalable, transferable convolutional models for 3D medical image segmentation, introduced in April 2023 by researchers at Shanghai AI Laboratory and collaborating institutions (Ziyan Huang, Junjun He, Yu Qiao, and colleagues). It addresses a long-standing gap in medical imaging: while large-scale supervised pre-training had transformed natural-image and language tasks, segmentation models for CT and other volumetric modalities remained small and were typically trained from scratch on each new dataset.

The model builds directly on the widely used nnU-Net framework, whose self-configuring pipeline is a de facto standard for biomedical segmentation. The key contribution is making nnU-Net's convolutional blocks scalable, then systematically growing the network from 14 million to 1.4 billion parameters, which the authors reported as the largest medical image segmentation model at the time of release. By pre-training this family on TotalSegmentator, a large public CT dataset with 1,204 annotated scans, the authors deliver checkpoints that can be applied off the shelf or fine-tuned, lowering the barrier to strong segmentation on new clinical targets.

STU-Net sits at the intersection of classical segmentation engineering and the foundation-model paradigm, demonstrating that the "scale plus pre-training" recipe transfers to dense volumetric prediction, not only to classification and generative tasks.

Key Features

  • Scalable architecture: Refined nnU-Net convolutional blocks let the same design span four sizes—S (14.6M), B (58.3M), L (440.3M), and H (1.46B parameters)—matched to different compute budgets.
  • Joint depth-width scaling: Empirical study shows scaling network depth and width together is optimal, with larger models yielding consistent performance gains.
  • Large-scale supervised pre-training: All variants are pre-trained on TotalSegmentator (1,204 CT scans, 104 anatomical structures spanning organs, bones, muscles, and vessels).
  • Strong transferability: Evaluated on 14 downstream datasets for direct inference and 3 for fine-tuning, across multiple modalities and segmentation targets.
  • Open release: Code and all four pretrained checkpoints are released under the Apache 2.0 license.
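The four sizes listed above can be compared with a few lines of Python. This is only a minimal sketch: the parameter counts are the ones quoted on this page, while the helper names (`scale_factors`, `largest_variant_within`) and the idea of choosing a variant by a parameter budget are illustrative assumptions, not part of the STU-Net code base.

```python
# Parameter counts in millions, as quoted on this page (H is listed as 1,457M).
STU_NET_PARAMS_M = {"S": 14.6, "B": 58.3, "L": 440.3, "H": 1457.0}


def scale_factors(params_m, base="S"):
    """Return each variant's size relative to the chosen base variant."""
    base_size = params_m[base]
    return {name: size / base_size for name, size in params_m.items()}


def largest_variant_within(params_m, budget_m):
    """Pick the largest variant whose parameter count fits a budget (millions).

    Hypothetical helper: returns None when even the smallest variant
    exceeds the budget.
    """
    fitting = {name: size for name, size in params_m.items() if size <= budget_m}
    return max(fitting, key=fitting.get) if fitting else None


if __name__ == "__main__":
    for name, factor in scale_factors(STU_NET_PARAMS_M).items():
        size = STU_NET_PARAMS_M[name]
        print(f"STU-Net-{name}: {size:7.1f}M parameters, {factor:5.1f}x the S variant")
```

Run as a script, this shows that the family spans roughly two orders of magnitude in parameter count, which is why the variants are described as matching different compute budgets.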

Technical Details

STU-Net is a fully convolutional encoder-decoder built on the nnU-Net framework, with the default convolutional blocks redesigned so that depth and width can be increased without breaking the self-configuring pipeline. Models range from STU-Net-S (14.6M parameters) to STU-Net-H (1,457M parameters). Pre-training is fully supervised and runs for 4,000 epochs with mirror data augmentation on TotalSegmentator, which provides 1,204 CT images annotated across 104 structures (27 organs, 59 bones, 10 muscles, 8 vessels). The authors find that increasing model size improves accuracy on the upstream task and also yields better transfer: larger pretrained models reach higher segmentation accuracy on downstream datasets, including in limited-data fine-tuning regimes where data efficiency matters most.
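To make the depth and width scaling concrete, the sketch below counts the parameters of a stack of 3D convolutions. It is a back-of-the-envelope illustration, not the STU-Net architecture: the channel widths and depths are made-up values, normalization layers and skip connections are ignored, and the function names are hypothetical. It also checks that the four TotalSegmentator class groups quoted above add up to 104.

```python
def conv3d_params(c_in, c_out, kernel=3, bias=True):
    """Weights (plus optional bias) of one 3D convolution with a cubic kernel."""
    return kernel ** 3 * c_in * c_out + (c_out if bias else 0)


def stage_params(width, depth, kernel=3):
    """Parameters of `depth` stacked convolutions that each keep `width` channels."""
    return depth * conv3d_params(width, width, kernel)


# Class groups as quoted on this page for the TotalSegmentator annotations.
TOTALSEG_CLASS_GROUPS = {"organs": 27, "bones": 59, "muscles": 10, "vessels": 8}

if __name__ == "__main__":
    base = stage_params(width=32, depth=2)  # illustrative values only
    for label, width, depth in [
        ("2x depth", 32, 4),
        ("2x width", 64, 2),
        ("2x depth and width", 64, 4),
    ]:
        ratio = stage_params(width, depth) / base
        print(f"{label}: {ratio:.2f}x the parameters of the base stage")
    print("annotated structures:", sum(TOTALSEG_CLASS_GROUPS.values()))
```

Doubling depth doubles the parameter count, while doubling width roughly quadruples it, so scaling both together grows a stage by close to a factor of eight. This is the arithmetic behind growing a single design across very different model sizes.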

Applications

STU-Net targets researchers and clinical-imaging teams who need accurate 3D segmentation of anatomical structures and lesions in CT and related modalities. The pretrained checkpoints can be used for direct inference on TotalSegmentator-covered anatomy, as initialization for fine-tuning on new organs, tumors, or modalities, or as a strong backbone for benchmarking. Because it inherits nnU-Net's automatic configuration, it slots into existing segmentation workflows with minimal manual tuning, benefiting groups building radiology pipelines, surgical planning tools, and downstream quantitative analyses.
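When a pretrained checkpoint is used as initialization for a target with a different label set, one common approach is to copy every weight whose name and shape match and re-initialize the rest, typically the final segmentation layer. The sketch below illustrates that selection step with plain dictionaries standing in for a checkpoint. The parameter names (`encoder.`, `decoder.`, `seg_head.`), the shapes, and the helper `select_transferable` are hypothetical and are not taken from the STU-Net repository.

```python
def select_transferable(pretrained, target, skip_prefixes=("seg_head.",)):
    """Split pretrained entries into those safe to copy and those to re-initialize.

    `pretrained` and `target` map parameter names to shapes (tuples).
    An entry is copied only if its name exists in the target with the same
    shape and does not start with one of the skipped prefixes.
    """
    copied, skipped = {}, []
    for name, shape in pretrained.items():
        if name.startswith(skip_prefixes) or target.get(name) != shape:
            skipped.append(name)
        else:
            copied[name] = shape
    return copied, skipped


# Hypothetical shapes: a pretrained head with 105 outputs (104 structures plus
# an assumed background channel) and a new task with 3 outputs.
PRETRAINED = {
    "encoder.0.conv.weight": (32, 1, 3, 3, 3),
    "decoder.0.conv.weight": (32, 64, 3, 3, 3),
    "seg_head.weight": (105, 32, 1, 1, 1),
}
TARGET = {
    "encoder.0.conv.weight": (32, 1, 3, 3, 3),
    "decoder.0.conv.weight": (32, 64, 3, 3, 3),
    "seg_head.weight": (3, 32, 1, 1, 1),
}

if __name__ == "__main__":
    copied, skipped = select_transferable(PRETRAINED, TARGET)
    print("copied:", sorted(copied))
    print("re-initialized:", skipped)
```

In a real pipeline the dictionaries would hold tensors rather than shapes, and the copied subset would be loaded into the new model before fine-tuning. The repository's own fine-tuning instructions should take precedence over this sketch.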

Impact

STU-Net demonstrated that large-scale supervised pre-training scales effectively to volumetric medical segmentation, providing one of the first openly released billion-parameter segmentation backbones for the field. Its pretrained variants have since been adopted as baselines and initialization in subsequent benchmarking efforts, including the Touchstone and SegBook studies, and its Apache-2.0 release has made it a practical starting point for transfer learning. The work helped motivate the broader move toward reusable, pretrained segmentation foundation models rather than per-dataset training from scratch, though its CT-centric pre-training means downstream gains are strongest for anatomy and modalities close to the TotalSegmentator distribution.

Citation

STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training

Preprint

Huang, Z., et al. (2023) STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training. arXiv.org.

DOI: 10.48550/arXiv.2304.06716


Citations

Total Citations: 151
Influential: 22
References: 46

GitHub

Stars: 370
Forks: 38
Open Issues: 25
Contributors: 3
Last Push: 1y ago
Language: Python
License: Apache-2.0


Openness

bio.rodeo openness: Fully open · usable and reproducible
Overall score: 82 (Open)
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 72
Model Openness Framework: Class III (Open Model)

Tags

cnn, ct, foundation_model, radiology, segmentation, supervised, transfer_learning, u_net

Resources

GitHub Repository
Research Paper