Scalable and transferable U-Net family (14M–1.4B parameters) for 3D medical image segmentation, supervised-pretrained on TotalSegmentator.
STU-Net is a family of scalable, transferable convolutional models for 3D medical image segmentation, introduced in April 2023 by researchers at Shanghai AI Laboratory and collaborating institutions (Ziyan Huang, Junjun He, Yu Qiao, and colleagues). It addresses a long-standing gap in medical imaging: while large-scale supervised pre-training had transformed natural-image and language tasks, segmentation models for CT and other volumetric modalities remained small and were typically trained from scratch on each new dataset.
The model builds directly on the widely used nnU-Net framework, whose self-configuring pipeline is a de facto standard for biomedical segmentation. The key contribution is making nnU-Net's convolutional blocks scalable, then systematically growing the network from 14 million to 1.4 billion parameters, which made it the largest medical image segmentation model reported at the time of release. By pre-training this family on TotalSegmentator, at that time one of the largest publicly available annotated CT datasets, the authors deliver checkpoints that can be applied off the shelf or fine-tuned, lowering the barrier to strong segmentation on new clinical targets.
STU-Net sits at the intersection of classical segmentation engineering and the foundation-model paradigm, demonstrating that the "scale plus pre-training" recipe transfers to dense volumetric prediction, not only to classification and generative tasks.
STU-Net is a fully convolutional encoder-decoder built on the nnU-Net framework, with nnU-Net's basic convolutional blocks redesigned (residual connections are added) so that depth and width can be increased without breaking the self-configuring pipeline. Models range from STU-Net-S (14.6M parameters) to STU-Net-H (1,457M parameters). Pre-training is fully supervised and runs for 4,000 epochs on TotalSegmentator, which comprises 1,204 CT images annotated across 104 structures (27 organs, 59 bones, 10 muscles, 8 vessels), with mirror data augmentation enabled. The authors find that increasing model size improves accuracy on the upstream task and, importantly, yields better transfer: larger pretrained models reach higher segmentation accuracy on downstream datasets, including in limited-data fine-tuning regimes where data efficiency matters most.
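To make the scaling idea concrete, the sketch below counts the weights of a simplified 3D convolutional encoder as its width and depth grow. The channel widths, block depths, and the `conv3d_params` and `encoder_params` helpers are illustrative assumptions, not the published STU-Net configurations, so the totals will not match the 14.6M and 1,457M figures above. The point is only that doubling every channel width roughly quadruples the convolution weights, while adding blocks per stage grows them further. The last lines check that the four TotalSegmentator structure groups quoted above sum to 104.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights plus biases of one 3D convolution with a cubic k x k x k kernel."""
    return c_in * c_out * k ** 3 + c_out


def encoder_params(widths, depth, in_channels=1):
    """Parameter count of a toy encoder: `depth` blocks per stage, two convs per block.

    Hypothetical simplification: no normalization layers, no decoder,
    no residual projection layers.
    """
    total = 0
    c_prev = in_channels
    for width in widths:
        for _ in range(depth):
            total += conv3d_params(c_prev, width)  # first conv may change channel count
            total += conv3d_params(width, width)   # second conv keeps it
            c_prev = width
    return total


# Assumed configurations for illustration only.
small = encoder_params(widths=(16, 32, 64, 128, 256, 256), depth=1)
wide = encoder_params(widths=(32, 64, 128, 256, 512, 512), depth=1)
deep = encoder_params(widths=(32, 64, 128, 256, 512, 512), depth=2)

print(f"small: {small / 1e6:.1f}M, wide: {wide / 1e6:.1f}M, deep: {deep / 1e6:.1f}M")
print(f"doubling width multiplies parameters by {wide / small:.2f}")

# TotalSegmentator label groups quoted in the text.
structures = {"organs": 27, "bones": 59, "muscles": 10, "vessels": 8}
assert sum(structures.values()) == 104
```

Because width enters the count quadratically, widening is the faster route to large models, which is why a scalable block design has to stay stable as channel counts grow.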
STU-Net targets researchers and clinical-imaging teams who need accurate 3D segmentation of anatomical structures and lesions in CT and related modalities. The pretrained checkpoints can be used for direct inference on TotalSegmentator-covered anatomy, as initialization for fine-tuning on new organs, tumors, or modalities, or as a strong backbone for benchmarking. Because it inherits nnU-Net's automatic configuration, it slots into existing segmentation workflows with minimal manual tuning, benefiting groups building radiology pipelines, surgical planning tools, and downstream quantitative analyses.
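Using a checkpoint as initialization for a new target usually comes down to copying every pretrained tensor whose name and shape match the new model and leaving the rest, typically the final segmentation head, at its fresh initialization. The sketch below shows that selection logic on plain dictionaries of shapes. The layer names, the shapes, the 105-channel head (104 structures plus background), and the `select_transferable` helper are hypothetical stand-ins, not STU-Net's actual checkpoint layout or loading code.

```python
def select_transferable(pretrained_shapes, target_shapes):
    """Split pretrained entries into those the target model can reuse and those skipped.

    An entry is reusable only if the target has a parameter with the same
    name and the same shape.
    """
    reused, skipped = [], []
    for name, shape in pretrained_shapes.items():
        if target_shapes.get(name) == shape:
            reused.append(name)
        else:
            skipped.append(name)
    return reused, skipped


# Hypothetical shapes: the pretrained head predicts 105 channels, while the
# downstream task needs 3 (for example background, organ, tumor).
pretrained = {
    "encoder.stage0.conv.weight": (32, 1, 3, 3, 3),
    "decoder.stage0.conv.weight": (32, 64, 3, 3, 3),
    "seg_head.weight": (105, 32, 1, 1, 1),
}
target = {
    "encoder.stage0.conv.weight": (32, 1, 3, 3, 3),
    "decoder.stage0.conv.weight": (32, 64, 3, 3, 3),
    "seg_head.weight": (3, 32, 1, 1, 1),
}

reused, skipped = select_transferable(pretrained, target)
print("reused:", reused)
print("left at fresh initialization:", skipped)
```

In PyTorch, for instance, the filtered set of tensors could then be loaded with `load_state_dict(..., strict=False)` so that the skipped head does not raise an error.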
STU-Net demonstrated that large-scale supervised pre-training scales effectively to volumetric medical segmentation, providing one of the first openly released billion-parameter segmentation backbones for the field. Its pretrained variants have since been adopted as baselines and initialization in subsequent benchmarking efforts, including the Touchstone and SegBook studies, and its Apache-2.0 release has made it a practical starting point for transfer learning. The work helped motivate the broader move toward reusable, pretrained segmentation foundation models rather than per-dataset training from scratch, though its CT-centric pre-training means downstream gains are strongest for anatomy and modalities close to the TotalSegmentator distribution.
Huang, Z., et al. (2023) STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training. arXiv.org.
DOI: 10.48550/arXiv.2304.06716