SegVol

Beijing Academy of Artificial Intelligence

Promptable 3D foundation model for volumetric CT segmentation, covering over 200 anatomical categories through point, box, and free-text prompts.

Released: November 2023

SegVol is a universal, interactive foundation model for volumetric medical image segmentation, developed by researchers at the Beijing Academy of Artificial Intelligence (BAAI) and collaborators, and introduced in November 2023. It tackles a long-standing bottleneck in 3D medical imaging: most segmentation models are trained for a single organ or task, so each new structure of interest requires its own model and labeled dataset. SegVol instead provides one model that can segment more than 200 anatomical categories across whole CT volumes, driven by user-supplied prompts.

The model's central novelty is its support for both spatial and semantic prompting. Users can specify a target with point or bounding-box prompts (in the style of the Segment Anything Model, SAM) or with free-text prompts naming an anatomical structure, and SegVol returns a 3D mask for the corresponding region. This makes it a 3D, text-aware analogue of interactive 2D segmentation models, filling the gap left by earlier promptable models such as SAM, which were designed for 2D natural images rather than volumetric medical scans.
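The three prompt kinds described above can be pictured as a small data model. The sketch below is illustrative only: the class names, fields, and the `describe` helper are hypothetical and are not taken from the SegVol codebase, and the positive/negative flag on points is an assumption borrowed from SAM-style interfaces.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Voxel = Tuple[int, int, int]  # (z, y, x) index into the CT volume


@dataclass(frozen=True)
class PointPrompt:
    """A click inside (positive) or outside (negative) the target."""
    position: Voxel
    positive: bool = True


@dataclass(frozen=True)
class BoxPrompt:
    """An axis-aligned 3D box given by inclusive min and max corners."""
    lo: Voxel
    hi: Voxel

    def contains(self, voxel: Voxel) -> bool:
        # A voxel is inside when it lies within the box along every axis.
        return all(l <= v <= h for l, v, h in zip(self.lo, voxel, self.hi))


@dataclass(frozen=True)
class TextPrompt:
    """A free-text anatomical name, e.g. 'liver'."""
    name: str


Prompt = Union[PointPrompt, BoxPrompt, TextPrompt]


def describe(prompt: Prompt) -> str:
    """Classify a prompt as spatial (point/box) or semantic (text)."""
    if isinstance(prompt, (PointPrompt, BoxPrompt)):
        return "spatial"
    return "semantic"
```

Keeping the prompt types separate makes the spatial/semantic distinction explicit: a box or point carries geometry only, while a text prompt carries a name and no location.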

SegVol was accepted as a spotlight at NeurIPS 2024. By releasing pretrained weights, training code, and a large curated dataset, the authors positioned it as an accessible base model for radiology segmentation that researchers can apply zero-shot or fine-tune for specific clinical tasks.

Key Features

Universal coverage: A single model segments 200+ anatomical categories spanning organs, tissues, and lesions across whole-body CT, rather than one structure per model.
Spatial and semantic prompting: Accepts point, bounding-box, and free-text prompts, letting users either click/draw a region or simply name the anatomy to segment.
Zoom-out-zoom-in inference: A coarse-to-fine mechanism first localizes the target at low resolution, then refines the mask at high resolution, enabling precise volumetric inference without exhaustively processing every patch at full resolution.
Large-scale pretraining: The image encoder is pretrained with self-supervision on roughly 90K unlabeled CT volumes, then trained with supervision on about 6K labeled volumes, giving strong transferable 3D representations.
Open release: Code, pretrained weights, and the M3D-Seg training collection are publicly available under an MIT license, with a hosted demo and ModelScope mirror.
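The zoom-out-zoom-in idea from the list above can be illustrated with a toy sketch. This is not SegVol's implementation: `toy_segment` is a hypothetical stand-in for the network (a plain brightness threshold), strided subsampling stands in for low-resolution resizing, and the `stride` and `margin` parameters are assumptions chosen for illustration.

```python
from itertools import product


def toy_segment(volume, voxels, threshold=0.5):
    """Stand-in for the network: keep voxels brighter than a threshold."""
    return {v for v in voxels if volume[v[0]][v[1]][v[2]] > threshold}


def zoom_out_zoom_in(volume, stride=2, margin=1):
    """Coarse-to-fine segmentation of a 3D nested-list volume.

    Returns the set of (z, y, x) voxels in the predicted mask.
    """
    shape = (len(volume), len(volume[0]), len(volume[0][0]))

    # Zoom out: inspect only every `stride`-th voxel along each axis.
    coarse_grid = product(*(range(0, n, stride) for n in shape))
    coarse_hits = toy_segment(volume, coarse_grid)
    if not coarse_hits:
        return set()

    # Turn the coarse hits into a padded region of interest (ROI),
    # clipped to the volume bounds.
    lo = [max(min(v[a] for v in coarse_hits) - margin, 0) for a in range(3)]
    hi = [min(max(v[a] for v in coarse_hits) + margin, shape[a] - 1)
          for a in range(3)]

    # Zoom in: evaluate every voxel, but only inside the ROI.
    roi = product(*(range(l, h + 1) for l, h in zip(lo, hi)))
    return toy_segment(volume, roi)
```

On an 8×8×8 volume containing a bright cube at indices 3 to 5 on each axis, the coarse pass inspects 64 voxels and the fine pass only the 27 inside the region of interest, instead of all 512.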

Technical Details

SegVol couples a 3D Vision Transformer (ViT) image encoder with a SAM-style promptable segmentation decoder and a text encoder that maps anatomical names into the prompt space for semantic segmentation. The ViT encoder is pretrained with self-supervision for 2,000 epochs on about 90,000 unlabeled CT volumes, then trained with supervision on roughly 6,000 labeled volumes. These are drawn from 25 public datasets aggregated into the M3D-Seg collection: 5,772 3D images and 149,196 mask annotations from sources such as BTCV, AMOS22, TotalSegmentator, KiTS, CHAOS, and the Medical Segmentation Decathlon. On a benchmark of 22 anatomical segmentation tasks, SegVol outperforms competing methods on 19, with improvements of up to 37.24% in Dice score over the runner-up. The zoom-out-zoom-in strategy keeps whole-volume inference tractable while preserving fine boundary detail.
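For readers unfamiliar with the Dice score used in the benchmark above, it measures overlap between a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). The sketch below is a minimal version that represents each mask as a set of voxel coordinates; the function name and the convention for two empty masks are assumptions, not the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice overlap between two masks given as sets of (z, y, x) voxels."""
    if not pred and not target:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2 * len(pred & target) / (len(pred) + len(target))


# Two 3-voxel masks that share 2 voxels: Dice = 2*2 / (3+3) = 2/3.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
target = {(0, 0, 1), (0, 1, 0), (0, 1, 1)}
score = dice_score(pred, target)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no voxels.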

Applications

SegVol targets radiology and medical-imaging research where annotating 3D CT volumes is expensive and slow. Researchers can use it to generate masks for organs, tissues, and lesions across many anatomical targets from a single checkpoint, accelerating dataset annotation, organ measurement, and downstream pipelines such as treatment planning or quantitative imaging studies. The text-prompt interface is particularly useful for semantic segmentation when no manual click is available, and the spatial prompts support interactive correction for clinicians and annotators.
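As one example of the organ-measurement use mentioned above, a segmentation mask can be converted to a physical volume once the voxel spacing of the scan is known. This is a generic sketch, not part of SegVol; the function name and the (dz, dy, dx) spacing order are assumptions.

```python
def mask_volume_ml(voxel_count, spacing_mm):
    """Physical volume of a mask in millilitres.

    voxel_count: number of voxels labelled as the structure.
    spacing_mm:  (dz, dy, dx) voxel spacing in millimetres.
    """
    dz, dy, dx = spacing_mm
    # Each voxel occupies dz*dy*dx cubic millimetres; 1 mL = 1000 mm^3.
    return voxel_count * dz * dy * dx / 1000.0
```

For instance, a mask of 1,500,000 voxels at 1.0 × 0.8 × 0.8 mm spacing corresponds to about 960 mL.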

Impact

SegVol helped establish promptable 3D foundation models as a practical direction for medical image segmentation, extending the SAM paradigm from 2D natural images to volumetric clinical scans while adding text-driven semantic control. Its NeurIPS 2024 spotlight, broad anatomical coverage, and fully open release of weights, code, and the M3D-Seg dataset have made it a widely referenced baseline and starting point for subsequent universal medical segmentation work. The main limitations are its focus on CT (rather than MRI or other modalities) and the usual caveat that prompt-driven outputs benefit from expert review before clinical use.

Citation

SegVol: Universal and Interactive Volumetric Medical Image Segmentation

Preprint

Du, Y., et al. (2023) SegVol: Universal and Interactive Volumetric Medical Image Segmentation. Neural Information Processing Systems.

DOI: 10.48550/arXiv.2311.13385

Recent citations

Papers that recently cited this model.

UniMedSeg: Unified In-Context Learning for Multi-Paradigm 2D/3D Medical Image Segmentation
Yunzhou Li, Jiesi Hu, Yanwu Yang, et al.
Jul 2026 · Cited by 0

Decouple and Reason: Anatomically Guided Two-Stage Voxel-Level Grounding of Free-Text Findings in 3D Chest CT
Kwang-Hyun Uhm, Inhwa Son, Sung-Jea Ko
Jul 2026 · Cited by 0

Language-Guided Segmentation of Medical Images: A Review of Foundation Models
Saqib Qamar
Bioengineering · Jul 2026 · Cited by 0

Top citations

The most-cited papers that cite this model.

Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions
Yichi Zhang, Zhenrong Shen, Rushi Jiao
Comput. Biol. Medicine · Jan 2024 · Cited by 331

SAM-Med3D: Towards General-Purpose Segmentation Models for Volumetric Medical Images
Haoyu Wang, Sizheng Guo, Jin Ye, et al.
ECCV Workshops · Oct 2023 · Cited by 173

A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan, Seowung Leem, Kyle B. See, et al.
IEEE Reviews in Biomedical Engineering · Jun 2024 · Cited by 115

MedSAM2: Segment Anything in 3D Medical Images and Videos
Jun Ma, Zongxin Yang, Sumin Kim, et al.
arXiv.org · Apr 2025 · Cited by 96

Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
Qika Lin, Yifan Zhu, Xin Mei, et al.
Information Fusion · Aug 2024 · Cited by 85

Citations

Total Citations: 122
Influential: 18
References: 78

GitHub

Stars: 386
Forks: 39
Open Issues: 14
Contributors: 3
Last Push: 6mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 772
Likes: 13
Last Modified: 2y ago
Pipeline: feature-extraction

Fields of citing research

Computer Science: 97%
Medicine: 96%
Engineering: 30%
Biology: 3%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 100 (Open)
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 100
Model Openness Framework: Class I (Open Science)

Resources

GitHub Repository · Research Paper · HuggingFace Model · Dataset
