bio.rodeo
© 2026 Pulsatance. All rights reserved.
Built by Pulsatance

SegVol

Beijing Academy of Artificial Intelligence

A promptable 3D foundation model for volumetric CT segmentation of 200+ anatomical categories using point, box, and text prompts.

Released: November 2023

SegVol is a universal, interactive foundation model for volumetric medical image segmentation, developed by researchers at the Beijing Academy of Artificial Intelligence (BAAI) and collaborators, and introduced in November 2023. It tackles a long-standing bottleneck in 3D medical imaging: most segmentation models are trained for a single organ or task, requiring a new model and labeled dataset for every structure of interest. SegVol instead provides one model that can segment more than 200 anatomical categories across whole CT volumes, driven by user-supplied prompts.

The model's central novelty is its support for both spatial and semantic prompting. Users can specify a target with point or bounding-box prompts (in the style of the Segment Anything Model, SAM) or with free-text prompts naming an anatomical structure, and SegVol returns a 3D mask for the corresponding region. This makes it a 3D, text-aware analogue of interactive 2D segmentation models. It fills a gap left by earlier promptable models such as SAM, which were designed for natural 2D images rather than volumetric medical scans.
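To make the three prompt types concrete, the sketch below models them as plain data. It is a hypothetical illustration only: the `PointPrompt`, `BoxPrompt`, and `TextPrompt` classes and the `describe` and `box_to_mask` helpers are invented for this example and are not SegVol's API, and it assumes voxel coordinates are ordered (depth, height, width). It does not show how SegVol encodes prompts internally.

```python
from dataclasses import dataclass
from typing import Tuple, Union

import numpy as np

# Hypothetical prompt containers for illustration only; these are NOT SegVol's API.
# Coordinates are assumed to be (depth, height, width) voxel indices.
Voxel = Tuple[int, int, int]


@dataclass(frozen=True)
class PointPrompt:
    coord: Voxel           # one click inside (or outside) the target
    positive: bool = True  # True marks foreground, False marks background


@dataclass(frozen=True)
class BoxPrompt:
    start: Voxel  # inclusive lower corner of the 3D box
    end: Voxel    # exclusive upper corner of the 3D box


@dataclass(frozen=True)
class TextPrompt:
    name: str  # anatomical category to segment, e.g. "liver"


Prompt = Union[PointPrompt, BoxPrompt, TextPrompt]


def describe(prompt: Prompt) -> str:
    """Say whether a prompt is spatial (point or box) or semantic (text)."""
    if isinstance(prompt, PointPrompt):
        return f"spatial point at {prompt.coord}"
    if isinstance(prompt, BoxPrompt):
        return f"spatial box {prompt.start}..{prompt.end}"
    return f"semantic text '{prompt.name}'"


def box_to_mask(box: BoxPrompt, shape: Voxel) -> np.ndarray:
    """Rasterize a box prompt into a boolean volume the size of the CT scan."""
    mask = np.zeros(shape, dtype=bool)
    (d0, h0, w0), (d1, h1, w1) = box.start, box.end
    mask[d0:d1, h0:h1, w0:w1] = True
    return mask


prompts = [
    PointPrompt((40, 120, 96)),
    BoxPrompt((30, 100, 80), (50, 140, 110)),
    TextPrompt("liver"),
]
for p in prompts:
    print(describe(p))
```

The spatial prompts carry coordinates tied to a particular scan, while the text prompt carries only a category name, which is why a text prompt can be reused across volumes without any clicking.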

SegVol was accepted as a spotlight at NeurIPS 2024. By releasing pretrained weights, training code, and a large curated dataset, the authors positioned it as an accessible base model for radiology segmentation that researchers can apply zero-shot or fine-tune for specific clinical tasks.

#Key Features

  • Universal coverage: A single model segments 200+ anatomical categories spanning organs, tissues, and lesions across whole-body CT, rather than one structure per model.
  • Spatial and semantic prompting: Accepts point, bounding-box, and free-text prompts, letting users either click or draw on a region or simply name the anatomy to segment.
  • Zoom-out-zoom-in inference: A coarse-to-fine mechanism first localizes the target at low resolution, then refines the mask at high resolution, enabling precise volumetric inference without exhaustively processing every patch at full resolution.
  • Large-scale pretraining: The image encoder is pretrained with self-supervision on roughly 90K unlabeled CT volumes, and the model is then trained with supervision on about 6K labeled volumes, giving strong transferable 3D representations.
  • Open release: Code and pretrained weights are released under an MIT license, the M3D-Seg training collection is publicly available, and a hosted demo and ModelScope mirror are provided.
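The zoom-out-zoom-in idea can be illustrated with a simplified coarse-to-fine loop. This is a minimal sketch under stated assumptions, not SegVol's implementation: `coarse_model` and `fine_model` are hypothetical placeholders for trained networks (a simple threshold stands in for both), downsampling is done by crude striding rather than proper resampling, and SegVol's own pipeline differs in its details. It only shows the control flow: localize the target on a small volume, then refine inside a cropped region at full resolution.

```python
from typing import Callable

import numpy as np

# A "segmenter" maps a volume to a boolean mask of the same shape.
# coarse_model / fine_model are hypothetical stand-ins for trained networks.
Segmenter = Callable[[np.ndarray], np.ndarray]


def zoom_out_zoom_in(
    volume: np.ndarray,
    coarse_model: Segmenter,
    fine_model: Segmenter,
    factor: int = 4,
    margin: int = 4,
) -> np.ndarray:
    """Coarse-to-fine segmentation: localize at low resolution, refine at full resolution."""
    # Zoom out: downsample by striding (a crude stand-in for proper resampling).
    small = volume[::factor, ::factor, ::factor]
    coarse = coarse_model(small)

    full_mask = np.zeros(volume.shape, dtype=bool)
    if not coarse.any():
        return full_mask  # nothing found at low resolution

    # Map the coarse bounding box back to full-resolution voxel indices.
    # A margin of at least factor - 1 helps recover boundary voxels skipped by striding.
    hits = np.argwhere(coarse)
    lo = np.maximum(hits.min(axis=0) * factor - margin, 0)
    hi = np.minimum((hits.max(axis=0) + 1) * factor + margin, volume.shape)
    roi = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

    # Zoom in: run the fine model only on the cropped region of interest.
    full_mask[roi] = fine_model(volume[roi])
    return full_mask


def threshold(v: np.ndarray) -> np.ndarray:
    """Toy 'model': voxels brighter than 0.5 count as the target."""
    return v > 0.5


# Toy example: a bright block in an empty 64^3 volume.
vol = np.zeros((64, 64, 64), dtype=np.float32)
vol[20:30, 33:41, 10:18] = 1.0
mask = zoom_out_zoom_in(vol, threshold, threshold)
print(mask.sum())  # 640
```

The saving comes from the fine pass: it sees only the cropped region (here 20 x 16 x 16 voxels) instead of the full 64 x 64 x 64 volume, which is the motivation the feature list gives for avoiding exhaustive full-resolution processing.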

#Technical Details

SegVol couples a 3D Vision Transformer (ViT) image encoder with a SAM-style promptable segmentation decoder, plus a text encoder that maps anatomical names into the prompt space for semantic segmentation. The ViT encoder is first pretrained with self-supervision for 2,000 epochs on about 90,000 unlabeled CT volumes. The model is then trained with supervision on roughly 6,000 labeled volumes drawn from 25 public datasets aggregated into the M3D-Seg collection (5,772 3D images and 149,196 mask annotations, from datasets such as BTCV, AMOS22, TotalSegmentator, KiTS, CHAOS, and the Medical Segmentation Decathlon).

On a benchmark of 22 anatomical segmentation tasks, SegVol outperforms competing methods on 19 of them, with Dice score improvements of up to 37.24% over the runner-up. The zoom-out-zoom-in strategy is what makes whole-volume inference tractable while preserving fine boundary detail.
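The Dice score used in these comparisons measures overlap between a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|), ranging from 0 (no overlap) to 1 (identical masks). A minimal NumPy sketch follows; the `dice_score` name is illustrative, and returning 1.0 when both masks are empty is one common convention that benchmark code may handle differently.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|P & T| / (|P| + |T|) for binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * intersection / denom)


# Two 8-voxel cubes that share 4 voxels.
pred = np.zeros((4, 4, 4), dtype=bool)
pred[0:2, 0:2, 0:2] = True
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 0:2, 0:2] = True
print(dice_score(pred, target))  # 0.5
```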

#Applications

SegVol targets radiology and medical-imaging research, where annotating 3D CT volumes is expensive and slow. Researchers can use a single checkpoint to generate masks for organs, tissues, and lesions across many anatomical targets, accelerating dataset annotation, organ measurement, and downstream pipelines such as treatment planning or quantitative imaging studies. The text-prompt interface is particularly useful for semantic segmentation when no manual click or box is available, while the spatial prompts support interactive correction by clinicians and annotators.
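Organ measurement from a segmentation mask reduces to counting foreground voxels and multiplying by the physical volume of one voxel. The sketch below assumes a boolean mask (for example, one produced by a model such as SegVol) and voxel spacing in millimetres taken from the CT header; the `mask_volume_ml` name is hypothetical. The spacing must be listed in the same axis order as the mask array, since image libraries do not always use the same ordering for arrays and spacing metadata.

```python
from typing import Sequence

import numpy as np


def mask_volume_ml(mask: np.ndarray, spacing_mm: Sequence[float]) -> float:
    """Volume of a binary 3D mask in millilitres, given voxel spacing in mm per axis."""
    if mask.ndim != 3 or len(spacing_mm) != 3:
        raise ValueError("expected a 3D mask and three spacing values")
    voxel_mm3 = float(np.prod(spacing_mm))  # physical volume of one voxel
    n_voxels = int(mask.astype(bool).sum())
    return n_voxels * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Toy mask: a 20 x 40 x 40 block at 2.0 x 0.5 x 0.5 mm spacing.
organ = np.zeros((64, 128, 128), dtype=bool)
organ[10:30, 40:80, 40:80] = True
print(mask_volume_ml(organ, (2.0, 0.5, 0.5)))  # 16.0
```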

#Impact

SegVol helped establish promptable 3D foundation models as a practical direction for medical image segmentation, extending the SAM paradigm from 2D natural images to volumetric clinical scans while adding text-driven semantic control. Its NeurIPS 2024 spotlight, broad anatomical coverage, and fully open release of weights, code, and the M3D-Seg dataset have made it a widely referenced baseline and starting point for subsequent universal medical segmentation work. The main limitations are its focus on CT (rather than MRI or other modalities) and the usual caveat that prompt-driven outputs benefit from expert review before clinical use.

Citation

SegVol: Universal and Interactive Volumetric Medical Image Segmentation

Preprint

Du, Y., et al. (2023). SegVol: Universal and Interactive Volumetric Medical Image Segmentation. Neural Information Processing Systems.

DOI: 10.48550/arXiv.2311.13385


Citations

Total citations: 115
Influential citations: 17
References: 78

GitHub

Stars: 383
Forks: 38
Open issues: 14
Contributors: 3
Last push: 5 months ago
Language: Python
License: MIT

HuggingFace

Downloads: 278
Likes: 13
Last modified: 2 years ago
Pipeline: image-feature-extraction


Openness

bio.rodeo openness: 100 (Open). Fully open · usable and reproducible
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 100
Model Openness Framework: Class I (Open Science)

Tags

ct, foundation_model, multimodal, radiology, segmentation, self_supervised, vision_transformer

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model
  • Dataset