VISTA3D

Medical image segmentation foundation model for 3D CT and MRI, covering 127 anatomical classes automatically plus interactive point-prompt refinement.

Released: June 2024

VISTA3D (Versatile Imaging SegmenTation and Annotation) is a unified segmentation foundation model for 3D medical imaging, developed by NVIDIA as part of the open-source MONAI project. Volumetric segmentation of CT and MRI scans is a fundamental but labor-intensive step in radiology and medical image analysis: manually delineating organs, vessels, bones, and lesions slice-by-slice across hundreds of axial images is prohibitively slow, and most prior deep-learning segmenters are narrow expert models trained for a single anatomy or dataset. VISTA3D addresses this by providing one pretrained model that segments a broad set of anatomical structures out of the box while also supporting human-in-the-loop correction.

The model is, according to its authors, the first to achieve state-of-the-art performance in both 3D automatic segmentation (covering 127 anatomical classes) and 3D interactive segmentation, even when compared against top single-purpose 3D expert models on large, diverse benchmarks. It combines a "segment everything" automatic mode with point-prompt interactivity reminiscent of the Segment Anything Model (SAM), but operating natively in three dimensions rather than slice-by-slice.

First released as a preprint in June 2024 (arXiv:2406.05285) and subsequently accepted to CVPR 2025, VISTA3D is distributed through MONAI and packaged for deployment as the NVIDIA NV-Segment-CT and NV-Segment-CTMR models and as a NIM microservice.

Key Features

Broad automatic coverage: Segments 127 types of human anatomical structures (132 total classes including 7 tumor/lesion types such as lung nodules and liver, pancreatic, colon, kidney, and bone lesions) from a single forward pass, without per-anatomy retraining.
3D interactive refinement: Supports point-click prompts so users can correct or add segmentations for novel structures, enabling efficient human-in-the-loop annotation directly in 3D.
Supervoxel distillation: A novel 3D supervoxel method distills 2D pretrained (SAM-style) backbones into the 3D model, transferring zero-shot capability to volumes never seen during training.
Multi-modal support: The NV-Segment-CTMR variant extends coverage from CT to MRI, broadening applicability across imaging modalities.
Open and deployable: Released with code under Apache-2.0 and weights under the NVIDIA Open Model License, with ready-to-run MONAI bundles, HuggingFace checkpoints, and a NIM microservice for inference.
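The automatic and point-click modes listed above, together with a per-class mode, can be pictured as one request object whose contents decide which mode runs. The sketch below is illustrative only: `SegmentationRequest`, `check_points`, the (x, y, z) voxel-index convention, and the class id used are hypothetical names and assumptions, not part of MONAI or the VISTA3D code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

# Voxel index of a click; the (x, y, z) axis order is an assumption.
Point = Tuple[int, int, int]


@dataclass
class SegmentationRequest:
    """Hypothetical request object, not an actual MONAI or VISTA3D type."""

    class_ids: Optional[List[int]] = None  # None or empty: no class restriction
    points: List[Point] = field(default_factory=list)  # user clicks, if any

    def mode(self) -> str:
        """Pick the workflow implied by the request contents."""
        if self.points:
            return "point-prompt"
        if self.class_ids:
            return "by-class"
        return "everything"


def check_points(request: SegmentationRequest, volume_shape: Sequence[int]) -> None:
    """Raise ValueError if any click falls outside the volume bounds."""
    for point in request.points:
        inside = len(point) == len(volume_shape) and all(
            0 <= coord < size for coord, size in zip(point, volume_shape)
        )
        if not inside:
            raise ValueError(f"point {point} lies outside volume {tuple(volume_shape)}")


# Example: a fully automatic request and a single-click refinement request.
auto = SegmentationRequest()
click = SegmentationRequest(class_ids=[7], points=[(120, 96, 40)])  # class id 7 is made up
check_points(click, volume_shape=(512, 512, 200))
print(auto.mode(), click.mode())
```

Validating clicks against the volume shape before inference is a design choice of this sketch; a real client would follow whatever contract the deployed bundle or microservice defines.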

Technical Details

VISTA3D couples a convolutional encoder-decoder backbone with a multi-head, transformer-style class/prompt head that supports three workflows: segment everything, segment by class, and segment by point prompt. It was trained on a curated corpus of 11,454 3D CT volumes spanning 127 anatomical structures and various lesions. Supervision came from three sources: expert annotations, pseudo-labels generated by the TotalSegmentator model, and supervoxels derived using SAM pretrained weights. The interactive branch is trained to respond to 3D point prompts, which allows zero-shot segmentation of structures outside the labeled class set.

On the TotalSegmentatorV2 test split, VISTA3D reports lung-lobe Dice scores of roughly 0.95 (left upper lobe), 0.94 (left lower lobe), 0.88 (right upper lobe), 0.92 (right middle lobe), and 0.94 (right lower lobe), in line with its reported competitiveness with specialized expert models. Inference runs in MONAI Core on NVIDIA GPUs; the listed test hardware is A100, H100, and L40.
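For readers unfamiliar with the Dice scores quoted above: the Dice coefficient measures overlap between a predicted mask and a reference mask as twice the size of their intersection divided by the sum of their sizes. The sketch below is a minimal illustration that represents each mask as a set of voxel coordinates; it is not the evaluation code used for VISTA3D, and returning 1.0 when both masks are empty is a convention assumed here.

```python
from itertools import product
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


# Toy example: two 2x2x2 cubes, the second shifted by one voxel along x.
pred = set(product(range(0, 2), range(2), range(2)))
truth = set(product(range(1, 3), range(2), range(2)))
print(dice(pred, truth))  # 4 shared voxels out of 8 + 8 -> 0.5
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels, so a reported 0.95 indicates near-complete overlap with the reference annotation.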

Applications

VISTA3D targets radiology research, medical-imaging AI development, and annotation workflows that need rapid multi-organ 3D segmentation. Researchers can use its automatic mode to bootstrap labels across large CT/MRI cohorts, then use point prompts to refine those labels or extend them to structures and pathologies outside the trained class set, which reduces manual annotation effort. As a MONAI bundle and NVIDIA NIM microservice, it integrates into imaging pipelines for tasks such as organ volumetry, tumor delineation, treatment planning research, and dataset curation. The distributed weights are released for research use and are explicitly not intended for clinical diagnosis.
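Organ volumetry, one of the tasks mentioned above, reduces to counting the voxels assigned to each label and multiplying by the physical volume of one voxel. The sketch below assumes a segmentation already flattened into a sequence of integer labels and a voxel spacing in millimetres; the function name `volumes_ml`, the label ids, and the background value 0 are illustrative assumptions rather than the VISTA3D output format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def volumes_ml(
    labels: Iterable[int],
    spacing_mm: Tuple[float, float, float],
    background: int = 0,
) -> Dict[int, float]:
    """Return the volume in millilitres for every non-background label."""
    sx, sy, sz = spacing_mm
    counts = Counter(labels)  # voxels per label id
    counts.pop(background, None)  # ignore background voxels
    # 1 mL equals 1000 cubic millimetres.
    return {label: n * sx * sy * sz / 1000.0 for label, n in counts.items()}


# Toy example: 120,000 voxels of a made-up label 5 at 1.5 mm isotropic spacing.
flat_labels = [0] * 1000 + [5] * 120_000
print(volumes_ml(flat_labels, (1.5, 1.5, 1.5)))  # {5: 405.0}
```

In practice the spacing would be read from the image header, and resampled outputs should be measured with the spacing of the grid they were actually produced on.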

Impact

By unifying broad automatic segmentation and interactive correction in a single 3D foundation model, VISTA3D advances the medical-imaging field's shift from many narrow expert segmenters toward general-purpose, promptable models—the 3D counterpart to the SAM paradigm that reshaped 2D natural-image segmentation. Its release through MONAI, with open code, downloadable checkpoints, and a deployable microservice, lowers the barrier for labs to adopt foundation-model segmentation and to fine-tune for new anatomies. Acceptance at CVPR 2025 and integration into NVIDIA's healthcare tooling signal meaningful uptake, positioning VISTA3D alongside other anatomical-segmentation foundation models as a practical backbone for 3D medical image analysis.

Citations

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Preprint

He, Y., et al. (2024) VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. arXiv preprint arXiv:2406.05285.

DOI: 10.48550/arXiv.2406.05285

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

He, Y., et al. (2025) VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

DOI: 10.1109/CVPR52734.2025.01943

Recent citations

Papers that recently cited this model.

Towards Enhancing 3D Spatial Reasoning in Medical Multimodal Large Language Models
Zhuoyuan Fu, Zeshang Li, Yiqiong Zhang, et al.
Jul 2026
Citations: 0

MamNet-PT: A Mamba-enhanced hybrid architecture with selective state-space modeling for uncertainty-aware brain tumor segmentation
Yu Sun, Yihang Qin
PLoS ONE · Jul 2026
Citations: 0

Towards Autonomous and Auditable Medical Imaging Model Development
Shengyuan Liu, Jiaxuan Jiang, Boyun Zheng, et al.
Jul 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

MedSAM2: Segment Anything in 3D Medical Images and Videos
Jun Ma, Zongxin Yang, Sumin Kim, et al.
arXiv.org · Apr 2025
Citations: 96

tUbe net: a generalisable deep learning tool for 3D vessel segmentation
N. Holroyd, Zhongwang Li, C. Walsh, et al.
bioRxiv · May 2025
Citations: 9

Radiomics in Early Detection of Pancreatic Ductal Adenocarcinoma: A Close Look at Its Current Status and Challenges to Clinical Implementation
Hajra Arshad, Felipe Lopez-Ramirez, Florent Tixier, et al.
Canadian Association of Radiologists Journal · Jul 2025
Citations: 8

A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation
Yufan He, Pengfei Guo, Yucheng Tang, et al.
arXiv.org · Aug 2024
Citations: 8

Early detection of pancreatic cancer on computed tomography: advancements with deep learning
Felipe Lopez-Ramirez, E. Syailendra, Florent Tixier, et al.
Radiology Advances · Aug 2025
Citations: 6

Citations

Total Citations: 77

Influential: 11

References: 61

GitHub

Stars: 292

Forks: 44

Open Issues: 8

Contributors: 9

Last Push: 19d ago

Language: Python

License: Apache-2.0

HuggingFace

Downloads: 7.3K

Likes: 23

Last Modified: 3mo ago

Pipeline: image-segmentation

Fields of citing research

Medicine: 96%
Computer Science: 95%
Engineering: 34%
Biology: 5%
Environmental Science: 1%
Physics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Score: 73 (Open)

Usability (can I run it?): 76

Reproducibility (can I retrain it?): 66

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository Research Paper HuggingFace Model HuggingFace Model


