NVIDIA/MONAI 3D medical image segmentation foundation model for CT and MRI, supporting automatic segmentation of 127 anatomical classes plus interactive point-prompt refinement.
VISTA3D (Versatile Imaging SegmenTation and Annotation) is a unified segmentation foundation model for 3D medical imaging, developed by NVIDIA as part of the open-source MONAI project. Volumetric segmentation of CT and MRI scans is a fundamental but labor-intensive step in radiology and medical image analysis: manually delineating organs, vessels, bones, and lesions slice-by-slice across hundreds of axial images is prohibitively slow, and most prior deep-learning segmenters are narrow expert models trained for a single anatomy or dataset. VISTA3D addresses this by providing one pretrained model that segments a broad set of anatomical structures out of the box while also supporting human-in-the-loop correction.
The model is, according to its authors, the first to achieve state-of-the-art performance in both 3D automatic segmentation (covering 127 anatomical classes) and 3D interactive segmentation, even when compared against top single-purpose 3D expert models on large, diverse benchmarks. It combines a "segment everything" automatic mode with point-prompt interactivity reminiscent of the Segment Anything Model (SAM), but operating natively in three dimensions rather than slice-by-slice.
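A point prompt in a native 3D model is a single voxel coordinate in the whole volume, paired with a positive or negative label, rather than a click on one 2D slice. The sketch below shows a click-simulation strategy that is common in interactive segmentation benchmarks. It is not presented as VISTA3D's own training or evaluation scheme. The function name `next_corrective_click`, the `(z, y, x)` index order, and the 1/0 label convention are assumptions for illustration. It picks the larger error region (missed voxels or leaked voxels) and clicks the error voxel nearest that region's centroid. Real pipelines often use a distance transform to click the interior of the largest connected error component instead.

```python
from typing import Optional, Set, Tuple

# Assumed voxel index convention: (z, y, x) into the 3D volume.
Voxel = Tuple[int, int, int]


def next_corrective_click(
    pred: Set[Voxel], gt: Set[Voxel]
) -> Optional[Tuple[Voxel, int]]:
    """Simulate one corrective 3D click (hypothetical helper).

    Returns (voxel, label), where label 1 is a positive click (add missed
    tissue) and 0 is a negative click (remove leakage), or None when the
    prediction already matches the ground truth.
    """
    false_neg = gt - pred  # voxels the model missed
    false_pos = pred - gt  # voxels the model wrongly included
    if not false_neg and not false_pos:
        return None

    # Fix the larger error first; ties go to the missed (false-negative) region.
    if len(false_neg) >= len(false_pos):
        region, label = false_neg, 1
    else:
        region, label = false_pos, 0

    # Centroid of the error region in voxel coordinates.
    n = len(region)
    cz = sum(v[0] for v in region) / n
    cy = sum(v[1] for v in region) / n
    cx = sum(v[2] for v in region) / n

    # Click a real error voxel: the one closest to the centroid. Sorting first
    # makes tie-breaking deterministic (lowest (z, y, x) wins).
    click = min(
        sorted(region),
        key=lambda v: (v[0] - cz) ** 2 + (v[1] - cy) ** 2 + (v[2] - cx) ** 2,
    )
    return click, label


# Usage: ground truth is a 3x3x3 cube; the prediction misses the z=2 slice
# and leaks one voxel outside the cube.
gt = {(z, y, x) for z in range(3) for y in range(3) for x in range(3)}
pred = {v for v in gt if v[0] < 2} | {(5, 5, 5)}
print(next_corrective_click(pred, gt))  # ((2, 1, 1), 1)
```

Here the nine missed voxels outweigh the single leaked voxel, so the simulated user places a positive click in the middle of the missed slice.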
First released as a preprint in June 2024 (arXiv:2406.05285) and subsequently accepted to CVPR 2025, VISTA3D is distributed through MONAI and packaged for deployment as the NVIDIA NV-Segment-CT and NV-Segment-CTMR models and as a NIM microservice.
VISTA3D is built on a 3D segmentation pipeline that couples a convolutional encoder-decoder backbone with a multi-head, transformer-style class/prompt head. This head supports three workflows: segment everything, segment by class, and segment by point prompt. The model was trained on a curated corpus of 11,454 3D CT volumes spanning 127 anatomical structures and various lesions. Training supervision came from a mix of expert annotations, pseudo-labels generated by the TotalSegmentator model, and supervoxels derived using SAM pretrained weights. The interactive branch is trained to respond to 3D point prompts, which enables zero-shot segmentation of structures outside the labeled class set. On the TotalSegmentatorV2 test split, VISTA3D reports strong lobe-level lung performance, with Dice scores of about 0.95 for the left upper lobe, 0.94 for the left lower lobe, 0.88 for the right upper lobe, 0.92 for the right middle lobe, and 0.94 for the right lower lobe. These results reflect its competitiveness with specialized expert models. Inference runs in MONAI Core on NVIDIA GPUs and has been tested on the A100 (Ampere), H100 (Hopper), and L40 (Ada Lovelace).
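The lobe scores above are Dice coefficients, the standard overlap metric for volumetric segmentation: Dice = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and ground-truth mask G, computed separately for each structure. The sketch below uses plain Python sets of voxel coordinates so that it runs without dependencies. The helper names (`dice`, `masks_from_labels`, `per_class_dice`) are hypothetical, and so is the convention that label 0 is background. Scoring 1.0 when both masks are empty is also a choice, and evaluation code differs on that point. With boolean NumPy arrays, the same quantity is `2 * np.logical_and(p, g).sum() / (p.sum() + g.sum())`. MONAI also provides `monai.metrics.DiceMetric` for tensor-based evaluation.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x)
LabelVolume = List[List[List[int]]]  # [z][y][x] -> class id, 0 = background


def dice(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|); taken as 1.0 when both masks are empty."""
    denom = len(pred) + len(gt)
    if denom == 0:
        return 1.0
    return 2.0 * len(pred & gt) / denom


def masks_from_labels(volume: LabelVolume) -> Dict[int, Set[Voxel]]:
    """Split a multi-class label volume into one voxel set per class."""
    masks: Dict[int, Set[Voxel]] = {}
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, label in enumerate(row):
                if label != 0:  # skip background
                    masks.setdefault(label, set()).add((z, y, x))
    return masks


def per_class_dice(pred_vol: LabelVolume, gt_vol: LabelVolume) -> Dict[int, float]:
    """Dice for every class present in either the prediction or the ground truth."""
    pred_masks = masks_from_labels(pred_vol)
    gt_masks = masks_from_labels(gt_vol)
    classes = sorted(set(pred_masks) | set(gt_masks))
    return {c: dice(pred_masks.get(c, set()), gt_masks.get(c, set())) for c in classes}


# Usage: a single 2x2 slice with two structures (class ids 1 and 2).
gt_vol = [[[1, 1], [2, 2]]]
pred_vol = [[[1, 2], [2, 2]]]
print(per_class_dice(pred_vol, gt_vol))  # {1: 0.6666666666666666, 2: 0.8}
```

In the example, one voxel of structure 1 is mislabeled as structure 2, so both scores drop. Per-structure reporting shows this kind of confusion between neighbors, such as adjacent lung lobes, where a single averaged score would hide it.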
VISTA3D targets radiology research, medical-imaging AI development, and annotation workflows where rapid, multi-organ 3D segmentation is needed. Researchers can use its automatic mode to bootstrap labels across large CT/MRI cohorts, then use point-prompt interactivity to refine or extend segmentations to structures and pathologies the model has not explicitly learned—dramatically reducing manual annotation effort. As a MONAI bundle and NVIDIA NIM microservice, it integrates into imaging pipelines for tasks such as organ volumetry, tumor delineation, treatment planning research, and dataset curation. The distributed weights are released for research use and explicitly not for clinical diagnosis.
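Organ volumetry from an automatic segmentation reduces to counting the voxels of each label and multiplying by the physical voxel size. The sketch below assumes that the model's output has already been loaded as an integer label volume with 0 as background. It also assumes that the voxel spacing in millimetres has been read from the image header, in the same axis order as the array. The function name `structure_volumes_ml` is hypothetical and is not part of MONAI or VISTA3D.

```python
from collections import Counter
from typing import Dict, List, Tuple

LabelVolume = List[List[List[int]]]  # [z][y][x] -> class id, 0 = background


def structure_volumes_ml(
    labels: LabelVolume, spacing_mm: Tuple[float, float, float]
) -> Dict[int, float]:
    """Physical volume of each labeled structure in millilitres (hypothetical helper).

    spacing_mm must give the voxel size along the same (z, y, x) axes as
    `labels`. 1 mL = 1 cm^3 = 1000 mm^3.
    """
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    # Count voxels per non-background label across the whole volume.
    counts = Counter(
        label for plane in labels for row in plane for label in row if label != 0
    )
    return {label: n * voxel_mm3 / 1000.0 for label, n in sorted(counts.items())}


# Usage: two 2x2 slices, 5 mm slice thickness and 2 mm in-plane spacing
# (20 mm^3 per voxel).
labels = [[[0, 1], [1, 1]], [[1, 1], [2, 0]]]
print(structure_volumes_ml(labels, (5.0, 2.0, 2.0)))  # {1: 0.1, 2: 0.02}
```

Because CT slice thickness often differs from in-plane resolution, using the per-axis spacing rather than assuming isotropic voxels is what keeps these volumes physically meaningful.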
By unifying broad automatic segmentation and interactive correction in a single 3D foundation model, VISTA3D advances the medical-imaging field's shift from many narrow expert segmenters toward general-purpose, promptable models—the 3D counterpart to the SAM paradigm that reshaped 2D natural-image segmentation. Its release through MONAI, with open code, downloadable checkpoints, and a deployable microservice, lowers the barrier for labs to adopt foundation-model segmentation and to fine-tune for new anatomies. Acceptance at CVPR 2025 and integration into NVIDIA's healthcare tooling signal meaningful uptake, positioning VISTA3D alongside other anatomical-segmentation foundation models as a practical backbone for 3D medical image analysis.
He, Y., et al. (2024) VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. Computer Vision and Pattern Recognition.
DOI: 10.48550/arXiv.2406.05285
He, Y., et al. (2024) VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. Computer Vision and Pattern Recognition.
DOI: 10.1109/CVPR52734.2025.01943