bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

VISTA3D

NVIDIA

NVIDIA/MONAI 3D medical image segmentation foundation model for CT and MRI, supporting automatic segmentation of 127 anatomical classes plus interactive point-prompt refinement.

Released: June 2024

VISTA3D (Versatile Imaging SegmenTation and Annotation) is a unified segmentation foundation model for 3D medical imaging, developed by NVIDIA as part of the open-source MONAI project. Volumetric segmentation of CT and MRI scans is a fundamental but labor-intensive step in radiology and medical image analysis: manually delineating organs, vessels, bones, and lesions slice-by-slice across hundreds of axial images is prohibitively slow, and most prior deep-learning segmenters are narrow expert models trained for a single anatomy or dataset. VISTA3D addresses this by providing one pretrained model that segments a broad set of anatomical structures out of the box while also supporting human-in-the-loop correction.

The model is, according to its authors, the first to achieve state-of-the-art performance in both 3D automatic segmentation (covering 127 anatomical classes) and 3D interactive segmentation, even when compared against top single-purpose 3D expert models on large, diverse benchmarks. It combines a "segment everything" automatic mode with point-prompt interactivity reminiscent of the Segment Anything Model (SAM), but operating natively in three dimensions rather than slice-by-slice.

First released as a preprint in June 2024 (arXiv:2406.05285) and subsequently accepted to CVPR 2025, VISTA3D is distributed through MONAI and packaged for deployment as the NVIDIA NV-Segment-CT and NV-Segment-CTMR models and as a NIM microservice.

#Key Features

  • Broad automatic coverage: Segments 127 types of human anatomical structures (132 total classes including 7 tumor/lesion types such as lung nodules and liver, pancreatic, colon, kidney, and bone lesions) from a single forward pass, without per-anatomy retraining.
  • 3D interactive refinement: Supports point-click prompts so users can correct or add segmentations for novel structures, enabling efficient human-in-the-loop annotation directly in 3D.
  • Supervoxel distillation: A novel 3D supervoxel method distills 2D pretrained (SAM-style) backbones into the 3D model, transferring zero-shot capability to volumes never seen during training.
  • Multi-modal support: The NV-Segment-CTMR variant extends coverage from CT to MRI, broadening applicability across imaging modalities.
  • Open and deployable: Released with code under Apache-2.0 and weights under the NVIDIA Open Model License, with ready-to-run MONAI bundles, HuggingFace checkpoints, and a NIM microservice for inference.
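
The automatic and point-click modes listed above can be pictured as different shapes of one request. The sketch below is illustrative only: the helper `build_request` and the dictionary keys `image`, `label_prompt`, `points`, and `point_labels` are assumptions made for this example, not a confirmed VISTA3D or MONAI interface, and the class ID `3` and file name are placeholders. The bundle documentation is the reference for the real input format.

```python
from typing import Optional, Sequence


def build_request(
    image_path: str,
    class_ids: Optional[Sequence[int]] = None,
    points: Optional[Sequence[Sequence[int]]] = None,
    point_labels: Optional[Sequence[int]] = None,
) -> dict:
    """Assemble a hypothetical segmentation request (key names are assumptions).

    - no prompts:            "segment everything"
    - class_ids only:        segment the listed classes
    - points + point_labels: interactive refinement
                             (1 = foreground click, 0 = background click)
    """
    request: dict = {"image": image_path}
    if class_ids is not None:
        request["label_prompt"] = list(class_ids)
    if points is not None:
        if point_labels is None or len(points) != len(point_labels):
            raise ValueError("each point needs exactly one label")
        for p in points:
            if len(p) != 3:
                raise ValueError("points must be 3D voxel coordinates")
        request["points"] = [list(p) for p in points]
        request["point_labels"] = list(point_labels)
    elif point_labels is not None:
        raise ValueError("point_labels given without points")
    return request


# Three request shapes, one per mode (all values are placeholders).
everything = build_request("scan.nii.gz")
by_class = build_request("scan.nii.gz", class_ids=[3])
by_point = build_request(
    "scan.nii.gz",
    class_ids=[3],
    points=[[128, 128, 16], [100, 100, 6]],
    point_labels=[1, 0],
)
```

Keeping the prompts as plain data makes it easy to log which clicks produced which correction during an annotation session.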

#Technical Details

VISTA3D is built on a 3D segmentation pipeline that couples a convolutional encoder-decoder backbone with a multi-head, transformer-style class/prompt head. It supports three workflows: segment everything, segment by class, and segment by point prompt.

The model was trained on a curated corpus of 11,454 3D CT volumes spanning 127 anatomical structures and various lesions. Supervision came from a mix of expert annotations, pseudo-labels generated by the TotalSegmentator model, and supervoxels derived using SAM pretrained weights. The interactive branch is trained to respond to 3D point prompts, which allows zero-shot segmentation of structures outside the labeled class set.

On the TotalSegmentatorV2 test split, VISTA3D reports lobe-level lung Dice scores of approximately 0.95 (left upper lobe), 0.94 (left lower lobe), 0.88 (right upper lobe), 0.92 (right middle lobe), and 0.94 (right lower lobe), reflecting its competitiveness with specialized expert models. Inference runs in MONAI Core on NVIDIA Ampere, Hopper, and Ada Lovelace GPUs (tested on A100, H100, and L40).
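
For readers unfamiliar with the Dice scores quoted above, here is a minimal sketch of how the metric is commonly computed on 3D binary masks, using NumPy. The function name `dice_score`, the toy volumes, and the empty-mask convention are assumptions for illustration; this is not the authors' evaluation code, and their exact protocol (per-case averaging, handling of missing structures) may differ.

```python
import numpy as np


def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    overlap = int(np.logical_and(pred, truth).sum())
    return 2.0 * overlap / denom


# Toy 3D volumes: a 4x4x4 cube and the same cube shifted one voxel on axis 0.
truth = np.zeros((8, 8, 8), dtype=bool)
truth[2:6, 2:6, 2:6] = True  # 64 voxels
pred = np.zeros((8, 8, 8), dtype=bool)
pred[3:7, 2:6, 2:6] = True   # 64 voxels, 48 of them overlapping

toy_dice = dice_score(pred, truth)  # 2 * 48 / (64 + 64) = 0.75

# Unweighted mean of the five lobe scores quoted in the text.
lobe_dice = {
    "left upper lobe": 0.95,
    "left lower lobe": 0.94,
    "right upper lobe": 0.88,
    "right middle lobe": 0.92,
    "right lower lobe": 0.94,
}
mean_lobe_dice = sum(lobe_dice.values()) / len(lobe_dice)
```

Under this unweighted average, the five reported lobe scores come to about 0.926. The right upper lobe (0.88) pulls the mean down.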

#Applications

VISTA3D targets radiology research, medical-imaging AI development, and annotation workflows where rapid, multi-organ 3D segmentation is needed. Researchers can use its automatic mode to bootstrap labels across large CT/MRI cohorts, then use point-prompt interactivity to refine or extend segmentations to structures and pathologies the model has not explicitly learned, which reduces manual annotation effort. As a MONAI bundle and NVIDIA NIM microservice, it integrates into imaging pipelines for tasks such as organ volumetry, tumor delineation, treatment planning research, and dataset curation. The distributed weights are released for research use and explicitly not for clinical diagnosis.
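
As an illustration of the organ volumetry use case mentioned above, the following sketch converts a multi-class label volume into per-structure volumes in millilitres. It assumes the segmentation is available as an integer NumPy array with 0 as background and that voxel spacing in millimetres is known from the image header. The function name `organ_volumes_ml` and the label IDs 1 and 2 are hypothetical and do not correspond to VISTA3D's class indices.

```python
from typing import Dict, Tuple

import numpy as np


def organ_volumes_ml(
    labels: np.ndarray, spacing_mm: Tuple[float, float, float]
) -> Dict[int, float]:
    """Return {label_id: volume in mL} for every non-background label."""
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0  # mm^3 per voxel -> mL
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i): int(c) * voxel_ml for i, c in zip(ids, counts) if i != 0}


# Toy label volume with two hypothetical structures.
labels = np.zeros((10, 10, 10), dtype=np.int16)
labels[0:5, 0:4, 0:2] = 1   # 40 voxels
labels[5:10, :, :] = 2      # 500 voxels

# 1.5 x 1.5 x 2.0 mm voxels are 4.5 mm^3 each, i.e. 0.0045 mL.
volumes = organ_volumes_ml(labels, spacing_mm=(1.5, 1.5, 2.0))
```

With these toy inputs, structure 1 comes out at roughly 0.18 mL and structure 2 at roughly 2.25 mL. In practice the spacing must be read from the same image the mask was produced for, since resampling changes the voxel size.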

#Impact

By unifying broad automatic segmentation and interactive correction in a single 3D foundation model, VISTA3D advances the medical-imaging field's shift from many narrow expert segmenters toward general-purpose, promptable models—the 3D counterpart to the SAM paradigm that reshaped 2D natural-image segmentation. Its release through MONAI, with open code, downloadable checkpoints, and a deployable microservice, lowers the barrier for labs to adopt foundation-model segmentation and to fine-tune for new anatomies. Acceptance at CVPR 2025 and integration into NVIDIA's healthcare tooling signal meaningful uptake, positioning VISTA3D alongside other anatomical-segmentation foundation models as a practical backbone for 3D medical image analysis.

Citations

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Preprint

He, Y., et al. (2024) VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. arXiv preprint arXiv:2406.05285.

DOI: 10.48550/arXiv.2406.05285

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

He, Y., et al. (2025) VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

DOI: 10.1109/CVPR52734.2025.01943

Citations

Total Citations: 66
Influential: 9
References: 61

GitHub

Stars: 284
Forks: 43
Open Issues: 8
Contributors: 9
Last Push: 7d ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 395
Likes: 20
Last Modified: 2mo ago
Pipeline: image-segmentation

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 73 (Open)
Usability (can I run it?): 76
Reproducibility (can I retrain it?): 66
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

cnn, ct, foundation_model, interactive_segmentation, mri, segmentation, transformer, zero_shot

Resources

GitHub Repository · Research Paper · HuggingFace Model · HuggingFace Model