Medical SAM 2

University of Oxford / National University of Singapore

SAM 2-based foundation model that segments 2D and 3D medical images by recasting volumes and image sets as video object tracking.

Released: August 2024

Medical SAM 2 (MedSAM-2) is a promptable segmentation foundation model that adapts Meta's Segment Anything Model 2 (SAM 2) to medical imaging. Its central idea is to reframe both 2D and 3D medical segmentation as a video object tracking problem: a 3D volume is processed as a sequence of frames, and even an unordered collection of unrelated 2D images can be treated as a pseudo-video. This unifies what are usually separate 2D and 3D segmentation pipelines under a single architecture and inference paradigm.
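The sketch below makes the volume-as-video framing concrete. It turns a 3D array into an ordered list of 2D frames and reorders an unordered 2D image set so that the prompted image comes first, as in One-Prompt Segmentation. This is a minimal illustration under stated assumptions, not MedSAM-2's own preprocessing. The function names, the CT intensity window, and the choice to replicate one channel into three RGB channels are all hypothetical.

```python
import numpy as np


def volume_to_frames(volume, window=(-160.0, 240.0)):
    """Split a 3D volume shaped (slices, H, W) into a list of uint8 RGB frames.

    The intensity window is a hypothetical abdominal-CT default, not a
    MedSAM-2 setting.
    """
    lo, hi = window
    # Clip to the window and rescale to the 0-255 range used by video frames.
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    scaled = ((clipped - lo) / (hi - lo) * 255.0).round().astype(np.uint8)
    # Replicate the single intensity channel three times (assumed RGB input).
    return [np.repeat(s[..., None], 3, axis=-1) for s in scaled]


def images_to_pseudo_video(images, prompted_index):
    """Order an unrelated 2D image set so the prompted image becomes frame 0."""
    order = [prompted_index] + [i for i in range(len(images)) if i != prompted_index]
    return [images[i] for i in order], order
```

For a volume, slice order defines the "time" axis. For a 2D image set, the order is arbitrary apart from the prompted frame. This is why context has to come from the memory bank rather than from temporal adjacency.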

The model was developed by Jiayuan Zhu, Abdullah Hamdi, Yunli Qi, Yueming Jin, and Junde Wu at the University of Oxford and the National University of Singapore, with the work first posted to arXiv in August 2024. It addresses a persistent limitation of SAM-style models in medicine: out-of-the-box SAM and SAM 2 perform inconsistently on medical modalities such as CT, MRI, ultrasound, and fundus imaging, and propagating a single prompt across an entire 3D scan or image set is non-trivial.

MedSAM-2 sits alongside other SAM derivatives for medicine (such as MedSAM v1, SAM-Med2D, and the later bowang-lab MedSAM2) but is distinguished by its memory-driven, tracking-based formulation and its "One-Prompt Segmentation" capability, which lets a single annotated example drive segmentation of many subsequent images.

Key Features

Video-style unified segmentation: Treats 2D image sets and 3D volumes alike as frame sequences, applying SAM 2's video tracking pipeline so the same model handles both task types without architecture changes.
Self-sorting memory bank: A novel memory mechanism dynamically selects stored embeddings based on confidence and feature dissimilarity rather than temporal order, improving robustness when propagating segmentations through a volume.
One-Prompt Segmentation: A single user prompt on one image can be propagated to segment many additional images that share no temporal relationship, reducing per-image annotation burden.
Modality-agnostic prompting: Supports interactive bounding-box and point prompts across CT, MRI, ultrasound, fundus, microscopy, and other modalities.
Open weights and code: Released under Apache-2.0 with pretrained checkpoints distributed via Hugging Face.
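The self-sorting selection rule can be sketched as a greedy procedure. First, drop low-confidence entries. Next, seed the bank with the most confident entry. Then repeatedly add the candidate least similar (by cosine similarity) to what is already stored. This is a hypothetical illustration of confidence- and dissimilarity-based selection, not the authors' exact algorithm. `select_memory`, the `capacity` and `min_conf` values, and the greedy farthest-point rule are all assumptions.

```python
import numpy as np


def select_memory(embeddings, confidences, capacity=4, min_conf=0.5):
    """Hypothetical self-sorting memory selection.

    embeddings: (N, D) array of stored frame embeddings.
    confidences: (N,) array of predicted-mask confidence scores.
    Returns the indices of the kept entries, in selection order.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    conf = np.asarray(confidences, dtype=np.float64)
    # L2-normalise rows so that dot products are cosine similarities.
    unit = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    # Keep only confident entries, ranked from most to least confident.
    candidates = [int(i) for i in np.argsort(-conf) if conf[i] >= min_conf]
    if not candidates:
        return []
    selected = [candidates.pop(0)]  # seed with the most confident entry
    while candidates and len(selected) < capacity:
        # Similarity of each candidate to its closest already-selected entry.
        sims = unit[candidates] @ unit[selected].T
        redundancy = sims.max(axis=1)
        # Add the least redundant (most dissimilar) candidate next.
        best = int(np.argmin(redundancy))
        selected.append(candidates.pop(best))
    return selected
```

Ranking by dissimilarity rather than recency keeps the bank from filling up with near-duplicate neighbouring slices. This matters most when frames come from an unordered image set.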

Technical Details

MedSAM-2 builds directly on the SAM 2 backbone, which couples a Hiera image encoder with a memory attention module and a streaming memory bank. The authors fine-tune this pipeline on medical data and replace the default temporal memory with their self-sorting memory bank. This bank curates the most informative embeddings to condition predictions on later frames.

Released checkpoints are fine-tuned on public datasets including REFUGE (optic cup, 2D fundus) and BTCV (abdominal multi-organ, 3D CT). Evaluation spans roughly 25 segmentation tasks across more than a dozen benchmarks. These cover abdominal organs, kidney and liver tumors, breast and nasopharynx cancer, vestibular schwannoma, mediastinal lymph nodes, cerebral and coronary arteries, white blood cells, retinal vessels, and mandibles. Across these 2D and 3D tasks, the paper reports state-of-the-art or competitive Dice scores relative to SAM, SAM 2, MedSAM, and specialist segmentation networks.
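Dice scores compare a predicted mask A with a ground-truth mask B as Dice = 2|A ∩ B| / (|A| + |B|). A minimal NumPy version for binary 2D or 3D masks follows. How the paper aggregates scores across cases and classes is not stated here. Returning 1.0 when both masks are empty is a convention chosen for this sketch.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of any shape (2D or 3D)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * intersection / total)
```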

Applications

MedSAM-2 targets researchers and clinicians who need to delineate anatomical structures, lesions, and tumors across diverse imaging modalities without training a bespoke model for each task. Its tracking-based design is well suited to volumetric annotation in radiology, where a clinician can prompt a single slice and propagate the contour through an entire CT or MRI scan, and its One-Prompt mode supports efficient batch labeling of large 2D image collections in pathology, ophthalmology, and microscopy. The model is most useful as an interactive annotation accelerator and a strong baseline for medical segmentation pipelines.
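The prompt-one-slice workflow amounts to propagating outward from the annotated slice in both directions. The sketch below shows that control flow only. `propagate_from_slice` and the `step` callable are hypothetical stand-ins. Conditioning each prediction on just the previous mask is also a simplification: the model's memory attention conditions on a bank of stored embeddings instead.

```python
from typing import Callable, Dict, Sequence

import numpy as np

Mask = np.ndarray


def propagate_from_slice(
    frames: Sequence[np.ndarray],
    prompted_index: int,
    prompted_mask: Mask,
    step: Callable[[np.ndarray, Mask], Mask],
) -> Dict[int, Mask]:
    """Propagate a mask outward from one annotated slice in both directions.

    `step(frame, previous_mask)` is a hypothetical stand-in for the model's
    prediction on the next frame.
    """
    masks = {prompted_index: prompted_mask}
    # Forward pass: slices after the prompted one.
    prev = prompted_mask
    for i in range(prompted_index + 1, len(frames)):
        prev = step(frames[i], prev)
        masks[i] = prev
    # Backward pass: slices before the prompted one.
    prev = prompted_mask
    for i in range(prompted_index - 1, -1, -1):
        prev = step(frames[i], prev)
        masks[i] = prev
    # Return masks ordered by slice index.
    return dict(sorted(masks.items()))
```

Running two passes from the prompted index lets a mid-volume annotation cover the whole scan. The upstream SAM 2 video predictor handles the backward direction with a `reverse` option on its propagation call. The wrapper here does not depend on that API.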

Impact

By recasting medical segmentation as video tracking and adding a memory bank tailored to non-temporal medical data, MedSAM-2 demonstrated that SAM 2 can be adapted into a general-purpose, promptable tool for both 2D and 3D imaging. Released openly with code and weights, it became a widely cited reference point in the rapidly growing family of SAM-based medical segmentation models and helped popularize the volume-as-video framing. As a preprint built on a fast-moving foundation, its reported benchmarks should be read as a snapshot, and like other interactive SAM derivatives it still depends on user prompts and can struggle on modalities far from its fine-tuning data.

Citation

Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Preprint

Zhu, J., et al. (2024) Medical SAM 2: Segment medical images as video via Segment Anything Model 2. arXiv.org.

DOI: 10.48550/arXiv.2408.00874

Recent citations

Papers that recently cited this model.

UniMedSeg: Unified In-Context Learning for Multi-Paradigm 2D/3D Medical Image Segmentation
Yunzhou Li, Jiesi Hu, Yanwu Yang, et al.
Jul 2026 · Citations: 0
Higher-Order Cell Tracking Transformer
Jordão Bragantini, Ilan Theodoro, Loic A. Royer
Jul 2026 · Citations: 0
LSR-Diff: A Diffusion Model Synthesizing Level Set Representations for Reliable Segmentation of Medical Images With Ambiguous Edges
Wenbo Gao, Haoyu Cao, J. Cheung, et al.
IEEE Transactions on Image Processing · Jul 2026 · Citations: 0

Top citations

The most-cited papers that cite this model.

TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI.
Tugba Akinci D’Antonoli, Lucas K. Berger, A. K. Indrakanti, et al.
Radiology · May 2024 · Citations: 112
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang, Sheng Zheng, Chenghao Li, et al.
arXiv.org · May 2023 · Citations: 96 · Influential
Segment Anything in Medical Images and Videos: Benchmark and Deployment
Jun Ma, Sumin Kim, Feifei Li, et al.
arXiv.org · Aug 2024 · Citations: 72
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Haofeng Liu, Erli Zhang, Junde Wu, et al.
arXiv.org · Aug 2024 · Citations: 54
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Ziyue Wang, Junde Wu, Linghan Cai, et al.
Mar 2025 · Citations: 52

Citations

Total citations: 257

Influential: 16

References: 52

GitHub

Stars: 932

Forks: 138

Open issues: 51

Contributors: 5

Last push: 1y ago

Language: Python

License: Apache-2.0

HuggingFace

Downloads: 0

Likes: 5

Last modified: 1y ago

Fields of citing research

Computer Science: 93%
Medicine: 81%
Engineering: 43%
Biology: 4%
Environmental Science: 3%
Agricultural and Food Sciences: 1%
Mathematics: 0%
Geography: 0%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Score: 74 (Open)

Usability (can I run it?): 93

Reproducibility (can I retrain it?): 50

Model Openness Framework: Class III (Open Model)

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model
