bio.rodeo

© 2026 Pulsatance. All rights reserved.
Imaging

Medical SAM 2

University of Oxford / National University of Singapore

SAM2-based foundation model that segments 2D and 3D medical images by treating volumes and image sets as video object tracking.

Released: August 2024

Medical SAM 2 (MedSAM-2) is a promptable segmentation foundation model that adapts Meta's Segment Anything Model 2 (SAM 2) to medical imaging. Its central idea is to reframe both 2D and 3D medical segmentation as a video object tracking problem: a 3D volume is processed as a sequence of frames, and even an unordered collection of unrelated 2D images can be treated as a pseudo-video. This unifies what are usually separate 2D and 3D segmentation pipelines under a single architecture and inference paradigm.
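The volume-as-video framing can be illustrated with a minimal sketch. It assumes images are plain nested lists of pixel values; the function names `volume_to_frames` and `images_to_pseudo_video` are hypothetical and are not part of the MedSAM-2 codebase.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one 2D image stored as rows of pixel values


def _shape(frame: Frame) -> Tuple[int, int]:
    """Return (height, width) of a 2D frame."""
    return (len(frame), len(frame[0]) if frame else 0)


def volume_to_frames(volume: List[Frame]) -> List[Frame]:
    """Treat a 3D volume (slices stacked along depth) as an ordered frame sequence."""
    frames = list(volume)
    if len({_shape(f) for f in frames}) > 1:
        raise ValueError("all slices must share one height and width")
    return frames


def images_to_pseudo_video(images: List[Frame], prompted_index: int = 0) -> List[Frame]:
    """Arrange unrelated 2D images as a pseudo-video with the prompted image first.

    The order of the remaining images is arbitrary because they share no
    temporal relationship; this sketch simply keeps their input order.
    """
    if not 0 <= prompted_index < len(images):
        raise IndexError("prompted_index out of range")
    rest = [img for i, img in enumerate(images) if i != prompted_index]
    return [images[prompted_index]] + rest
```

Either function yields a list of frames that a video-style segmenter could consume one frame at a time, which is the sense in which 2D and 3D tasks share one inference path.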

The model was developed by Jiayuan Zhu, Abdullah Hamdi, Yunli Qi, Yueming Jin, and Junde Wu at the University of Oxford and the National University of Singapore, with the work first posted to arXiv in August 2024. It addresses a persistent limitation of SAM-style models in medicine: out-of-the-box SAM and SAM 2 perform inconsistently on medical modalities such as CT, MRI, ultrasound, and fundus imaging, and propagating a single prompt across an entire 3D scan or image set is non-trivial.

MedSAM-2 sits alongside other SAM derivatives for medicine (such as MedSAM v1, SAM-Med2D, and the later bowang-lab MedSAM2) but is distinguished by its memory-driven, tracking-based formulation and its "One-Prompt Segmentation" capability, which lets a single annotated example drive segmentation of many subsequent images.

#Key Features

  • Video-style unified segmentation: Treats 2D image sets and 3D volumes alike as frame sequences, applying SAM 2's video tracking pipeline so the same model handles both task types without architecture changes.
  • Self-sorting memory bank: A novel memory mechanism dynamically selects stored embeddings based on confidence and feature dissimilarity rather than temporal order, improving robustness when propagating segmentations through a volume.
  • One-Prompt Segmentation: A single user prompt on one image can be propagated to segment many additional images that share no temporal relationship, reducing per-image annotation burden.
  • Modality-agnostic prompting: Supports interactive bounding-box and point prompts across CT, MRI, ultrasound, fundus, microscopy, and other modalities.
  • Open weights and code: Released under Apache-2.0 with pretrained checkpoints distributed via Hugging Face.
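The idea behind a memory bank that selects embeddings by confidence and dissimilarity, rather than by recency, can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the class name, the confidence threshold, the capacity, and the eviction rule (drop the most redundant entry, breaking ties toward lower confidence) are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MemoryEntry:
    embedding: List[float]
    confidence: float


@dataclass
class SelfSortingMemoryBank:
    """Hypothetical bank that ignores temporal order when choosing what to keep."""
    capacity: int = 3            # assumed bank size
    min_confidence: float = 0.5  # assumed confidence gate
    entries: List[MemoryEntry] = field(default_factory=list)

    def _redundancy(self, index: int) -> float:
        """Highest similarity between one entry and any other stored entry."""
        target = self.entries[index]
        return max(
            (cosine_similarity(target.embedding, other.embedding)
             for i, other in enumerate(self.entries) if i != index),
            default=0.0,
        )

    def update(self, embedding: List[float], confidence: float) -> bool:
        """Offer a new embedding; return True if it ends up stored in the bank."""
        if confidence < self.min_confidence:
            return False  # low-confidence predictions never enter the bank
        candidate = MemoryEntry(list(embedding), confidence)
        self.entries.append(candidate)
        if len(self.entries) > self.capacity:
            # Evict the most redundant entry; on ties, the less confident one.
            worst = max(
                range(len(self.entries)),
                key=lambda i: (self._redundancy(i), -self.entries[i].confidence),
            )
            evicted = self.entries.pop(worst)
            return evicted is not candidate
        return True
```

Under this rule a new embedding that nearly duplicates a stored one only displaces it when it is more confident, so the bank tends toward a diverse, high-confidence set regardless of frame order.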

#Technical Details

MedSAM-2 builds directly on the SAM 2 backbone, which couples a Hiera image encoder with a memory attention module and a streaming memory bank. The authors fine-tune this pipeline for medical data and replace the default temporal memory with their self-sorting memory bank, which curates the most informative embeddings to condition predictions on later frames.

Released checkpoints are fine-tuned on public datasets including REFUGE (optic cup, 2D fundus) and BTCV (abdominal multi-organ, 3D CT).

Evaluation spans roughly 25 segmentation tasks across more than a dozen benchmarks, covering abdominal organs, kidney and liver tumors, breast and nasopharynx cancer, vestibular schwannoma, mediastinal lymph nodes, cerebral and coronary arteries, white blood cells, retinal vessels, and mandibles. Across these 2D and 3D tasks the paper reports state-of-the-art or competitive Dice scores relative to SAM, SAM 2, MedSAM, and specialist segmentation networks.
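For readers unfamiliar with the metric, the Dice score measures overlap between a predicted and a reference mask as 2|P∩T| / (|P| + |T|). The sketch below is a generic implementation for binary masks stored as nested lists. It is not the paper's evaluation code, and returning 1.0 when both masks are empty is a convention assumed here.

```python
from typing import List

Mask = List[List[int]]  # binary mask: rows of 0/1 values


def dice_score(pred: Mask, target: Mask) -> float:
    """Dice coefficient between two binary masks of identical shape."""
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_true = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(p & t for p, t in zip(flat_pred, flat_true))
    total = sum(flat_pred) + sum(flat_true)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

A prediction that covers two pixels where the reference covers one of them scores 2·1 / (2 + 1) ≈ 0.667.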

#Applications

MedSAM-2 targets researchers and clinicians who need to delineate anatomical structures, lesions, and tumors across diverse imaging modalities without training a bespoke model for each task. Its tracking-based design is well suited to volumetric annotation in radiology, where a clinician can prompt a single slice and propagate the contour through an entire CT or MRI scan, and its One-Prompt mode supports efficient batch labeling of large 2D image collections in pathology, ophthalmology, and microscopy. The model is most useful as an interactive annotation accelerator and a strong baseline for medical segmentation pipelines.
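The single-slice workflow can be sketched as a propagation loop. The sketch assumes masks are carried outward from the prompted slice, first toward the end of the volume and then toward the start. That ordering, and the `segment_fn(slice, previous_mask)` callback standing in for the model, are assumptions for illustration, not the MedSAM-2 API.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # slice type
M = TypeVar("M")  # mask type


def propagation_order(num_slices: int, prompt_index: int) -> Tuple[List[int], List[int]]:
    """Return slice indices for the forward pass and the backward pass."""
    if not 0 <= prompt_index < num_slices:
        raise IndexError("prompt_index out of range")
    forward = list(range(prompt_index, num_slices))
    backward = list(range(prompt_index - 1, -1, -1))
    return forward, backward


def propagate(
    slices: List[S],
    prompt_index: int,
    prompt_mask: M,
    segment_fn: Callable[[S, M], M],
) -> List[M]:
    """Propagate one prompted mask through every slice of a volume.

    segment_fn is a hypothetical stand-in for the model: it takes a slice and
    the mask of the neighbouring, already-segmented slice and returns a new mask.
    """
    forward, backward = propagation_order(len(slices), prompt_index)
    masks = {prompt_index: prompt_mask}
    for order in (forward[1:], backward):
        previous = prompt_mask  # each pass restarts from the prompted slice
        for i in order:
            previous = segment_fn(slices[i], previous)
            masks[i] = previous
    return [masks[i] for i in range(len(slices))]
```

With a toy `segment_fn` that increments a counter, prompting the middle of a five-slice volume yields values that grow with distance from the prompt, which makes the visiting order easy to inspect.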

#Impact

By recasting medical segmentation as video tracking and adding a memory bank tailored to non-temporal medical data, MedSAM-2 demonstrated that SAM 2 can be adapted into a general-purpose, promptable tool for both 2D and 3D imaging. Released openly with code and weights, it became a widely cited reference point in the rapidly growing family of SAM-based medical segmentation models and helped popularize the volume-as-video framing. Two caveats apply. Because it is a preprint built on a fast-moving foundation, its reported benchmarks should be read as a snapshot. Like other interactive SAM derivatives, it also still depends on user prompts and can struggle on modalities far from its fine-tuning data.

Citation

Zhu, J., Hamdi, A., Qi, Y., Jin, Y., and Wu, J. (2024). Medical SAM 2: Segment medical images as video via Segment Anything Model 2. arXiv preprint arXiv:2408.00874.

DOI: 10.48550/arXiv.2408.00874

Citations

Total Citations: 229
Influential: 13
References: 52

GitHub

Stars: 922
Forks: 138
Open Issues: 50
Contributors: 5
Last Push: 1y ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 0
Likes: 6
Last Modified: 1y ago

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 74 (Open)
Usability (can I run it?): 93
Reproducibility (can I retrain it?): 50
Model Openness Framework: Class III, Open Model

Tags

foundation_model, histology, interactive_segmentation, radiology, segmentation, transformer, vision_transformer, zero_shot
