bio.rodeo

MedSAM

Bowang Lab / University Health Network / University of Toronto / Vector Institute / Western University / New York University / Yale University

A promptable foundation model for universal medical image segmentation, fine-tuned from SAM on 1.57M image-mask pairs spanning 10 imaging modalities and 30+ cancer types.

Released: January 2024

MedSAM is a promptable foundation model for universal medical image segmentation, developed by Jun Ma, Bo Wang, and colleagues at the University Health Network, University of Toronto, the Vector Institute, and collaborating institutions, and published in Nature Communications in January 2024. It adapts Meta AI's Segment Anything Model (SAM) — a general-purpose natural-image segmentation model — to the medical domain, where SAM's zero-shot performance is unreliable because biomedical images differ sharply from web photographs in contrast, texture, and object boundaries.

Image segmentation is a foundational step in nearly every quantitative medical imaging workflow, from delineating tumors for radiotherapy planning to measuring organ volumes. Historically this required a separate specialist model trained per task and modality, each demanding large annotated datasets and brittle to distribution shift. MedSAM instead provides a single model that segments arbitrary structures across modalities when given a bounding-box prompt, collapsing dozens of task-specific pipelines into one interactive tool.

By training on 1.57 million image-mask pairs, a corpus that was unusually large and diverse for medical segmentation at the time, MedSAM demonstrated that the prompt-driven, foundation-model paradigm transfers to medicine. It has since become a widely used reference point for promptable segmentation in biomedical imaging and seeded a family of follow-up work.

#Key Features

  • Bounding-box promptable segmentation: Users specify a target by drawing a box, and MedSAM returns a mask — enabling interactive, human-in-the-loop annotation rather than fully automated black-box output.
  • Modality-agnostic coverage: A single model handles 10 imaging modalities (CT, MRI, endoscopy, ultrasound, pathology, fundus, dermoscopy, mammography, OCT, and chest X-ray).
  • Strong generalization: On external validation, MedSAM maintained consistent accuracy on unseen datasets where task-specific U-Net and DeepLabV3+ models degraded sharply.
  • Annotation acceleration: In a human study, MedSAM reduced expert tumor annotation time by roughly 82%, a direct practical benefit for building labeled datasets.
  • Open and accessible: Code (Apache-2.0) and pretrained ViT-B weights are publicly released, with CLI, Jupyter, and GUI inference paths plus a HuggingFace checkpoint.
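
As a rough illustration of the bounding-box prompt described in the list above, the sketch below maps a box drawn in original pixel coordinates onto a fixed square model input. The function name `scale_box` and the default `target_size=1024` are assumptions made for this example (SAM-style encoders commonly take a fixed square input); this is not MedSAM's own preprocessing code.

```python
def scale_box(box, image_width, image_height, target_size=1024):
    """Map an (x_min, y_min, x_max, y_max) box from original pixel
    coordinates to a square target_size x target_size model input.

    target_size=1024 is an assumption for illustration only.
    """
    x_min, y_min, x_max, y_max = box
    inside = 0 <= x_min < x_max <= image_width and 0 <= y_min < y_max <= image_height
    if not inside:
        raise ValueError("box must lie inside the image and have positive area")
    # Each axis is scaled independently because the image is resized to a square.
    sx = target_size / image_width
    sy = target_size / image_height
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# A 512 x 256 image with a box drawn by an annotator.
scaled = scale_box((128, 64, 256, 128), image_width=512, image_height=256)
print(scaled)  # (256.0, 256.0, 512.0, 512.0)
```

Scaling each axis separately keeps the box aligned with the resized image even when the original is not square.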

#Technical Details

MedSAM retains SAM's three-part architecture: a ViT-Base image encoder, a prompt encoder, and a lightweight mask decoder. It was initialized from pretrained SAM weights; during fine-tuning the prompt encoder was frozen while the image encoder and mask decoder were updated, and only bounding-box prompts were used to keep the interface simple and clinically practical. Training used 1,570,263 image-mask pairs curated from publicly available sources, spanning 10 modalities and over 30 cancer types, on 20 A100 GPUs. Evaluation covered 86 internal and 60 external segmentation tasks. Across these, MedSAM substantially outperformed the original SAM (improvements ranging from roughly 15% to over 50% on hard tasks such as nasopharynx cancer) and was competitive with or superior to specialist segmentation networks while generalizing far better to out-of-distribution data.
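
The fine-tuning split described above (prompt encoder frozen, image encoder and mask decoder updated) can be sketched as a simple partition of parameter names. This is a minimal, framework-free sketch: the parameter names in `PARAM_NAMES` and the helper `split_trainable` are hypothetical and chosen only to mirror the three components named in the paragraph. In a real training script the same decision would typically be expressed by disabling gradients on the frozen component.

```python
# Hypothetical parameter names, modeled on a three-part layout:
# image encoder, prompt encoder, mask decoder.
PARAM_NAMES = [
    "image_encoder.patch_embed.weight",
    "image_encoder.blocks.0.attn.qkv.weight",
    "prompt_encoder.point_embeddings.0.weight",
    "mask_decoder.transformer.layers.0.mlp.weight",
    "mask_decoder.iou_prediction_head.weight",
]

# Components kept fixed during fine-tuning, per the description above.
FROZEN_PREFIXES = ("prompt_encoder.",)


def split_trainable(names, frozen_prefixes=FROZEN_PREFIXES):
    """Return (trainable, frozen) lists of parameter names."""
    frozen = [n for n in names if n.startswith(frozen_prefixes)]
    trainable = [n for n in names if not n.startswith(frozen_prefixes)]
    return trainable, frozen


trainable, frozen = split_trainable(PARAM_NAMES)
print("update:", trainable)
print("freeze:", frozen)
```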

#Applications

MedSAM serves radiologists, pathologists, and medical imaging researchers as a general-purpose segmentation backbone. Typical uses include accelerating manual annotation of tumors and organs, generating training labels for downstream specialist models, supporting radiotherapy and surgical planning, and providing a strong baseline for new segmentation benchmarks. Because it accepts simple box prompts, it integrates cleanly into interactive annotation tools and clinical research pipelines without per-task retraining.
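
When a model like this is used as a baseline on a new benchmark, predicted masks are usually scored against reference masks with an overlap metric. The sketch below computes the Dice similarity coefficient, a common choice for segmentation, on small binary masks. The function name `dice_coefficient` and the toy masks are assumptions for illustration, and returning 1.0 when both masks are empty is one convention among several.

```python
def dice_coefficient(pred, ref):
    """Dice = 2 * |A and B| / (|A| + |B|) for binary masks.

    Masks are nested lists of 0/1 with identical shapes.
    """
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")
    intersection = sum(
        p & r
        for pred_row, ref_row in zip(pred, ref)
        for p, r in zip(pred_row, ref_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, ref))
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[0, 1, 1],
        [0, 1, 0]]
ref = [[0, 1, 0],
       [0, 1, 1]]
print(round(dice_coefficient(pred, ref), 3))  # 0.667
```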

#Impact

MedSAM was among the first works to demonstrate that the promptable foundation-model paradigm transfers effectively to medical imaging, and it has been heavily cited and adopted as a reference baseline across the field. Its public code and weights catalyzed a wave of follow-up models — including video- and 3D-oriented successors such as MedSAM2 — and helped establish interactive, prompt-driven segmentation as a practical standard for biomedical image analysis. Limitations remain: the model depends on a human-supplied prompt rather than fully automatic detection, performs best on well-bounded structures, and inherits coverage gaps from its training modalities, leaving fully automated and 3D/temporal segmentation as active areas of extension.

Citation

Segment anything in medical images

Ma, J., et al. (2024). Segment anything in medical images. Nature Communications.

DOI: 10.1038/s41467-024-44824-z

Citations

Total Citations: 1.3K
Influential: 80
References: 54

GitHub

Stars: 4.3K
Forks: 587
Open Issues: 16
Contributors: 9
Last Push: 1y ago
Language: Jupyter Notebook
License: Apache-2.0

HuggingFace

Downloads: 1.6K
Likes: 24
Last Modified: 3y ago
Pipeline: mask-generation

Openness

bio.rodeo openness: Fully open · usable and reproducible
82 · Open
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 58
Model Openness Framework: Class III (Open Model)

Tags

foundation_model, histology, medical_image_segmentation, radiology, segmentation, vision_transformer, zero_shot

Resources

GitHub Repository · Research Paper · HuggingFace Model