UniBiomed

Hong Kong University of Science and Technology / Weill Cornell Medicine / Harvard University

Universal foundation model that jointly generates diagnostic text and segments the corresponding targets across ten biomedical imaging modalities.

Released: April 2025

UniBiomed is a universal foundation model for grounded biomedical image interpretation: rather than producing a free-text finding or a segmentation mask in isolation, it generates a diagnostic description and simultaneously delineates the exact image regions that justify each finding. This pairing of textual reasoning with pixel-level evidence directly targets the interpretability gap that has limited clinical trust in medical-imaging AI, where a prediction is only actionable if a clinician can see what the model is looking at.

The model was developed by Linshan Wu, Hao Chen, and colleagues at The Hong Kong University of Science and Technology (HKUST), with collaborators at Weill Cornell Medicine and Harvard University, and first released as a preprint on arXiv in April 2025. It is positioned as the first model to unify grounded interpretation across the breadth of biomedical imaging, succeeding modality-specific or task-specific systems such as BiomedParse, LISA, and MedPLIB.

UniBiomed's central claim is generality. A single set of weights spans ten imaging modalities and five task families—segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation—removing the need for clinicians to pre-diagnose images or hand-craft textual and visual prompts before analysis.

Key Features

Grounded interpretation: Couples diagnostic text generation with segmentation of the corresponding targets, so every finding is anchored to a visible image region rather than reported as an unverifiable label.
End-to-end and prompt-free: Provides automated interpretation without requiring expert pre-diagnosis or manually engineered text/box prompts, unlike prior promptable segmentation pipelines.
Ten-modality coverage: Operates across CT, MRI, OCT, PET, ultrasound, X-ray, pathology, endoscopy, fundus photography, and dermoscopy from a single model.
MLLM + SAM integration: Joins a multi-modal large language model with the Segment Anything Model so that language reasoning and mask prediction are trained jointly rather than bolted together.
Open release: Code (Apache-2.0), model weights, and the training dataset are published on GitHub and HuggingFace.

Technical Details

UniBiomed combines a multi-modal large language model (MLLM) with the Segment Anything Model (SAM): the MLLM produces diagnostic text and emits grounding tokens that drive SAM to segment the referenced anatomy or lesion, allowing diverse tasks to be cast in a single universal training objective. Training used a curated corpus of over 27 million image–region–text triplets spanning the ten modalities. The authors validated the model on 84 datasets (70 internal and 14 external), reporting state-of-the-art results across all five task families. Reported gains include a 10.25% Dice improvement over BiomedParse on 60 segmentation datasets, +3.86% Dice and +3.29% accuracy over LISA on grounded disease recognition, +8.32% ROI-classification accuracy over MedPLIB, and region-aware report-generation scores of 52.4 BLEU-1, 30.4 METEOR, and 47.9 ROUGE-L.
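The coupling between generated text and masks can be pictured with a toy sketch. This is not the authors' implementation: the `[SEG]` token string, the `GroundedFinding` type, and the `toy_decoder` stub are all assumptions made for illustration. In a real system the mask decoder would presumably be conditioned on the grounding token's learned embedding, whereas here a simple token index stands in for that signal.

```python
from dataclasses import dataclass
from typing import Callable, List

Mask = List[List[int]]  # binary mask as rows of 0/1

SEG_TOKEN = "[SEG]"  # hypothetical grounding-token string, assumed for illustration


@dataclass
class GroundedFinding:
    text: str   # phrase the model wrote just before the grounding token
    mask: Mask  # region returned by the mask decoder for that token


def ground_findings(generated: str,
                    decode_mask: Callable[[int], Mask]) -> List[GroundedFinding]:
    """Split generated text at each grounding token and pair each phrase with one mask."""
    pieces = generated.split(SEG_TOKEN)
    findings = []
    # The last piece follows the final token, so it has no mask of its own.
    for index, phrase in enumerate(pieces[:-1]):
        findings.append(GroundedFinding(phrase.strip(" ,.;"), decode_mask(index)))
    return findings


def toy_decoder(index: int) -> Mask:
    """Stand-in for a SAM-style mask decoder: one fixed 2x2 mask per token index."""
    masks = [[[1, 0], [0, 0]], [[0, 0], [1, 1]]]
    return masks[index]


if __name__ == "__main__":
    output = "Enlarged left kidney [SEG], hypodense liver lesion [SEG]."
    for finding in ground_findings(output, toy_decoder):
        print(finding.text, finding.mask)
```

The point of the sketch is the pairing: each grounding token in the text yields exactly one mask, so a finding with no token has no visual evidence attached and text without tokens produces an empty list.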

Applications

UniBiomed is aimed at clinical and research workflows where an interpretable, evidence-linked output matters more than a bare prediction. Radiologists and pathologists can use it to draft region-grounded reports, flag and outline suspicious findings, and answer visual questions about a study, while researchers benefit from a single backbone that generalizes across modalities instead of maintaining separate segmentation and reporting models. Because every textual finding is tied to a mask, downstream reviewers can quickly verify or contest the model's reasoning.
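One way a reviewer's agreement with a model-drawn region could be quantified is the Dice overlap, the same family of metric used in the segmentation results reported above. The sketch below is a minimal pure-Python version under stated assumptions: the `dice` function name and the small example masks are illustrative only and are not taken from the UniBiomed code or data.

```python
from typing import List

Mask = List[List[int]]  # binary mask as rows of 0/1


def dice(pred: Mask, ref: Mask) -> float:
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")
    # Count pixels set in both masks.
    overlap = sum(p & r for pr, rr in zip(pred, ref) for p, r in zip(pr, rr))
    total = sum(map(sum, pred)) + sum(map(sum, ref))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


if __name__ == "__main__":
    model_mask = [[1, 1, 0], [0, 1, 0]]     # 3 pixels outlined by the model
    reviewer_mask = [[1, 1, 0], [0, 0, 0]]  # 2 pixels outlined by a reviewer
    print(dice(model_mask, reviewer_mask))  # 2 * 2 / (3 + 2) = 0.8
```

A low score on a given finding would be a natural signal that the reviewer contests the region the model tied to its text.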

Impact

By unifying grounded segmentation and diagnostic language across ten modalities in one openly released model, UniBiomed offers a template for interpretable, general-purpose medical-imaging AI and a strong baseline for grounded interpretation that subsequent multimodal clinical models can build on. Because it is so far a preprint with released weights and data, its real-world clinical value still awaits prospective validation, and, like other large multimodal medical models, its outputs require expert oversight before any diagnostic use.

Citation

UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

Preprint

Wu, L., et al. (2025) UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation. arXiv.org.

DOI: 10.48550/arXiv.2504.21336

Recent citations

Papers that recently cited this model.

From Integrated Care to Learning Systems
Aristeidis Tsitiridis, Konstantinos Perakis, A. Antoniades, et al.
Healthcare · Jun 2026 · 0 citations

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Yuexi Du, Jinglu Wang, Shujie Liu, et al.
arXiv.org · Mar 2026 · 2 citations

MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning
Shengyuan Liu, Liuxin Bao, Qi Yang, et al.
arXiv.org · Feb 2026 · 2 citations

Top citations

The most-cited papers that cite this model.

Large-Scale 3D Medical Image Pre-Training With Geometric Context Priors
Linshan Wu, Jiaxin Zhuang, Hao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence · Oct 2024 · 37 citations

Foundation Models in Biomedical Imaging: Turning Hype into Reality
A. Muneer, Kai Zhang, I. Hamdi, et al.
arXiv.org · Dec 2025 · 5 citations

Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning
Guoxin Wang, Jun Zhao, Xinyi Liu, et al.
arXiv.org · Sep 2025 · 4 citations · Influential

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Yuexi Du, Jinglu Wang, Shujie Liu, et al.
arXiv.org · Mar 2026 · 2 citations

MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning
Shengyuan Liu, Liuxin Bao, Qi Yang, et al.
arXiv.org · Feb 2026 · 2 citations

Citations

Total Citations: 10
Influential: 2
References: 136

GitHub

Stars: 71
Forks: 12
Open Issues: 6
Contributors: 1
Last Push: 1mo ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 109
Likes: 9
Last Modified: 1y ago

Fields of citing research

Computer Science: 100%
Medicine: 100%
Biology: 10%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
64 (Partial)
Usability (can I run it?): 87
Reproducibility (can I retrain it?): 49

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository · Research Paper · HuggingFace Model · Dataset
