bio.rodeo


Imaging foundation models
Imaging, Language model

UniBiomed

Hong Kong University of Science and Technology / Weill Cornell Medicine / Harvard University

Universal foundation model that jointly generates diagnostic text and segments the corresponding targets across ten biomedical imaging modalities.

Released: April 2025

UniBiomed is a universal foundation model for grounded biomedical image interpretation: rather than producing a free-text finding or a segmentation mask in isolation, it generates a diagnostic description and simultaneously delineates the exact image regions that justify each finding. This pairing of textual reasoning with pixel-level evidence directly targets the interpretability gap that has limited clinical trust in medical-imaging AI, where a prediction is only actionable if a clinician can see what the model is looking at.

The model was developed by Linshan Wu, Hao Chen, and colleagues at The Hong Kong University of Science and Technology (HKUST), with collaborators at Weill Cornell Medicine and Harvard University, and was first released as an arXiv preprint in April 2025. The authors position it as the first model to unify grounded interpretation across the breadth of biomedical imaging, following modality-specific or task-specific systems such as BiomedParse, LISA, and MedPLIB.

UniBiomed's central claim is generality. A single set of weights spans ten imaging modalities and five task families—segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation—removing the need for clinicians to pre-diagnose images or hand-craft textual and visual prompts before analysis.

#Key Features

  • Grounded interpretation: Couples diagnostic text generation with segmentation of the corresponding targets, so every finding is anchored to a visible image region rather than reported as an unverifiable label.
  • End-to-end and prompt-free: Provides automated interpretation without requiring expert pre-diagnosis or manually engineered text/box prompts, unlike prior promptable segmentation pipelines.
  • Ten-modality coverage: Operates across CT, MRI, OCT, PET, ultrasound, X-ray, pathology, endoscopy, fundus photography, and dermoscopy from a single model.
  • MLLM + SAM integration: Joins a multi-modal large language model with the Segment Anything Model so that language reasoning and mask prediction are trained jointly rather than bolted together.
  • Open release: Code (Apache-2.0), model weights, and the training dataset are published on GitHub and HuggingFace.

#Technical Details

UniBiomed combines a multi-modal large language model (MLLM) with the Segment Anything Model (SAM): the MLLM produces diagnostic text and emits grounding tokens that drive SAM to segment the referenced anatomy or lesion, allowing diverse tasks to be cast in a single universal training objective.

Training used a curated corpus of over 27 million image–region–text triplets spanning the ten modalities. The authors validated the model on 84 datasets (70 internal and 14 external), reporting state-of-the-art results across all five task families. Reported gains include:

  • Segmentation: a 10.25% Dice improvement over BiomedParse on 60 segmentation datasets.
  • Grounded disease recognition: +3.86% Dice and +3.29% accuracy over LISA.
  • ROI classification: +8.32% accuracy over MedPLIB.
  • Region-aware report generation: scores of 52.4 BLEU-1, 30.4 METEOR, and 47.9 ROUGE-L.
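Several of the figures above are Dice scores, which measure the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask areas. The sketch below is a minimal pure-Python illustration of that standard definition. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for this example. It is not the authors' evaluation code, and it does not show how scores are aggregated across datasets.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice coefficient between two binary masks given as rows of 0/1 values."""
    # Both masks must have the same number of rows and the same row lengths.
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    # Flatten to 0/1 integers so any truthy pixel counts as foreground.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]

    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]

print(round(dice_score(pred, target), 3))  # 2 * 2 / (3 + 3) -> 0.667
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels.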

#Applications

UniBiomed is aimed at clinical and research workflows where an interpretable, evidence-linked output matters more than a bare prediction. Radiologists and pathologists can use it to draft region-grounded reports, flag and outline suspicious findings, and answer visual questions about a study, while researchers benefit from a single backbone that generalizes across modalities instead of maintaining separate segmentation and reporting models. Because every textual finding is tied to a mask, downstream reviewers can quickly verify or contest the model's reasoning.
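One way to picture the review step described above is as a list of findings, each carrying its own mask, that a reviewer can screen for missing evidence. The sketch below is purely illustrative: `GroundedFinding`, `area`, and `ungrounded` are hypothetical names invented for this example, the placeholder finding texts are not model output, and nothing here reflects UniBiomed's actual output format or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroundedFinding:
    """Hypothetical container pairing one textual finding with its binary mask."""
    text: str
    mask: List[List[int]]  # rows of 0/1 values marking the referenced region

    def area(self) -> int:
        """Number of foreground pixels in the mask."""
        return sum(1 for row in self.mask for v in row if v)


def ungrounded(findings: List[GroundedFinding]) -> List[str]:
    """Return the text of findings whose mask is empty, i.e. has no visible evidence."""
    return [f.text for f in findings if f.area() == 0]


# Placeholder findings for illustration only.
findings = [
    GroundedFinding("finding A", [[1, 1, 0], [0, 0, 0]]),
    GroundedFinding("finding B", [[0, 0, 0], [0, 0, 0]]),
]

for f in findings:
    print(f.text, "-> mask area:", f.area())

print("needs review:", ungrounded(findings))
```

In a workflow like this, a finding with an empty mask would be the first thing a reviewer contests, since the text has no region to back it up.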

#Impact

By unifying grounded segmentation and diagnostic language across ten modalities in one openly released model, UniBiomed offers a template for interpretable, general-purpose medical-imaging AI and a strong baseline for grounded interpretation that subsequent multimodal clinical models can build on. The work is still a preprint, although its weights and data are released: its real-world clinical value awaits prospective validation and, like other large multimodal medical models, its outputs require expert oversight before any diagnostic use.

Citation

UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

Preprint

Wu, L., et al. (2025) UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation. arXiv.org.

DOI: 10.48550/arXiv.2504.21336


Citations

Total Citations: 9
Influential: 2
References: 136

GitHub

Stars: 60
Forks: 10
Open Issues: 4
Contributors: 1
Last Push: 3mo ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 217
Likes: 8
Last Modified: 1y ago


Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 64 (Partial)
Usability (can I run it?): 87
Reproducibility (can I retrain it?): 49
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

disease_recognition, foundation_model, histology, multimodal, radiology, report_generation, segmentation, transformer, vision_transformer, visual_question_answering

Resources

GitHub Repository, Research Paper, HuggingFace Model, Dataset