Lehigh University / University of Georgia / Stanford University / Massachusetts General Hospital / University of Pennsylvania / University of Central Florida / UC Santa Cruz / UTHealth Houston / Mayo Clinic / Samsung Research America
Open-source, lightweight generalist vision-language foundation model for diverse biomedical imaging and text tasks.
BiomedGPT is an open-source, lightweight vision-language foundation model designed to act as a generalist across a wide range of biomedical tasks. Rather than training a separate specialist network for each problem, BiomedGPT unifies medical image understanding and clinical text processing within a single encoder-decoder transformer, allowing one model to handle visual question answering, image captioning, image classification, text understanding, and summarization. It was developed by a multi-institutional team led by Lehigh University, with collaborators at the University of Georgia, Stanford University, Massachusetts General Hospital/Harvard Medical School, the University of Pennsylvania, the University of Central Florida, UC Santa Cruz, UTHealth Houston, Mayo Clinic, and Samsung Research America.
First released as a preprint in May 2023 (arXiv:2305.17100) and published in Nature Medicine in 2024, BiomedGPT addresses a central tension in medical AI: the most capable generalist systems, such as Med-PaLM M, are enormous, proprietary, and impractical for most institutions. BiomedGPT instead demonstrates that a compact, fully transparent model can reach state-of-the-art performance while remaining deployable on modest hardware. Its largest variant has 182 million parameters, roughly 3,000 times smaller than Med-PaLM M, lowering the barrier for under-resourced hospitals and academic labs.
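A quick back-of-envelope check of the size comparison and of the "modest hardware" claim can be done in a few lines. This sketch assumes the comparison is against Med-PaLM M's largest, 562-billion-parameter variant. It counts only the memory needed to hold the weights, and ignores activations, optimizer state, and framework overhead, so real usage will be higher.

```python
# Back-of-envelope arithmetic for the size comparison in the text.
# Assumption: Med-PaLM M is compared at its largest published size (562B parameters).
MED_PALM_M_PARAMS = 562e9
BIOMEDGPT_PARAMS = {"Small": 33e6, "Medium": 93e6, "Base": 182e6}


def size_ratio(large: float, small: float) -> float:
    """How many times more parameters `large` has than `small`."""
    return large / small


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (no activations or optimizer state)."""
    return n_params * bytes_per_param / 1e9


ratio = size_ratio(MED_PALM_M_PARAMS, BIOMEDGPT_PARAMS["Base"])
print(f"Med-PaLM M / BiomedGPT-Base: {ratio:,.0f}x")

for name, n in BIOMEDGPT_PARAMS.items():
    fp32 = weight_memory_gb(n, 4)  # 4 bytes per float32 parameter
    fp16 = weight_memory_gb(n, 2)  # 2 bytes per float16 parameter
    print(f"BiomedGPT-{name}: {fp32:.2f} GB (fp32), {fp16:.2f} GB (fp16)")
```

The ratio comes out at about 3,088, matching the "roughly 3,000 times" figure. Even the Base model's fp32 weights need well under 1 GB, which is consistent with the claim that it can be deployed without large accelerator clusters.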
The model exemplifies the generalist trend in biomedical AI, where a single foundation model is pretrained across many data modalities and then fine-tuned or evaluated across heterogeneous downstream tasks, contrasting with the long-standing paradigm of narrow, single-purpose medical models.
BiomedGPT adapts the OFA (One-For-All) sequence-to-sequence architecture, pairing a BERT-style encoder over corrupted inputs with a GPT-style left-to-right autoregressive decoder, so that images, text, and bounding boxes are all cast into a shared token sequence. It was pretrained on a diverse biomedical corpus comprising roughly 592,000 images, about 183 million text sentences, 271,000 image-text pairs, and 46,000 object-label pairs, drawn from sources including chest X-rays, pathology slides, clinical notes, and PubMed literature. The model is offered in Small (33M), Medium (93M), and Base (182M) configurations. Evaluation spanned 25 datasets across five task categories, including PathVQA, VQA-RAD, and SLAKE for visual question answering; IU X-ray, MIMIC-CXR, and PEIR Gross for captioning; MedMNIST and CBIS-DDSM for classification; and MedNLI and MIMIC-III tasks for text understanding and summarization.
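To make "casting every task into a shared token sequence" concrete, the following minimal sketch is plain Python and uses no BiomedGPT or OFA code. It shows how heterogeneous tasks can all become (source instruction, target sequence) pairs for one encoder-decoder, with bounding boxes quantized into discrete location tokens. Everything specific here is hypothetical: the prompt strings, the `<bin_k>` token format, the 1,000-bin resolution, the image IDs, and the toy examples. None of them are taken from the released model's templates or vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical location-bin resolution; the real model's location vocabulary may differ.
NUM_BINS = 1000


@dataclass
class Seq2SeqExample:
    image_id: Optional[str]  # placeholder for the image (None for text-only tasks)
    source: str              # instruction the encoder reads
    target: str              # sequence the decoder generates left to right


def quantize(value: float, extent: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, extent] to a bin index in [0, num_bins - 1]."""
    frac = min(max(value / extent, 0.0), 1.0)
    return min(int(frac * num_bins), num_bins - 1)


def box_tokens(box: Tuple[float, float, float, float], width: float, height: float) -> str:
    """Encode an (x0, y0, x1, y1) box as four location tokens."""
    x0, y0, x1, y1 = box
    bins = [quantize(x0, width), quantize(y0, height), quantize(x1, width), quantize(y1, height)]
    return " ".join(f"<bin_{b}>" for b in bins)


# Hypothetical prompt templates, one per task family.
def vqa(image_id: str, question: str, answer: str) -> Seq2SeqExample:
    return Seq2SeqExample(image_id, question, answer)


def caption(image_id: str, report: str) -> Seq2SeqExample:
    return Seq2SeqExample(image_id, "What does the image describe?", report)


def classify(image_id: str, label: str) -> Seq2SeqExample:
    return Seq2SeqExample(image_id, "What is the category of this image?", label)


def ground(image_id: str, label: str, box, width: float, height: float) -> Seq2SeqExample:
    return Seq2SeqExample(image_id, f'Which region does "{label}" describe?',
                          box_tokens(box, width, height))


def nli(premise: str, hypothesis: str, label: str) -> Seq2SeqExample:
    return Seq2SeqExample(None, f"Premise: {premise} Hypothesis: {hypothesis} Relation?", label)


# Toy, made-up examples for illustration only.
examples: List[Seq2SeqExample] = [
    vqa("cxr_001", "Is there a pleural effusion?", "yes"),
    caption("cxr_001", "Small left pleural effusion. No pneumothorax."),
    classify("derm_042", "melanoma"),
    ground("mammo_007", "mass", (120.0, 80.0, 360.0, 300.0), width=512.0, height=512.0),
    nli("The patient denies chest pain.", "The patient has chest pain.", "contradiction"),
]

for ex in examples:
    print(f"[{ex.image_id or 'text-only'}] {ex.source!r} -> {ex.target!r}")
```

Because every task ends up as "read a source sequence, emit a target sequence", a single decoder vocabulary (words plus location tokens) and a single training objective can serve VQA, captioning, classification, grounding, and text-only tasks.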
BiomedGPT is intended as a flexible backbone for clinical and research workflows that involve both medical imaging and text. Radiologists can use it to draft or summarize reports and answer questions about chest X-rays, pathologists can query histology images, and informatics teams can apply it to tasks such as clinical natural language inference and treatment-suggestion summarization. Because it is small and openly licensed for academic use, it is particularly attractive to under-resourced hospitals and academic groups that cannot run or pay for proprietary biomedical models.
By showing that a 182M-parameter open model can rival far larger proprietary systems across many biomedical tasks, BiomedGPT became a widely cited reference point for efficient, transparent medical foundation models and helped popularize the generalist approach in clinical AI. Its public weights and code (with follow-on checkpoints scaling up to roughly 930M parameters) have made it a practical starting point for downstream research. Important limitations remain: the released weights inherit non-commercial restrictions from the OFA framework and are intended for academic research; evaluation focuses on benchmark and retrospective data rather than prospective clinical deployment; and, like all medical AI, it requires careful validation before any patient-facing use.
Zhang, K., et al. (2024). A generalist vision–language foundation model for diverse biomedical tasks. Nature Medicine.
DOI: 10.1038/s41591-024-03185-2