bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

BiomedGPT

Lehigh University / University of Georgia / Stanford University / Massachusetts General Hospital / University of Pennsylvania / University of Central Florida / UC Santa Cruz / UTHealth Houston / Mayo Clinic / Samsung Research America

Open-source, lightweight generalist vision-language foundation model for diverse biomedical imaging and text tasks.

Released: August 2024
Parameters: 182 Million

BiomedGPT is an open-source, lightweight vision-language foundation model designed to act as a generalist across a wide range of biomedical tasks. Rather than training a separate specialist network for each problem, BiomedGPT unifies medical image understanding and clinical text processing within a single encoder-decoder transformer, allowing one model to handle visual question answering, image captioning, image classification, text understanding, and summarization. It was developed by a multi-institutional team led by Lehigh University, with collaborators at the University of Georgia, Stanford University, Massachusetts General Hospital/Harvard Medical School, the University of Pennsylvania, the University of Central Florida, UC Santa Cruz, UTHealth Houston, Mayo Clinic, and Samsung Research America.

First released as a preprint in May 2023 (arXiv:2305.17100) and published in Nature Medicine in 2024, BiomedGPT addresses a central tension in medical AI: the most capable generalist systems, such as Med-PaLM M, are enormous, proprietary, and impractical for most institutions to run. BiomedGPT instead demonstrates that a compact, openly released model can reach state-of-the-art results on 16 of 25 evaluated experiments while remaining deployable on modest hardware. Its largest variant has 182 million parameters, roughly 3,000 times fewer than Med-PaLM M, which lowers the barrier for under-resourced hospitals and academic labs.
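The "roughly 3,000 times" figure can be checked with a line of arithmetic. The sketch below assumes the largest Med-PaLM M variant has 562 billion parameters; that number comes from the Med-PaLM M paper, not from this page, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the size gap between the two models.
BIOMEDGPT_BASE_PARAMS = 182_000_000      # largest BiomedGPT variant (from this page)
MED_PALM_M_PARAMS = 562_000_000_000      # ASSUMPTION: largest Med-PaLM M variant


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return larger / smaller


ratio = size_ratio(MED_PALM_M_PARAMS, BIOMEDGPT_BASE_PARAMS)
print(f"Med-PaLM M has about {ratio:,.0f}x as many parameters")
```

Under that assumption the ratio comes out slightly above 3,000, consistent with the rounded figure quoted above.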

The model exemplifies the generalist trend in biomedical AI, where a single foundation model is pretrained across many data modalities and then fine-tuned or evaluated across heterogeneous downstream tasks, contrasting with the long-standing paradigm of narrow, single-purpose medical models.

#Key Features

  • Generalist multimodal design: A single model spans radiology, pathology, and clinical text tasks, covering visual question answering, image captioning, classification, natural language inference, and summarization.
  • Lightweight and open: Three openly released variants (Small ~33M, Medium ~93M, Base ~182M parameters) make the model far cheaper to deploy than proprietary giants, with pretrained weights and code publicly available.
  • Strong benchmark performance: Achieves state-of-the-art results on 16 of 25 evaluated experiments despite its small scale.
  • Human-validated outputs: In expert assessments, it reached roughly a 3.8% error rate on visual question answering and 8.3% on radiology report generation, with summarization quality comparable to radiologists.
  • Unified pretraining objective: Combines masked image modeling, masked language modeling, object detection, image captioning, and image-text matching under one sequence-to-sequence framework.
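To make the "one sequence-to-sequence framework" idea concrete, here is a minimal sketch of how several pretraining tasks could be expressed as instruction/target pairs for a single encoder-decoder model. The instruction wordings, the `Seq2SeqExample` class, and `build_example` are hypothetical illustrations, not the templates or code used by BiomedGPT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seq2SeqExample:
    task: str
    instruction: str            # text prompt fed to the encoder
    image_path: Optional[str]   # None for text-only tasks
    target: str                 # sequence the decoder is trained to emit


# HYPOTHETICAL instruction templates, one per pretraining task named above.
TEMPLATES = {
    "masked_language_modeling": 'What is the complete text of "{source}"?',
    "masked_image_modeling": "What is the image in the masked region?",
    "object_detection": "What are the objects in the image?",
    "image_captioning": "What does the image describe?",
    "image_text_matching": 'Does the image describe "{source}"?',
}


def build_example(task: str, target: str, source: str = "",
                  image_path: Optional[str] = None) -> Seq2SeqExample:
    """Cast one training sample into the shared instruction/target format."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    instruction = TEMPLATES[task].format(source=source)
    return Seq2SeqExample(task, instruction, image_path, target)


caption = build_example("image_captioning",
                        target="frontal chest radiograph",
                        image_path="example.png")
mlm = build_example("masked_language_modeling",
                    target="no acute cardiopulmonary process",
                    source="no acute <mask> process")
print(caption.instruction)
print(mlm.instruction)
```

Because every task reduces to "encode an instruction (plus an optional image), decode a target sequence", the same weights and the same loss can serve all five objectives.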

#Technical Details

BiomedGPT adapts the OFA (One-For-All) sequence-to-sequence architecture, pairing a BERT-style encoder over corrupted inputs with a GPT-style left-to-right autoregressive decoder, so that images, text, and bounding boxes are all cast into a shared token sequence. It was pretrained on a diverse biomedical corpus comprising roughly 592,000 images, about 183 million text sentences, 271,000 image-text pairs, and 46,000 object-label pairs, drawn from sources including chest X-rays, pathology slides, clinical notes, and PubMed literature. The model is offered in Small (33M), Medium (93M), and Base (182M) configurations. Evaluation spanned 25 experiments across five task categories, including PathVQA, VQA-RAD, and SLAKE for visual question answering; IU X-ray, MIMIC-CXR, and PEIR Gross for captioning; MedMNIST and CBIS-DDSM for classification; and MedNLI and MIMIC-III tasks for text understanding and summarization.
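One way to picture bounding boxes sharing a token sequence with text is coordinate quantization: each box corner is mapped to a discrete "location token" that sits in the same vocabulary as words. The sketch below illustrates that general technique; the bin count of 1000, the `<bin_k>` token naming, and both function names are assumptions made for illustration, not details confirmed by this page.

```python
from typing import List, Tuple

NUM_BINS = 1000  # ASSUMPTION: number of discrete location bins per axis


def box_to_tokens(box: Tuple[float, float, float, float],
                  image_size: Tuple[int, int],
                  num_bins: int = NUM_BINS) -> List[str]:
    """Quantize (x0, y0, x1, y1) pixel coordinates into location tokens."""
    x0, y0, x1, y1 = box
    width, height = image_size
    tokens = []
    for value, extent in ((x0, width), (y0, height), (x1, width), (y1, height)):
        frac = min(max(value / extent, 0.0), 1.0)             # clamp to [0, 1]
        bin_index = min(int(frac * num_bins), num_bins - 1)   # keep 1.0 in last bin
        tokens.append(f"<bin_{bin_index}>")
    return tokens


def build_detection_target(box: Tuple[float, float, float, float],
                           image_size: Tuple[int, int],
                           label: str) -> List[str]:
    """Concatenate location tokens and label words into one target sequence."""
    return box_to_tokens(box, image_size) + label.lower().split()


target = build_detection_target((256, 128, 512, 512), (512, 512), "Lung nodule")
print(" ".join(target))
```

With this representation an object-detection sample becomes an ordinary decoder target, so it can be trained with the same next-token loss as captioning or question answering.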

#Applications

BiomedGPT is intended as a flexible backbone for clinical and research workflows that involve both medical imaging and text. Radiologists can use it to draft or summarize reports and answer questions about chest X-rays, pathologists can query histology images, and informatics teams can apply it to tasks such as clinical natural language inference and treatment-suggestion summarization. Because it is small and openly licensed for academic use, it is particularly attractive to under-resourced hospitals and academic groups that cannot run or pay for proprietary biomedical models.
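A rough sense of why these model sizes suit modest hardware comes from estimating the memory needed just to hold the weights. The sketch below uses the parameter counts listed on this page; it ignores activations, optimizer state, and framework overhead, so real usage will be higher, and the helper name is illustrative only.

```python
# Approximate parameter counts for the three released variants (from this page).
VARIANT_PARAMS = {"Small": 33_000_000, "Medium": 93_000_000, "Base": 182_000_000}

# Bytes used per parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mb(n_params: int, dtype: str = "fp32") -> float:
    """Memory for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1_000_000


for name, n_params in VARIANT_PARAMS.items():
    fp32 = weight_memory_mb(n_params, "fp32")
    fp16 = weight_memory_mb(n_params, "fp16")
    print(f"{name}: {fp32:.0f} MB (fp32), {fp16:.0f} MB (fp16)")
```

Even the Base variant stays under 1 GB of weights at full precision, which is the kind of footprint a single consumer GPU or a CPU-only server can hold.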

#Impact

By showing that a 182M-parameter open model can rival far larger proprietary systems across many biomedical tasks, BiomedGPT became a widely cited reference point for efficient, transparent medical foundation models and helped popularize the generalist approach in clinical AI. Its public weights and code (with follow-on checkpoints scaling up to roughly 930M parameters) have made it a practical starting point for downstream research. Important limitations remain: the released weights inherit non-commercial restrictions from the OFA framework and are intended for academic research, evaluation focuses on benchmark and retrospective data rather than prospective clinical deployment, and like all medical AI it requires careful validation before any patient-facing use.

Citation

A generalist vision–language foundation model for diverse biomedical tasks

Zhang, K., et al. (2024). A generalist vision–language foundation model for diverse biomedical tasks. Nature Medicine.

DOI: 10.1038/s41591-024-03185-2

Citations

Total Citations: 373
Influential: 21
References: 77

GitHub

Stars: 709
Forks: 81
Open Issues: 26
Contributors: 4
Last Push: 11mo ago
Language: Python
License: Apache-2.0

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 33 (Closed)
Usability (can I run it?): 34
Reproducibility (can I retrain it?): 16
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

foundation_model, histology, image_captioning, image_classification, multi_task, multimodal, radiology, text_summarization, transformer, visual_question_answering

Resources

GitHub Repository, Research Paper, HuggingFace Model