bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Imaging, Language model

Med-PaLM M

Google Research / Google DeepMind

Google's generalist multimodal biomedical AI that encodes clinical text, medical images, and genomics with a single set of weights across 14 tasks.

Released: July 2023

Med-PaLM M is a generalist multimodal biomedical AI system introduced by Google Research and Google DeepMind in July 2023 in the paper "Towards Generalist Biomedical AI." Medicine is inherently multimodal: diagnosis and care draw on clinical notes, radiology and pathology images, genomic data, and more. Most earlier medical AI systems were narrow specialists, each trained for a single task on a single modality. Med-PaLM M is a proof of concept that one model, using the same set of weights, can flexibly encode and interpret biomedical data spanning clinical language, imaging, and genomics.

To evaluate such systems, the authors first curated MultiMedBench, a multimodal benchmark spanning 14 diverse tasks including medical question answering, mammography and dermatology image interpretation, chest X-ray report generation and summarization, pathology, and genomic variant calling. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, frequently surpassing specialist models trained for individual tasks.

Beyond raw benchmark numbers, the model demonstrates behaviors that motivate the generalist approach: zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. It sits alongside Med-PaLM (text-only medical question answering) in Google's medical foundation-model lineage, extending that work into the multimodal regime.

#Key Features

  • Single set of weights across modalities: One model encodes and interprets clinical text, medical images, and genomics, rather than relying on a collection of task-specific specialist models.
  • MultiMedBench coverage: Evaluated on 14 tasks across multiple modalities, reaching state-of-the-art or competitive performance on every task and often surpassing specialists by a wide margin.
  • Emergent zero-shot behaviors: Generalizes to novel medical concepts and tasks it was not explicitly trained on, with evidence of emergent medical reasoning.
  • Positive cross-task transfer: Training jointly across diverse tasks improves performance, demonstrating beneficial transfer rather than interference.
  • Clinically evaluated report generation: In a radiologist side-by-side ranking on 246 retrospective chest X-rays, clinicians preferred Med-PaLM M reports over radiologist-written ones in up to 40.50% of cases.
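
The preference figure in the last bullet is a pairwise rate: the share of side-by-side comparisons in which the model's report was ranked ahead of the radiologist's. A minimal sketch of that calculation follows, assuming each comparison is recorded as a pair of ranks where a lower rank means "preferred". The ranks below are synthetic, and the function name and tie-handling rule are assumptions for illustration, not details taken from the paper.

```python
from typing import Iterable, Tuple


def pairwise_preference_rate(ranks: Iterable[Tuple[int, int]]) -> float:
    """Fraction of comparisons in which the model report outranks the reference.

    Each item is (model_rank, radiologist_rank); a lower rank means preferred.
    Ties are counted as "not preferred" (an assumption of this sketch).
    """
    pairs = list(ranks)
    if not pairs:
        raise ValueError("at least one comparison is required")
    wins = sum(1 for model_rank, reference_rank in pairs if model_rank < reference_rank)
    return wins / len(pairs)


# Synthetic ranks for five cases; not data from the study.
toy_ranks = [(1, 2), (2, 1), (1, 2), (3, 1), (2, 3)]
rate = pairwise_preference_rate(toy_ranks)
print(f"{rate:.2%}")  # prints "60.00%"
```

A reported rate such as 40.50% would be the same ratio computed over the study's actual comparisons.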

#Technical Details

Med-PaLM M is built on PaLM-E, a multimodal architecture that combines the PaLM large language model with a Vision Transformer (ViT) image encoder, allowing text, images, and other modalities to be injected into a shared token sequence. The model is obtained by fine-tuning the complete set of PaLM-E parameters on the MultiMedBench training tasks. The authors trained and evaluated three scales (12B, 84B, and 562B parameters), which allows analysis of how multimodal medical capabilities scale with model size. Inputs are formatted as instruction-style task prompts with interleaved image tokens, and the model produces free-text outputs (answers, reports, classifications) for all tasks. Chest X-ray report generation was assessed with both automated metrics and a structured radiologist evaluation across model scales.
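
As a rough illustration of the interleaving described above, the following sketch expands image placeholders in an instruction-style prompt into a fixed number of image-token slots, producing one shared sequence. It is a toy model of the idea, not Med-PaLM M's actual pipeline: the `<img>` placeholder, the whitespace "tokenizer", the `TOKENS_PER_IMAGE` value, and every name in the code are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

IMAGE_PLACEHOLDER = "<img>"  # hypothetical marker, not taken from the paper
TOKENS_PER_IMAGE = 4         # hypothetical; a real image encoder emits many more


@dataclass(frozen=True)
class Slot:
    """One position in the shared sequence passed to the language model."""
    kind: str   # "text" or "image"
    value: str  # a text token, or "<image_id>:<token_index>" for image slots


def build_sequence(prompt: str, image_ids: List[str]) -> List[Slot]:
    """Replace each placeholder in `prompt` with image-token slots, in order.

    Whitespace splitting stands in for a real subword tokenizer.
    """
    pieces = prompt.split()
    if pieces.count(IMAGE_PLACEHOLDER) != len(image_ids):
        raise ValueError("number of placeholders must match number of images")

    remaining = iter(image_ids)
    sequence: List[Slot] = []
    for piece in pieces:
        if piece == IMAGE_PLACEHOLDER:
            image_id = next(remaining)
            sequence.extend(
                Slot("image", f"{image_id}:{i}") for i in range(TOKENS_PER_IMAGE)
            )
        else:
            sequence.append(Slot("text", piece))
    return sequence


prompt = "Instructions: describe the findings. Image: <img> Report:"
sequence = build_sequence(prompt, ["cxr_001"])
print([slot.kind for slot in sequence])
```

In a real system each image slot would hold an embedding produced by the image encoder and projected into the language model's embedding space; the sketch only tracks where those positions fall relative to the text.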

#Applications

Med-PaLM M is a research system pointing toward unified clinical AI assistants that could draft radiology reports, answer medical questions, interpret dermatology and mammography images, summarize findings, and assist with genomic variant interpretation, all within one model. Such generalist systems could benefit radiologists, clinicians, and biomedical researchers by reducing the need to deploy and maintain many narrow models. Model weights are not openly released: Med-PaLM M is a research artifact, and Google has offered Med-PaLM capabilities only through restricted or API-based access rather than open download. As the authors emphasize, considerable validation is required before any real-world clinical use.

#Impact

Med-PaLM M was an influential demonstration that a single large multimodal model can match or beat task-specific specialists across a broad span of biomedical modalities, helping catalyze interest in generalist biomedical AI. The accompanying MultiMedBench benchmark gave the community a standardized multimodal evaluation suite covering question answering, imaging, and genomics. The work is widely cited as a milestone in multimodal medical AI and helped shape Google's broader Med-PaLM and health-AI research agenda. Its main acknowledged limitations are the absence of open weights, the proof-of-concept nature of the evaluations, and the need for prospective clinical validation before deployment.

Citation

Towards Generalist Biomedical AI

Preprint

Tu, T., et al. (2023). Towards Generalist Biomedical AI. arXiv:2307.14334. Subsequently published in NEJM AI (2024).

DOI: 10.48550/arXiv.2307.14334

Citations

Total Citations: 506
Influential: 29
References: 130

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 25 (Closed)
Usability (can I run it?): 18
Reproducibility (can I retrain it?): 16
Model Openness Framework: Unclassified (missing required components)

Tags

generative, genomics, histology, image_interpretation, medical_question_answering, multimodal, radiology, radiology_report_generation, transformer, variant_calling, vision_transformer, zero_shot

Resources

Research Paper, Official Website