Med-PaLM M

Google's generalist multimodal biomedical AI that encodes clinical text, medical images, and genomics with a single set of weights across 14 tasks.

Released: July 2023

Med-PaLM M is a generalist multimodal biomedical AI system introduced by Google Research and Google DeepMind in July 2023, described in the paper "Towards Generalist Biomedical AI." Medicine is inherently multimodal — diagnosis and care draw on clinical notes, radiology and pathology images, genomic data, and more. Most prior medical AI systems are narrow specialists trained for a single task on a single modality. Med-PaLM M is a proof of concept that a single model, using the same set of weights, can flexibly encode and interpret biomedical data spanning clinical language, imaging, and genomics.

To evaluate such systems, the authors first curated MultiMedBench, a multimodal benchmark spanning 14 diverse tasks including medical question answering, mammography and dermatology image interpretation, chest X-ray report generation and summarization, pathology, and genomic variant calling. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, frequently surpassing specialist models trained for individual tasks.

Beyond raw benchmark numbers, the model demonstrates behaviors that motivate the generalist approach: zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. It sits alongside Med-PaLM (text-only medical question answering) in Google's medical foundation-model lineage, extending that work into the multimodal regime.

Key Features

Single set of weights across modalities: One model encodes and interprets clinical text, medical images, and genomics, rather than relying on a collection of task-specific specialist models.
MultiMedBench coverage: Evaluated on 14 tasks across multiple modalities, reaching state-of-the-art or competitive performance on every task and often surpassing specialists by a wide margin.
Emergent zero-shot behaviors: Generalizes to novel medical concepts and tasks it was not explicitly trained on, with evidence of emergent medical reasoning.
Positive cross-task transfer: Training jointly across diverse tasks improves performance, demonstrating beneficial transfer rather than interference.
Clinically evaluated report generation: In a radiologist side-by-side ranking on 246 retrospective chest X-rays, clinicians preferred Med-PaLM M reports over radiologist-written ones in up to 40.50% of cases.
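Because one set of weights serves every task, the task has to be specified in the input rather than by picking a different model. The sketch below shows that idea with instruction-style prompts. The `TaskPrompt` class, the `<img>` placeholder string, and the instruction wording are hypothetical stand-ins for illustration, not the prompts used by Med-PaLM M.

```python
from dataclasses import dataclass
from typing import List

IMAGE_PLACEHOLDER = "<img>"  # hypothetical marker for where image embeddings would go


@dataclass
class TaskPrompt:
    """One instruction-style example; all fields and wording are illustrative."""
    instruction: str      # natural-language description of the task
    task_input: str       # text input: a question, report findings, etc.
    num_images: int = 0   # number of images accompanying this example

    def render(self) -> str:
        lines = [f"Instructions: {self.instruction}"]
        if self.num_images:
            # Each placeholder would later be swapped for image embeddings.
            lines.append("Images: " + " ".join([IMAGE_PLACEHOLDER] * self.num_images))
        lines.append(f"Input: {self.task_input}")
        lines.append("Output:")  # the model continues with free text
        return "\n".join(lines)


def build_batch(prompts: List[TaskPrompt]) -> List[str]:
    # Text-only and image tasks all become plain strings for the same model.
    return [p.render() for p in prompts]


batch = build_batch([
    TaskPrompt("Answer the multiple-choice medical question.", "Question and options here."),
    TaskPrompt("Write the findings section of the chest X-ray report.", "Frontal view.", num_images=1),
])
print(batch[1])
```

Because the task lives entirely in the prompt, every output (an answer, a label, a report) comes back as free text and must be parsed downstream, which is the trade-off for not maintaining separate task-specific heads.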

Technical Details

Med-PaLM M is built on PaLM-E, a multimodal architecture that combines the PaLM large language model with a Vision Transformer (ViT) image encoder, allowing text, images, and other modalities to be injected into a shared token sequence. The system is created by fine-tuning the complete set of model parameters on the MultiMedBench training tasks. The authors trained and evaluated three scales — 12B, 84B, and 562B parameters — enabling analysis of how multimodal medical capabilities scale with model size. Inputs are formatted into instruction-style task prompts with interleaved image tokens, and the model produces free-text outputs (answers, reports, classifications) for all tasks. The chest X-ray report generation results were assessed both with automated metrics and with a structured radiologist evaluation across model scales.
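PaLM-E-style models feed the language model a single sequence of embedding vectors: text tokens are looked up in an embedding table, while image features from the vision encoder are mapped to the same embedding width and spliced in where the image belongs. The pure-Python sketch below illustrates that splicing with toy sizes. The dimensions, the `<img>` placeholder, the random weights, and the use of a single linear map are illustrative assumptions, not Med-PaLM M's actual configuration.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]

LM_DIM = 4             # hypothetical language-model embedding width
VISION_DIM = 3         # hypothetical vision-encoder output width
IMAGE_TOKEN = "<img>"  # hypothetical placeholder marking where the image goes


def project(x: Sequence[float], weight: Matrix) -> Vector:
    """Map one vision feature (length VISION_DIM) to LM width: weight @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def build_input_embeddings(tokens: List[str], table: Dict[str, Vector],
                           patches: List[Vector], weight: Matrix) -> List[Vector]:
    """Return one LM-width vector per sequence position.

    Text tokens are looked up in the embedding table; the image placeholder is
    replaced by one projected vector per image patch, so the image occupies
    len(patches) positions in the sequence the language model sees.
    """
    sequence: List[Vector] = []
    for tok in tokens:
        if tok == IMAGE_TOKEN:
            sequence.extend(project(p, weight) for p in patches)
        else:
            sequence.append(table[tok])
    return sequence


rng = random.Random(0)
vocab = ["Describe", "the", "findings", ":"]
table = {w: [rng.gauss(0, 1) for _ in range(LM_DIM)] for w in vocab}
weight = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(LM_DIM)]
patches = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(2)]  # toy encoder output

tokens = ["Describe", "the", IMAGE_TOKEN, "findings", ":"]
embeddings = build_input_embeddings(tokens, table, patches, weight)
print(len(embeddings), len(embeddings[0]))  # 6 4
```

Four text tokens plus two image patches give six positions, each of width `LM_DIM`; from the language model's point of view, the image is simply a few extra "tokens" in the sequence.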

Applications

Med-PaLM M is a research system pointing toward unified clinical AI assistants that could draft radiology reports, answer medical questions, interpret dermatology and mammography images, summarize findings, and assist with genomic variant interpretation, all within one model. Such generalist systems could benefit radiologists, clinicians, and biomedical researchers by reducing the need to deploy and maintain many narrow models. Model weights are not openly released; Med-PaLM M is a research artifact, and Google has offered Med-PaLM capabilities only through restricted or API-based access rather than open download. As the authors emphasize, considerable validation is required before any real-world clinical use.

Impact

Med-PaLM M was an influential demonstration that a single large multimodal model can match or beat task-specific specialists across a broad span of biomedical modalities, helping catalyze interest in generalist biomedical AI. The accompanying MultiMedBench benchmark gave the community a standardized multimodal evaluation suite covering question answering, imaging, and genomics. The work is widely cited as a milestone in multimodal medical AI and helped shape Google's broader Med-PaLM and health-AI research agenda. Its main acknowledged limitations are the absence of open weights, the proof-of-concept nature of the evaluations, and the need for prospective clinical validation before deployment.

Citation

Towards Generalist Biomedical AI

Preprint

Tu, T., et al. (2023). Towards Generalist Biomedical AI. NEJM AI.

DOI: 10.48550/arXiv.2307.14334

Recent citations

Papers that recently cited this model.

Foundations, vulnerabilities, and GenAI-driven defenses in Cognitive Internet of Medical Things systems
Lamia Chaari Fourati
Journal of Systems Architecture · Sep 2026 · 0 citations

Voice, Speech, and Large Language Models in Neurology: From Acoustic Biomarkers to Conversational AI
S. Shelly
Computation · Jul 2026 · 0 citations

Towards Enhancing 3D Spatial Reasoning in Medical Multimodal Large Language Models
Zhuoyuan Fu, Zeshang Li, Yiqiong Zhang, et al.
Jul 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

A whole-slide foundation model for digital pathology from real-world data
Hanwen Xu, N. Usuyama, Jaspreet Bagga, et al.
Nature · May 2024 · 842 citations

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Zesen Cheng, Sicong Leng, Hang Zhang, et al.
arXiv.org · Jun 2024 · 760 citations

Adapted large language models can outperform medical experts in clinical text summarization
Dave Van Veen, Cara Van Uden, L. Blankemeier, et al.
Nature Medicine · Sep 2023 · 742 citations

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang, Yanbo Xu, N. Usuyama, et al.
Mar 2023 · 599 citations

Evaluation and mitigation of the limitations of large language models in clinical decision-making
P. Hager, F. Jungmann, R. Holland, et al.
Nature Medicine · Jul 2024 · 575 citations

Citations

Total citations: 552

Influential citations: 31

References: 130

Fields of citing research

Share of papers citing this model, by field:

Computer Science: 84%
Medicine: 78%
Engineering: 13%
Linguistics: 3%
Biology: 3%
Psychology: 1%
Environmental Science: 1%
Agricultural and Food Sciences: 1%

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Score: 25 (Closed)

Usability (can I run it?): 18

Reproducibility (can I retrain it?): 16

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper · Official Website
