Google Research / Google DeepMind
Google's generalist multimodal biomedical AI that encodes clinical text, medical images, and genomics with a single set of weights across 14 tasks.
Med-PaLM M is a generalist multimodal biomedical AI system introduced by Google Research and Google DeepMind in July 2023, described in the paper "Towards Generalist Biomedical AI." Medicine is inherently multimodal — diagnosis and care draw on clinical notes, radiology and pathology images, genomic data, and more. Most prior medical AI systems are narrow specialists trained for a single task on a single modality. Med-PaLM M is a proof of concept that a single model, using the same set of weights, can flexibly encode and interpret biomedical data spanning clinical language, imaging, and genomics.
To evaluate such systems, the authors first curated MultiMedBench, a multimodal benchmark spanning 14 diverse tasks including medical question answering, mammography and dermatology image interpretation, chest X-ray report generation and summarization, pathology, and genomic variant calling. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, frequently surpassing specialist models trained for individual tasks.
Beyond raw benchmark numbers, the model demonstrates behaviors that motivate the generalist approach: zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. It sits alongside Med-PaLM (text-only medical question answering) in Google's medical foundation-model lineage, extending that work into the multimodal regime.
Med-PaLM M is built on PaLM-E, a multimodal architecture that combines the PaLM large language model with a Vision Transformer (ViT) image encoder, allowing text, images, and other modalities to be injected into a shared token sequence. The system is created by fine-tuning the complete set of model parameters on the MultiMedBench training tasks. The authors trained and evaluated three scales — 12B, 84B, and 562B parameters — enabling analysis of how multimodal medical capabilities scale with model size. Inputs are formatted into instruction-style task prompts with interleaved image tokens, and the model produces free-text outputs (answers, reports, classifications) for all tasks. The chest X-ray report generation results were assessed both with automated metrics and with a structured radiologist evaluation across model scales.
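The idea of injecting image tokens into a shared token sequence can be illustrated with a minimal Python sketch. It is not the authors' implementation. The prompt wording, the `<img>` placeholder, the function names, and the toy embedding are all assumptions made for illustration. In the real system, image tokens would come from the ViT encoder and text embeddings from the language model.

```python
from typing import Callable, List

Vector = List[float]
IMG_PLACEHOLDER = "<img>"  # hypothetical marker for an image slot, not from the paper


def build_prompt(instruction: str, question: str) -> str:
    """Format an instruction-style task prompt with a single image slot."""
    return f"Instructions: {instruction} Image: {IMG_PLACEHOLDER} Q: {question} A:"


def toy_text_embedding(token: str, dim: int = 4) -> Vector:
    """Stand-in for a learned token embedding (deterministic, not meaningful)."""
    seed = sum(ord(ch) for ch in token)
    return [float((seed + i) % 7) for i in range(dim)]


def interleave(
    tokens: List[str],
    image_tokens: List[Vector],
    embed: Callable[[str], Vector] = toy_text_embedding,
) -> List[Vector]:
    """Build one sequence: text embeddings with image tokens spliced into the slot."""
    sequence: List[Vector] = []
    for token in tokens:
        if token == IMG_PLACEHOLDER:
            # The placeholder expands into all image token vectors, in order.
            sequence.extend(image_tokens)
        else:
            sequence.append(embed(token))
    return sequence


# Example: three made-up vectors stand in for projected image patch embeddings.
prompt = build_prompt(
    "You are a helpful radiology assistant.", "Is there a pleural effusion?"
)
image_tokens = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
sequence = interleave(prompt.split(), image_tokens)
print(len(prompt.split()), len(sequence))
```

Whitespace tokenization keeps the sketch short. A production model would use a subword tokenizer and project image features to the language model's embedding width before splicing them in.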
Med-PaLM M is a research system pointing toward unified clinical AI assistants that could draft radiology reports, answer medical questions, interpret dermatology and mammography images, summarize findings, and assist with genomic variant interpretation, all within one model. Such generalist systems could benefit radiologists, clinicians, and biomedical researchers by reducing the need to deploy and maintain many narrow models. Model weights are not openly released: Med-PaLM M is a research artifact, and Google has offered Med-PaLM capabilities only through restricted API access rather than open download. As the authors emphasize, considerable validation is required before any real-world clinical use.
Med-PaLM M was an influential demonstration that a single large multimodal model can match or beat task-specific specialists across a broad span of biomedical modalities, helping catalyze interest in generalist biomedical AI. The accompanying MultiMedBench benchmark gave the community a standardized multimodal evaluation suite covering question answering, imaging, and genomics. The work is widely cited as a milestone in multimodal medical AI and helped shape Google's broader Med-PaLM and health-AI research agenda. Its main acknowledged limitations are the absence of open weights, the proof-of-concept nature of the evaluations, and the need for prospective clinical validation before deployment.
Tu, T., et al. (2023). Towards Generalist Biomedical AI. arXiv preprint, later published in NEJM AI (2024).
DOI: 10.48550/arXiv.2307.14334