Google Research / Google DeepMind
Google's family of medical multimodal models built on Gemini, adding uncertainty-guided web search, custom modality encoders, and long-context EHR reasoning.
Med-Gemini is a family of medically specialized multimodal models introduced by Google Research and Google DeepMind in April 2024, described in the paper "Capabilities of Gemini Models in Medicine." It builds directly on the general Gemini foundation models, inheriting their multimodal and long-context reasoning, and then fine-tunes and adapts them for clinical and biomedical tasks. The motivation is that excellence in medicine requires three things that general models struggle to combine: advanced reasoning, access to up-to-date medical knowledge, and the ability to interpret complex multimodal data such as images, electronic health records, and video.
Med-Gemini addresses these needs with two distinctive capabilities layered onto the Gemini base. First, it can invoke web search through an uncertainty-guided search strategy, which lets the model retrieve current information when its own answer is uncertain. Second, it can be adapted to new modalities with custom encoders rather than a full retrain. The family includes variants built on different Gemini generations, including a Gemini 1.0 Ultra-based model for text reasoning and a Gemini 1.5-based model for long-context and multimodal tasks.
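The uncertainty-guided search idea can be illustrated with a minimal Python sketch. This is not Med-Gemini's implementation or any Google API. It assumes uncertainty is estimated as the Shannon entropy of answers sampled from the model, and that search is triggered when that entropy exceeds a threshold. The names `sample_answers`, `search`, `entropy_threshold`, and `max_rounds` are hypothetical, and the model and search engine are replaced by toy stand-ins.

```python
import math
from collections import Counter
from typing import Callable, List


def answer_entropy(answers: List[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def uncertainty_guided_answer(
    question: str,
    sample_answers: Callable[[str, int], List[str]],  # hypothetical model sampler
    search: Callable[[str], str],                     # hypothetical search backend
    n_samples: int = 5,
    entropy_threshold: float = 0.5,                   # assumed cutoff, not from the paper
    max_rounds: int = 2,
) -> str:
    """Majority-vote answer, calling search only while the samples disagree."""
    prompt = question
    answers = sample_answers(prompt, n_samples)
    for _ in range(max_rounds):
        if answer_entropy(answers) <= entropy_threshold:
            break  # samples agree closely enough; no retrieval needed
        evidence = search(question)
        prompt = f"{prompt}\n\nRetrieved context:\n{evidence}"
        answers = sample_answers(prompt, n_samples)
    return Counter(answers).most_common(1)[0][0]


def fake_sampler(prompt: str, n: int) -> List[str]:
    """Toy stand-in for a model: disagrees until retrieved context is in the prompt."""
    if "Retrieved context" in prompt:
        return ["B"] * n
    return ["A", "B", "C", "B", "A"][:n]


def fake_search(query: str) -> str:
    """Toy stand-in for a web search call."""
    return "placeholder search snippet"


if __name__ == "__main__":
    print(uncertainty_guided_answer("Which option is correct?", fake_sampler, fake_search))
```

In this sketch, retrieval is skipped entirely when the sampled answers already agree, so the cost of search is paid only on questions where the model is uncertain.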
The work is distinct from Google's earlier Med-PaLM and Med-PaLM M lineage, which were built on the PaLM and PaLM-E architectures. Med-Gemini represents the migration of Google's medical AI research onto the Gemini stack, and it set new state-of-the-art results across a broad swath of medical benchmarks at release.
Med-Gemini is a family of decoder-style multimodal transformer models derived from Gemini 1.0 and Gemini 1.5, fine-tuned on medical data and augmented with search and custom-encoder capabilities. Evaluated on 14 medical benchmarks, it established new state-of-the-art performance on 10 of them and surpassed the GPT-4 model family on every benchmark with a viable direct comparison. On MedQA (USMLE) it reached 91.1% accuracy using the uncertainty-guided search strategy. Across 7 multimodal benchmarks — including NEJM Image Challenges and MMMU (health and medicine) — it improved over GPT-4V by an average relative margin of 44.5%. It also demonstrated state-of-the-art long-context retrieval on de-identified health records and medical video question answering through in-context learning, and surpassed human experts on tasks such as medical text summarization. Exact parameter counts are not disclosed, consistent with the proprietary Gemini base models.
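To make the phrase "average relative margin" concrete, here is a small Python sketch. It assumes the figure is the mean of per-benchmark relative gains, (model - baseline) / baseline, which is a common reading but is not confirmed here as the paper's exact procedure. The function name and the scores are hypothetical placeholders, not results from the paper.

```python
from typing import Dict


def average_relative_margin(model: Dict[str, float], baseline: Dict[str, float]) -> float:
    """Mean over shared benchmarks of (model - baseline) / baseline, in percent."""
    shared = sorted(model.keys() & baseline.keys())
    if not shared:
        raise ValueError("no benchmarks in common")
    gains = [(model[name] - baseline[name]) / baseline[name] for name in shared]
    return 100.0 * sum(gains) / len(gains)


# Hypothetical placeholder scores, NOT values reported for Med-Gemini or GPT-4V.
model_scores = {"bench_a": 60.0, "bench_b": 75.0}
baseline_scores = {"bench_a": 40.0, "bench_b": 50.0}

if __name__ == "__main__":
    print(average_relative_margin(model_scores, baseline_scores))
```

A relative margin is not the same as a difference in percentage points: in the placeholder data, moving from 40 to 60 is a gain of 20 points but a 50% relative improvement.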
Med-Gemini targets clinical and biomedical assistance use cases: answering medical questions, supporting diagnostic reasoning, interpreting medical images, summarizing clinical text, retrieving information from long patient records, and powering multimodal medical dialogue for research and education. Such capabilities could benefit clinicians, medical educators, and biomedical researchers. The models are research artifacts available through Google's restricted API and research programs rather than as open weights; no downloadable checkpoints are released. As the authors stress, further rigorous evaluation is crucial before any real-world deployment in this safety-critical domain.
Med-Gemini marked Google's transition of its medical AI agenda from the Med-PaLM (PaLM/PaLM-E) lineage onto the Gemini foundation, and at release it set new state-of-the-art results on the majority of evaluated medical benchmarks while outperforming the GPT-4 family on directly comparable tasks. Its uncertainty-guided search and long-context EHR results influenced subsequent work on retrieval-augmented and long-context clinical models, and it spawned companion papers extending the family into specialized imaging and genomics domains. The principal limitations are the absence of open weights, reliance on proprietary base models, and the authors' own caution that prospective clinical validation is required before deployment.
Saab, K., et al. (2024) Capabilities of Gemini Models in Medicine. arXiv.org.
DOI: 10.48550/arXiv.2404.18416