Med-Gemini

Family of medical multimodal models built on Gemini, adding uncertainty-guided web search, custom modality encoders, and long-context EHR reasoning.

Released: April 2024

Med-Gemini is a family of medically specialized multimodal models introduced by Google Research and Google DeepMind in April 2024, described in the paper "Capabilities of Gemini Models in Medicine." It builds directly on the general Gemini foundation models, inheriting their multimodal and long-context reasoning capabilities and then fine-tuning and adapting them to the demands of clinical and biomedical tasks. The motivation is that excellence in medicine requires three things that general models struggle to combine: advanced reasoning, access to up-to-date medical knowledge, and the ability to interpret complex multimodal data such as images, electronic health records, and video.

Med-Gemini addresses these needs with two distinctive capabilities layered onto the Gemini base. First, it can seamlessly invoke web search through a novel uncertainty-guided search strategy, allowing the model to retrieve current information when its own answer is uncertain. Second, it can be efficiently tailored to novel modalities using custom encoders, rather than requiring a full retrain. The family spans variants built on different Gemini generations — a Gemini 1.0 Ultra-based model for text reasoning and a Gemini 1.5-based model for long-context and multimodal tasks.
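The idea behind uncertainty-guided search can be sketched as a short loop. This is a minimal illustration under stated assumptions, not Med-Gemini's implementation: it assumes uncertainty is measured as the Shannon entropy of several sampled answers, and it reuses the original question as the search query, whereas a real system would more likely generate targeted queries. `sample_answers`, `search`, `threshold`, and `max_rounds` are hypothetical placeholders for a model call, a web-search call, and tuning knobs.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over sampled answers."""
    n = len(answers)
    counts = Counter(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def uncertainty_guided_answer(
    question: str,
    sample_answers: Callable[[str, List[str]], List[str]],  # hypothetical model call
    search: Callable[[str], List[str]],  # hypothetical web-search call
    threshold: float = 0.5,  # entropy above this triggers a search (assumed value)
    max_rounds: int = 3,
) -> str:
    """Sample answers; while they disagree too much, retrieve snippets and resample."""
    context: List[str] = []
    answers = sample_answers(question, context)
    for _ in range(max_rounds):
        if answer_entropy(answers) <= threshold:
            break  # samples agree closely enough, so no search is needed
        context.extend(search(question))  # simplification: reuse the question as query
        answers = sample_answers(question, context)
    # Return the majority answer from the final round of samples.
    return Counter(answers).most_common(1)[0][0]


# Usage with toy stand-ins: the fake model disagrees with itself until it sees
# retrieved context, after which all samples agree.
def fake_sampler(question: str, context: List[str]) -> List[str]:
    return ["B", "B", "B", "B"] if context else ["A", "B", "C", "B"]


def fake_search(query: str) -> List[str]:
    return ["snippet about " + query]


print(uncertainty_guided_answer("Which drug is first-line?", fake_sampler, fake_search))  # prints B
```

Gating on disagreement between samples means the (comparatively expensive) search call is skipped entirely when the model is already self-consistent.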

The work is distinct from Google's earlier Med-PaLM and Med-PaLM M lineage, which were built on the PaLM and PaLM-E architectures. Med-Gemini represents the migration of Google's medical AI research onto the Gemini stack, and it set new state-of-the-art results across a broad swath of medical benchmarks at release.

Key Features

Uncertainty-guided web search: The model self-assesses confidence, issues web searches when uncertain, and integrates the retrieved results into its reasoning. This strategy underpins its 91.1% accuracy on MedQA (USMLE).
Built on Gemini foundation models: Med-Gemini inherits Gemini's native multimodal and long-context strengths, specializing them for medicine through fine-tuning rather than training from scratch.
Custom modality encoders: New data types can be incorporated efficiently via lightweight custom encoders, demonstrated on signals beyond standard image and text inputs.
Long-context EHR reasoning: Leverages Gemini 1.5's long context window to perform needle-in-a-haystack retrieval over lengthy de-identified health records using only in-context learning, surpassing prior bespoke methods.
Broad multimodal coverage: Handles medical imaging, NEJM Image Challenges, MMMU health and medicine questions, and medical video question answering within a single model family.
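As an illustration of what a lightweight modality encoder can look like, the sketch below turns a 1-D signal (for example a physiological waveform) into a short sequence of embedding vectors that a language model could consume alongside text tokens. It assumes a simple patch-and-project design with fixed example weights; the encoder architecture Med-Gemini actually uses is not described here, and `patchify`, `encode_signal`, and the weight shapes are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def patchify(signal: Sequence[float], patch_len: int) -> List[Vector]:
    """Split a 1-D signal into non-overlapping patches, zero-padding the last one."""
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    patches = []
    for start in range(0, len(signal), patch_len):
        patch = [float(x) for x in signal[start:start + patch_len]]
        patch += [0.0] * (patch_len - len(patch))
        patches.append(patch)
    return patches


def encode_signal(signal: Sequence[float], weights: List[Vector], patch_len: int) -> List[Vector]:
    """Map each patch to one d-dimensional 'soft token' with a linear projection.

    `weights` has shape (d, patch_len). In a trained adapter these values would be
    learned; here they are fixed numbers chosen for illustration.
    """
    if any(len(row) != patch_len for row in weights):
        raise ValueError("every weight row must have length patch_len")
    return [
        [sum(w * x for w, x in zip(row, patch)) for row in weights]
        for patch in patchify(signal, patch_len)
    ]


# Usage: a 5-sample toy signal, patches of length 2, 3-dimensional embeddings.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [[1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
for token in encode_signal(signal, weights, patch_len=2):
    print(token)  # prints [1.0, 3.0, 1.5], then [3.0, 7.0, 3.5], then [5.0, 5.0, 2.5]
```

Because only the small projection is modality-specific, a design like this can in principle be trained for a new data type without retraining the underlying language model.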

Technical Details

Med-Gemini is a family of decoder-style multimodal transformer models derived from Gemini 1.0 and Gemini 1.5, fine-tuned on medical data and augmented with search and custom-encoder capabilities. Evaluated on 14 medical benchmarks, it established new state-of-the-art performance on 10 of them and surpassed the GPT-4 model family on every benchmark where a direct comparison was viable. On MedQA (USMLE) it reached 91.1% accuracy using the uncertainty-guided search strategy. Across 7 multimodal benchmarks, including NEJM Image Challenges and MMMU (health and medicine), it improved over GPT-4V by an average relative margin of 44.5%. Using in-context learning, it also achieved state-of-the-art long-context retrieval on de-identified health records and medical video question answering, and it surpassed human experts on tasks such as medical text summarization. Exact parameter counts are not disclosed, consistent with the proprietary Gemini base models.
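An "average relative margin" is a ratio relative to the baseline, not a difference in percentage points. The sketch below assumes it means the unweighted mean over benchmarks of (new − baseline) / baseline; the paper's exact definition may differ, and the scores in the usage line are invented placeholders, not results from the paper.

```python
from typing import Sequence


def relative_margin(new: float, baseline: float) -> float:
    """Improvement of `new` over `baseline`, expressed as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - baseline) / baseline


def average_relative_margin(new_scores: Sequence[float], baseline_scores: Sequence[float]) -> float:
    """Unweighted mean of per-benchmark relative margins (assumed definition)."""
    if not new_scores or len(new_scores) != len(baseline_scores):
        raise ValueError("need equal-length, non-empty score lists")
    margins = [relative_margin(n, b) for n, b in zip(new_scores, baseline_scores)]
    return sum(margins) / len(margins)


# Placeholder scores for two hypothetical benchmarks (model vs. baseline).
print(f"{average_relative_margin([60.0, 75.0], [40.0, 60.0]):.1%}")  # prints 37.5%
```

In this toy case the absolute gains are 20 and 15 points, yet the relative margins are 50% and 25%, which is why relative figures can look large on benchmarks where the baseline scores low.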

Applications

Med-Gemini targets clinical and biomedical assistance use cases: answering medical questions, supporting diagnostic reasoning, interpreting medical images, summarizing clinical text, retrieving information from long patient records, and powering multimodal medical dialogue for research and education. Such capabilities could benefit clinicians, medical educators, and biomedical researchers. The models are research artifacts available through Google's restricted API and research programs rather than as open weights; no downloadable checkpoints are released. As the authors stress, further rigorous evaluation is crucial before any real-world deployment in this safety-critical domain.

Impact

Med-Gemini marked Google's transition of its medical AI agenda from the Med-PaLM (PaLM/PaLM-E) lineage onto the Gemini foundation, and at release it set new state-of-the-art results on the majority of evaluated medical benchmarks while outperforming the GPT-4 family on directly comparable tasks. Its uncertainty-guided search and long-context EHR results influenced subsequent work on retrieval-augmented and long-context clinical models, and it spawned companion papers extending the family into specialized imaging and genomics domains. The principal limitations are the absence of open weights, reliance on proprietary base models, and the authors' own caution that prospective clinical validation is required before deployment.

Citation

Capabilities of Gemini Models in Medicine

Preprint

Saab, K., et al. (2024). Capabilities of Gemini Models in Medicine. arXiv preprint.

DOI: 10.48550/arXiv.2404.18416

Recent citations

Papers that recently cited this model.

Evidence-Grounded AI for Musculoskeletal Care
Wenjie Li, Yujie Zhang, Fanrui Zhang, et al.
Jul 2026
0 citations
The Path to Self-Evolving Clinical Systems: Scaling Medical Agents from Assistance to Autonomy
Chunzheng Zhu, Lei Tian, Bohan Tan, et al.
Jul 2026
0 citations · Influential
Performance of leading large language models in adhering to clinical guidelines for anaplastic thyroid cancer: a comparative study
Mohamed Yasser, Ghada Barakat, S. Awny, et al.
Scientific Reports · Jul 2026
0 citations

Top citations

The most-cited papers that cite this model.

Toward expert-level medical question answering with large language models
Karan Singhal, Tao Tu, Juraj Gottweis, et al.
Nature Medicine · Jan 2025
924 citations
HealthBench: Evaluating Large Language Models Towards Improved Human Health
Rahul K. Arora, Jason Wei, Rebecca Soskin Hicks, et al.
arXiv.org · May 2025
288 citations · Influential
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, et al.
Conference on Empirical Methods in Natural Language Processing · Jun 2024
259 citations
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Junying Chen, Zhenyang Cai, Ke Ji, et al.
arXiv.org · Dec 2024
238 citations
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou, Boyang Gu, Xinyu Zou, et al.
arXiv.org · Nov 2023
234 citations · Influential

Citations

Total citations: 409

Influential: 31

References: 0

Fields of citing research

Computer Science: 95%
Medicine: 86%
Engineering: 7%
Linguistics: 4%
Biology: 3%
Education: 1%
Psychology: 1%
Mathematics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

8 · Closed

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 5

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

Research Paper · Official Website
