GMAI-VL-R1

Shanghai AI Laboratory / Fuzhou University / Shanghai Innovation Institute / Fudan University / Monash University / University of Washington / Stanford University

General medical vision-language model trained with reinforcement learning to reason step by step over medical images for diagnosis and visual QA.

Released: April 2025

Parameters: 7 Billion

GMAI-VL-R1 is a multimodal medical reasoning model that augments a general medical vision-language model with explicit, step-by-step reasoning learned through reinforcement learning (RL). It addresses a recurring weakness of existing general medical AI systems: while they can describe a medical image or answer a direct question, they often lack the structured reasoning needed for complex clinical decision-making, where intermediate inference steps matter as much as the final answer.

The model was introduced in April 2025 by researchers from Shanghai Artificial Intelligence Laboratory (the "uni-medical" group), Fuzhou University, Shanghai Innovation Institute, Fudan University, Monash University, the University of Washington, and Stanford University. It belongs to the broader GMAI-VL family of general medical AI vision-language models but distinguishes itself by being trained with verifiable-reward RL rather than supervised fine-tuning alone. The authors are among the first to apply Group Relative Policy Optimization (GRPO) to the multimodal medical domain at scale.

GMAI-VL-R1 fits into a fast-growing line of "reasoning-enhanced" medical multimodal models (alongside efforts such as MedVLM-R1), where RL on verifiable medical questions is used to elicit chain-of-thought style reasoning that generalizes to unseen tasks better than memorization-driven supervised training.

Key Features

RL-driven medical reasoning: Uses Group Relative Policy Optimization (GRPO) with verifiable rewards to teach the model to reason through multiple-choice medical questions, rather than relying solely on supervised demonstrations.
Reasoning data synthesis: A rejection-sampling pipeline generates high-quality step-by-step reasoning traces, packaged as the GMAI-Reasoning10K dataset of 10,000 curated multiple-choice questions spanning X-ray, CT, MRI, OCT, and ultrasound.
Stronger out-of-distribution generalization: RL training improves performance on held-out and cross-domain benchmarks where supervised fine-tuning tends to overfit to in-distribution patterns.
Multiple model scales: Released in both 3B and 7B variants (built on Qwen2.5-VL), allowing deployment trade-offs between accuracy and compute.
Open release: Code, the reasoning dataset, and model weights are released publicly through the official repository.
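The rejection-sampling step described above can be illustrated with a minimal sketch. It assumes the simplest possible acceptance rule (keep a sampled reasoning trace only if its final option letter matches the gold answer) and is not the authors' actual pipeline. The names `extract_choice`, `rejection_sample`, and `toy_generate`, and the "Answer:" output convention, are hypothetical and introduced only for illustration.

```python
from itertools import cycle
from typing import Callable, List, Optional


def extract_choice(response: str) -> Optional[str]:
    """Return the option letter following the last 'Answer:' marker, if any.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    marker = "Answer:"
    idx = response.rfind(marker)
    if idx == -1:
        return None
    tail = response[idx + len(marker):].strip()
    if tail and tail[0].upper() in "ABCDE":
        return tail[0].upper()
    return None


def rejection_sample(
    question: str,
    gold: str,
    generate: Callable[[str], str],
    num_samples: int = 8,
) -> List[str]:
    """Sample several reasoning traces and keep only those with the right answer."""
    kept = []
    for _ in range(num_samples):
        response = generate(question)
        if extract_choice(response) == gold.upper():
            kept.append(response)
    return kept


# Hypothetical stand-in for a vision-language model: replays canned outputs.
_canned = cycle([
    "The opacity is in the right lower lobe. Answer: B",
    "The heart border looks enlarged. Answer: C",
    "I cannot decide from this image.",
])


def toy_generate(question: str) -> str:
    return next(_canned)


traces = rejection_sample("Which finding is shown?", "B", toy_generate, num_samples=6)
print(len(traces))  # 2 of the 6 canned samples end in the gold answer
```

A real pipeline would likely add further filters (format checks, deduplication, quality scoring of the reasoning itself); the source states only that rejection sampling is used.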

Technical Details

GMAI-VL-R1 is built on the Qwen2.5-VL vision-language backbone, with a primary 7B-parameter model and a smaller 3B variant. Training applies GRPO, a policy-gradient RL method that samples a group of responses per question, computes each response's advantage relative to the rest of its group, and regularizes the policy with a KL-divergence penalty against a reference policy. Correctness on multiple-choice medical questions serves as the verifiable reward signal. The GMAI-Reasoning10K training set aggregates roughly 10,000 questions distilled from 95 public medical datasets across five imaging modalities. On evaluation, the 7B RL-tuned model improves over its supervised fine-tuning baseline on several benchmarks: accuracy on GMAI-MMBench (val) rises to 43.14%, a gain of about 3 points, with comparable gains on MMMU, MMMU-Pro, and MedXpertQA-MM. The authors report that RL training generalizes better on out-of-distribution tasks, while supervised fine-tuning retains an edge on some in-distribution benchmarks such as OmniMedVQA.
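To make the GRPO description concrete, here is a minimal sketch of its two core quantities, following the general GRPO formulation rather than this model's training code. It assumes a binary correctness reward and the per-token KL estimator used in the original GRPO paper; the source does not state the exact reward shaping or KL form used for GMAI-VL-R1. The function names are hypothetical.

```python
import math
from typing import List


def correctness_reward(predicted: str, gold: str) -> float:
    """Verifiable reward: 1.0 if the chosen option matches the gold option, else 0.0."""
    return 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group.

    All responses in the group answer the same question, so the group mean
    acts as the baseline and no separate value network is needed.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL penalty estimate: ratio - log(ratio) - 1, ratio = pi_ref / pi_policy.

    This estimator is non-negative and is zero when the two policies agree.
    """
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


# Four sampled answers to one question whose gold option is "B".
sampled = ["B", "C", "A", "B"]
rewards = [correctness_reward(s, "B") for s in sampled]
print(rewards)
print(group_relative_advantages(rewards))
```

Correct answers receive positive advantages and incorrect ones negative advantages; if every response in a group earns the same reward, all advantages are zero and that question contributes no policy-gradient signal.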

Applications

GMAI-VL-R1 targets medical image interpretation and visual question answering across radiology (X-ray, CT, MRI), ophthalmology (OCT), and ultrasound, with use cases in diagnostic support, clinical decision assistance, and medical education. By producing explicit reasoning chains rather than bare answers, it is better suited to settings where clinicians need to inspect and verify the rationale behind a model's output. As a research artifact with open weights and data, it also serves as a reproducible baseline for studying reinforcement learning approaches to medical multimodal reasoning.

Impact

GMAI-VL-R1 is one of the early demonstrations that GRPO-style reinforcement learning with verifiable rewards can be applied to general medical vision-language models, extending the "reasoning model" paradigm popularized in general LLMs into the multimodal medical domain. Its public release of code, the GMAI-Reasoning10K dataset, and model weights lowers the barrier for follow-up work on RL-based medical reasoning. The finding that RL improves out-of-distribution generalization relative to supervised fine-tuning offers practical guidance for building medical AI systems that must operate across heterogeneous imaging sources. As a preprint, its benchmark numbers should be read as research results pending peer review, and like other medical multimodal models it is intended for research rather than autonomous clinical use.

Citation

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

Preprint

Su, Y., et al. (2025) GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning. arXiv.org.

DOI: 10.48550/arXiv.2504.01886

Recent citations

Papers that recently cited this model.

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
Chunzheng Zhu, Jiaqi Zeng, Junyue Jiang, et al.
Apr 2026 · 9 citations

MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline
Jiyao Liu, Jianghan Shen, Sida Song, et al.
Apr 2026 · 1 citation

Omni-MMSI: Toward Identity-attributed Social Interaction Understanding
Xinpeng Li, Bolin Lai, Hardy Chen, et al.
Mar 2026 · 1 citation

Top citations

The most-cited papers that cite this model.

MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Peng Xia, Jinglu Wang, Yi Peng, et al.
arXiv.org · May 2025 · 37 citations

MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
Xiaoke Huang, Juncheng Wu, Hui Liu, et al.
arXiv.org · Aug 2025 · 21 citations

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Wenxuan Wang, Zizhan Ma, Meidan Ding, et al.
arXiv.org · Aug 2025 · 13 citations

Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning
Jiacheng Lin, Zhenbang Wu, Jimeng Sun
arXiv.org · May 2025 · 12 citations

OralGPT-Omni: A Versatile Dental Multimodal Large Language Model
Jing Hao, Yuci Liang, Lizhuo Lin, et al.
arXiv.org · Nov 2025 · 10 citations

Citations

Total Citations: 29

Influential: 3

References: 0

GitHub

Stars: 19

Forks: 0

Open Issues: 1

Contributors: 1

Last Push: 1y ago

Fields of citing research

Computer Science: 100%
Medicine: 86%
Psychology: 4%
Engineering: 4%
Chemistry: 4%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

17 · Closed

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 22

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper · Dataset
