bio.rodeo

Imaging · Language model

GMAI-VL-R1

Shanghai AI Laboratory / Fuzhou University / Shanghai Innovation Institute / Fudan University / Monash University / University of Washington / Stanford University

A reinforcement-learning-enhanced general medical vision-language model that adds step-by-step reasoning for medical image diagnosis and visual question answering.

Released: April 2025
Parameters: 7 billion (primary model; a 3B variant is also released)

GMAI-VL-R1 is a multimodal medical reasoning model that augments a general medical vision-language model with explicit, step-by-step reasoning learned through reinforcement learning (RL). It addresses a recurring weakness of existing general medical AI systems: while they can describe a medical image or answer a direct question, they often lack the structured reasoning needed for complex clinical decision-making, where intermediate inference steps matter as much as the final answer.

The model was introduced in April 2025 by researchers from Shanghai Artificial Intelligence Laboratory (the "uni-medical" group), Fuzhou University, Shanghai Innovation Institute, Fudan University, Monash University, the University of Washington, and Stanford University. It belongs to the broader GMAI-VL family of general medical AI vision-language models but distinguishes itself by being trained with verifiable-reward RL rather than supervised fine-tuning alone. The authors are among the first to apply Group Relative Policy Optimization (GRPO) to the multimodal medical domain at scale.

GMAI-VL-R1 fits into a fast-growing line of "reasoning-enhanced" medical multimodal models (alongside efforts such as MedVLM-R1), where RL on verifiable medical questions is used to elicit chain-of-thought style reasoning that generalizes to unseen tasks better than memorization-driven supervised training.

#Key Features

  • RL-driven medical reasoning: Uses Group Relative Policy Optimization (GRPO) with verifiable rewards to teach the model to reason through multiple-choice medical questions, rather than relying solely on supervised demonstrations.
  • Reasoning data synthesis: A rejection-sampling pipeline generates high-quality step-by-step reasoning traces, packaged as the GMAI-Reasoning10K dataset of 10,000 curated multiple-choice questions spanning X-ray, CT, MRI, OCT, and ultrasound.
  • Stronger out-of-distribution generalization: RL training improves performance on held-out and cross-domain benchmarks where supervised fine-tuning tends to overfit to in-distribution patterns.
  • Multiple model scales: Released in both 3B and 7B variants (built on Qwen2.5-VL), allowing deployment trade-offs between accuracy and compute.
  • Open release: Code, the reasoning dataset, and model weights are released publicly through the official repository.
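The rejection-sampling idea behind the reasoning data synthesis can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `<think>`/`<answer>` tag format, the helper names (`extract_choice`, `rejection_sample`, `toy_generator`), and the sampling budget are all assumptions made for the example. The only part taken from the description above is the principle of keeping a generated reasoning trace only when its final answer matches the known correct option.

```python
import random
import re
from typing import Callable, List, Optional

# Hypothetical output format: reasoning in <think>, final option in <answer>.
ANSWER_RE = re.compile(r"<answer>\s*([A-E])\s*</answer>", re.IGNORECASE)


def extract_choice(trace: str) -> Optional[str]:
    """Return the final option letter from a trace, or None if it is missing."""
    match = ANSWER_RE.search(trace)
    return match.group(1).upper() if match else None


def rejection_sample(
    generate: Callable[[str], str],
    question: str,
    gold: str,
    num_samples: int = 8,
    max_keep: int = 1,
) -> List[str]:
    """Sample candidate traces and keep only those whose answer matches gold."""
    kept: List[str] = []
    for _ in range(num_samples):
        trace = generate(question)
        if extract_choice(trace) == gold.upper():
            kept.append(trace)
            if len(kept) >= max_keep:
                break
    return kept


def toy_generator(question: str) -> str:
    """Stand-in for a real model: guesses a random option with canned reasoning."""
    choice = random.choice("ABCD")
    return f"<think>Toy reasoning for: {question}</think><answer>{choice}</answer>"


if __name__ == "__main__":
    traces = rejection_sample(toy_generator, "Which modality is shown?", "B")
    print(f"kept {len(traces)} trace(s)")
```

In a real setting `generate` would call a vision-language model with the image and question; a stricter filter could also check the quality of the reasoning itself, not only the final letter.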

#Technical Details

GMAI-VL-R1 is built on the Qwen2.5-VL vision-language backbone and comes as a primary 7B-parameter model and a smaller 3B variant. Training uses GRPO, a policy-gradient RL method that samples a group of responses per question, scores each response relative to the rest of its group to obtain advantages, and regularizes the policy with a KL-divergence penalty against a reference policy. The verifiable reward signal is correctness on multiple-choice medical questions. The GMAI-Reasoning10K training set contains roughly 10,000 questions distilled from 95 public medical datasets across five imaging modalities.

In evaluation, the 7B RL-tuned model improves over its supervised fine-tuning baseline on several benchmarks: accuracy on the GMAI-MMBench validation split rises by about 3 points to 43.14%, with comparable gains on MMMU, MMMU-Pro, and MedXpertQA-MM. The authors report that RL training generalizes better on out-of-distribution tasks, while supervised fine-tuning retains an edge on some in-distribution benchmarks such as OmniMedVQA.
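To make the GRPO ingredients above concrete, here is a minimal sketch of the three pieces: a correctness reward for multiple-choice answers, group-relative advantages, and a per-token KL penalty against a reference policy. The function names are hypothetical, and the specific choices (binary 0/1 reward, standardizing by the group's mean and standard deviation, and the `exp(d) - d - 1` KL estimator from the original GRPO formulation) are assumptions about a typical setup rather than confirmed details of GMAI-VL-R1's training code.

```python
import math
from typing import List, Sequence


def correctness_reward(predicted: str, gold: str) -> float:
    """Verifiable reward: 1.0 if the chosen option matches the key, else 0.0."""
    return 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Standardize rewards within one group of responses to the same question."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def kl_penalty(logp_policy: float, logp_ref: float) -> float:
    """Non-negative per-token KL estimate: exp(d) - d - 1, d = logp_ref - logp_policy."""
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0


if __name__ == "__main__":
    # Four sampled answers to one question whose correct option is "A".
    predictions = ["A", "C", "A", "B"]
    rewards = [correctness_reward(p, "A") for p in predictions]
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the group, no separate value network is needed; a question on which every sampled response is right (or every one is wrong) yields zero advantages and therefore contributes no policy-gradient signal.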

#Applications

GMAI-VL-R1 targets medical image interpretation and visual question answering across radiology (X-ray, CT, MRI), ophthalmology (OCT), and ultrasound, with use cases in diagnostic support, clinical decision assistance, and medical education. By producing explicit reasoning chains rather than bare answers, it is better suited to settings where clinicians need to inspect and verify the rationale behind a model's output. As a research artifact with open weights and data, it also serves as a reproducible baseline for studying reinforcement learning approaches to medical multimodal reasoning.

#Impact

GMAI-VL-R1 is one of the early demonstrations that GRPO-style reinforcement learning with verifiable rewards can be applied to general medical vision-language models, extending the "reasoning model" paradigm popularized in general LLMs into the multimodal medical domain. Its public release of code, the GMAI-Reasoning10K dataset, and model weights lowers the barrier for follow-up work on RL-based medical reasoning. The finding that RL improves out-of-distribution generalization relative to supervised fine-tuning offers practical guidance for building medical AI systems that must operate across heterogeneous imaging sources. As a preprint, its benchmark numbers should be read as research results pending peer review, and like other medical multimodal models it is intended for research rather than autonomous clinical use.

Citation

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

Preprint

Su, Y., et al. (2025) GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning. arXiv.org.

DOI: 10.48550/arXiv.2504.01886

Citations

Total Citations: 28
Influential: 3
References: 0

GitHub

Stars: 18
Forks: 0
Open Issues: 1
Contributors: 1
Last Push: 10mo ago

Openness

bio.rodeo openness: Closed · low usability and reproducibility
17 · Closed
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 22
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

histology, language_model, medical_image_diagnosis, multimodal, radiology, reasoning, reinforcement_learning, transformer, vision_transformer, visual_question_answering

Resources

GitHub Repository · Research Paper · Dataset