Med-R1

Emory University / University of Southern California / University of Tokyo / Johns Hopkins University / Georgia Institute of Technology

Medical vision-language model trained with reinforcement learning for generalizable reasoning across eight imaging modalities and five question types.

Released: March 2025

Parameters: 2 Billion

Med-R1 is a vision-language model (VLM) for medical reasoning that is trained with reinforcement learning rather than the supervised fine-tuning typically used to adapt general VLMs to clinical tasks. Introduced in March 2025 by researchers at Emory University, the University of Southern California, the University of Tokyo, Johns Hopkins University, and the Georgia Institute of Technology, it targets a persistent weakness of medical VLMs: models tuned on one imaging modality or question format often fail to transfer to others, limiting their usefulness across the heterogeneous landscape of clinical imaging.

The central idea is to apply Group Relative Policy Optimization (GRPO) — the reward-guided strategy popularized by DeepSeek-R1 — to a compact open VLM, encouraging it to learn generalizable decision policies instead of memorizing dataset-specific annotations. Built on the 2-billion-parameter Qwen2-VL-2B-Instruct backbone, Med-R1 spans eight imaging modalities (CT, MRI, ultrasound, X-ray, fundus photography, OCT, dermoscopy, and microscopy) and five clinical question types, positioning it as a broad-coverage medical reasoning model rather than a single-task classifier.

A notable empirical finding is that explicit chain-of-thought reasoning is not always beneficial in this setting. The authors report that a "No-Thinking" variant, which omits intermediate reasoning steps, can outperform the reasoning-augmented model, suggesting that in medical VQA the quality and domain alignment of reasoning — not its mere presence — drive performance.

Key Features

Reinforcement learning over SFT: Uses GRPO to optimize answers against rewards rather than supervised labels, improving cross-modality and cross-task generalization.
Eight-modality coverage: Reasons across CT, MRI, ultrasound, X-ray, fundus, OCT, dermoscopy, and microscopy within a single framework.
Five clinical question types: Handles modality recognition, anatomy identification, disease diagnosis, lesion grading, and biological attribute analysis.
Compact and efficient: At 2B parameters, it reportedly surpasses the 36x-larger Qwen2-VL-72B on medical tasks, lowering deployment cost.
Open weights and data: Model checkpoints (Apache-2.0) and the OmniMedVQA training data are publicly released.

Technical Details

Med-R1 fine-tunes Qwen2-VL-2B-Instruct using GRPO, with input images resized to 384x384 pixels. Training and evaluation use the open-access portion of the OmniMedVQA benchmark — roughly 82,000 images and 89,000 visual question-answer pairs spanning the eight modalities and five question types — split 80/20 for training and testing. The release provides separate cross-modality and cross-task checkpoints. Reported results include a 29.94% average accuracy improvement over the Qwen2-VL-2B base model and a 32.06% gain in cross-question-type generalization, with the 2B model outperforming Qwen2-VL-72B on the studied medical reasoning tasks.

Applications

Med-R1 is aimed at medical visual question answering, where a clinician or downstream system supplies an image and a natural-language question and the model returns an answer, optionally with a reasoning trace. Because it generalizes across modalities and task formats without per-task retraining, it is well suited to building flexible diagnostic-support prototypes, triage assistants, and educational tools that must handle radiology, ophthalmology, dermatology, and pathology imagery side by side. Its small footprint makes it attractive for resource-constrained or on-premise deployments where larger VLMs are impractical.

Impact

Med-R1 contributes to a fast-growing line of work applying DeepSeek-R1-style reinforcement learning to multimodal medical models, appearing alongside closely related efforts such as MedVLM-R1. Its main contributions are evidence that GRPO can yield strong cross-modality and cross-task generalization in a small open VLM, and the counterintuitive observation that explicit reasoning steps are not universally helpful for medical VQA. As a research artifact with openly released weights and data, it lowers the barrier for studying RL-based medical reasoning. The work remains a preprint, and its evaluation is confined to the OmniMedVQA benchmark, so reported gains should be interpreted as benchmark results rather than validated clinical performance.

Citations

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Preprint

Lai, Y., et al. (2025) Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models. IEEE Transactions on Medical Imaging.

DOI: 10.48550/arXiv.2503.13939

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Lai, Y., et al. (2025) Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models. IEEE Transactions on Medical Imaging.

DOI: 10.1109/TMI.2026.3661001

Recent citations

Papers that recently cited this model.

The Path to Self-Evolving Clinical Systems: Scaling Medical Agents from Assistance to Autonomy
Chunzheng Zhu, Lei Tian, Bohan Tan, et al.
Jul 2026
0
Multimodal AI in healthcare: Review of vision-language foundation models for real-world medical applications.
Taha Razzaq, Murtaza Taj, Asim Iqbal
Journal of Biomedical Informatics · Jul 2026
0
Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning
Kaitao Chen, Weiqian Zhao, Jiamin Wu, et al.
Jun 2026
0

Top citations

The most-cited papers that cite this model.

Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li, et al.
arXiv.org · Mar 2025
405
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu, Jinjie Ni, Zijian Wu, et al.
arXiv.org · Apr 2025
84
JudgeLRM: Large Reasoning Models as a Judge
Nuo Chen, Zhiyuan Hu, Qingyun Zou, et al.
arXiv.org · Mar 2025
79
The Invisible Leash: Why RLVR May or May Not Escape Its Origin
Fang Wu, Weihao Xuan, Ximing Lu, et al.
Jul 2025
46
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan, Kaituo Feng, Haoming Lyu, et al.
arXiv.org · May 2025
45

Citations

Total Citations139

Influential15

References45

GitHub

Stars128

Forks12

Open Issues10

Contributors1

Last Push1y ago

LanguagePython

HuggingFace

Downloads0

Likes13

Last Modified1y ago

Pipelinevisual-question-answering

Fields of citing research

Computer Science99%
Medicine72%
Engineering13%
Environmental Science2%
Biology2%
Chemistry1%
Linguistics1%
Mathematics1%

Share of papers citing this model.

Openness

bio.rodeo opennessOpen weights · open weights, closed recipe

45Partial

Usability — can I run it?55

Reproducibility — can I retrain it?22

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository Research Paper HuggingFace Model Dataset

Key Features

Reinforcement learning over SFT: Uses GRPO to optimize answers against rewards rather than supervised labels, improving cross-modality and cross-task generalization.

Eight-modality coverage: Reasons across CT, MRI, ultrasound, X-ray, fundus, OCT, dermoscopy, and microscopy within a single framework.

Five clinical question types: Handles modality recognition, anatomy identification, disease diagnosis, lesion grading, and biological attribute analysis.

Compact and efficient: At 2B parameters, it reportedly surpasses the 36x-larger Qwen2-VL-72B on medical tasks, lowering deployment cost.

Open weights and data: Model checkpoints (Apache-2.0) and the OmniMedVQA training data are publicly released.

Technical Details

Applications

Impact

Citations

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Preprint

Lai, Y., et al. (2025) Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models. IEEE Transactions on Medical Imaging.

DOI: 10.48550/arXiv.2503.13939

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Lai, Y., et al. (2025) Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models. IEEE Transactions on Medical Imaging.

DOI: 10.1109/TMI.2026.3661001

Recent citations

Papers that recently cited this model.

The Path to Self-Evolving Clinical Systems: Scaling Medical Agents from Assistance to Autonomy

Chunzheng Zhu, Lei Tian, Bohan Tan, et al.

Jul 2026

Multimodal AI in healthcare: Review of vision-language foundation models for real-world medical applications.

Taha Razzaq, Murtaza Taj, Asim Iqbal

Journal of Biomedical Informatics · Jul 2026

Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning

Kaitao Chen, Weiqian Zhao, Jiamin Wu, et al.

Jun 2026

Top citations

The most-cited papers that cite this model.

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li, et al.

arXiv.org · Mar 2025

405

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Xiangyan Liu, Jinjie Ni, Zijian Wu, et al.

arXiv.org · Apr 2025

JudgeLRM: Large Reasoning Models as a Judge

Nuo Chen, Zhiyuan Hu, Qingyun Zou, et al.

arXiv.org · Mar 2025

The Invisible Leash: Why RLVR May or May Not Escape Its Origin

Fang Wu, Weihao Xuan, Ximing Lu, et al.

Jul 2025

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Kaixuan Fan, Kaituo Feng, Haoming Lyu, et al.

arXiv.org · May 2025

Med-R1

#Key Features

#Technical Details

#Applications

#Impact

Citations

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Recent citations

The Path to Self-Evolving Clinical Systems: Scaling Medical Agents from Assistance to Autonomy

Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning

Top citations

The Invisible Leash: Why RLVR May or May Not Escape Its Origin

Related models

Citations

GitHub

HuggingFace

Fields of citing research

Openness

Tags

Resources

Med-R1

#Key Features

#Technical Details

#Applications

#Impact

Citations

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Recent citations

The Path to Self-Evolving Clinical Systems: Scaling Medical Agents from Assistance to Autonomy

Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning

Top citations

The Invisible Leash: Why RLVR May or May Not Escape Its Origin

Related models

Citations

GitHub

HuggingFace

Fields of citing research

Openness

Tags

Resources

Key Features

Technical Details

Applications

Impact

Key Features

Technical Details

Applications

Impact