MedVLM-R1

Technical University of Munich / Imperial College London / University of Oxford

2B-parameter medical vision-language model that uses reinforcement learning to show interpretable reasoning for radiology visual question answering.

Released: February 2025

Parameters: 2 Billion

MedVLM-R1 is a compact medical vision-language model (VLM) that generates explicit, natural-language reasoning alongside its answers to questions about radiology images. It targets a central trust problem in medical AI: most diagnostic models output a final answer without showing how they arrived at it, which limits clinical confidence. Rather than relying on supervised fine-tuning over chains of reasoning, MedVLM-R1 uses a reinforcement learning (RL) framework that rewards the model for discovering human-interpretable reasoning paths on its own, without any reasoning references in the training data.

The model was introduced in February 2025 by Jiazhen Pan, Che Liu, Daniel Rueckert, and colleagues at the Technical University of Munich, Imperial College London, and the University of Oxford, and the work was subsequently accepted at MICCAI 2025. It applies the DeepSeek-R1-style "incentivized reasoning" recipe, popularized for text language models, to the multimodal medical imaging domain, where interpretable, verifiable reasoning is especially valuable.

MedVLM-R1 sits at the intersection of medical imaging analysis and reasoning language models. Its central finding is that a small 2B-parameter model, trained with RL on only 600 visual question answering (VQA) samples, can outperform conventionally fine-tuned models trained on more than a million samples, while also producing transparent reasoning traces.

Key Features

RL-induced reasoning: Uses Group Relative Policy Optimization (GRPO) with rule-based rewards for answer correctness and output format, incentivizing the model to emit <think> reasoning followed by a final answer without any supervised reasoning labels.
Interpretable outputs: Generates a human-readable rationale before each answer, improving transparency over black-box classifiers and standard VQA models.
Data efficiency: Trained on just 600 VQA samples drawn from the HuatuoGPT-Vision / PubMedVision data, yet generalizes across imaging modalities.
Strong out-of-distribution generalization: Trained on MRI questions, it transfers to unseen CT and X-ray benchmarks better than supervised fine-tuned baselines.
Compact and open: Built on Qwen2-VL-2B-Instruct, with 2B parameters and Apache-2.0 licensed inference weights released on Hugging Face.
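
The rule-based rewards described in the first feature above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `<think>...</think><answer>...</answer>` template, the A-D answer letters, the equal weighting of the two rewards, and all function names are assumptions made for the example.

```python
import re
from typing import Optional

# Assumed output template: <think>...</think> followed by <answer>...</answer>.
_FORMAT_RE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)
# Assumed answer style: a single option letter A-D, optionally in parentheses.
_CHOICE_RE = re.compile(r"<answer>\s*\(?([A-Da-d])\b")


def format_reward(completion: str) -> float:
    """Return 1.0 if the whole completion follows the assumed template."""
    return 1.0 if _FORMAT_RE.fullmatch(completion) else 0.0


def extract_choice(completion: str) -> Optional[str]:
    """Pull the option letter out of the <answer> tag, if there is one."""
    match = _CHOICE_RE.search(completion)
    return match.group(1).upper() if match else None


def accuracy_reward(completion: str, correct: str) -> float:
    """Return 1.0 if the extracted letter equals the ground-truth letter."""
    return 1.0 if extract_choice(completion) == correct.upper() else 0.0


def total_reward(completion: str, correct: str) -> float:
    """Sum the two rule-based rewards (equal weighting is an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, correct)
```

Because both checks are plain string rules, no reasoning annotations are needed: the text inside `<think>` is never scored directly, only the format and the final choice.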

Technical Details

MedVLM-R1 is built on the Qwen2-VL-2B-Instruct backbone, a vision-language transformer that pairs a vision encoder with a 2B-parameter language model. It is trained with GRPO, a reinforcement learning algorithm that samples a group of responses per question and optimizes toward those that score above the group average. The rewards are simple and verifiable: one for matching the correct multiple-choice answer and one for adhering to the required reasoning-then-answer format. Training used 600 MRI VQA samples from the HuatuoGPT-Vision dataset, with evaluation drawn from OmniMedVQA across MRI, CT, and X-ray.

On these benchmarks, MedVLM-R1 raised accuracy from 55.11% to 78.22% and exceeded the performance of much larger VLMs fine-tuned on over one million samples. The authors also report failure cases in which the generated reasoning is superficial or contradictory, noting that reasoning quality does not always track answer correctness.
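
The group comparison at the heart of GRPO can be illustrated with a small sketch. It covers only the advantage computation: each sampled response's reward is normalized against the mean and standard deviation of its own group. The full algorithm also includes a clipped policy-ratio objective and a KL penalty toward a reference model, which are omitted here. The function name, the use of population standard deviation, and the `eps` value are assumptions for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within one group of responses to the same question.

    Responses above the group mean get a positive advantage and those below
    get a negative one. eps (an assumed value) avoids division by zero when
    every response in the group earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one question, scored by rule-based rewards.
advantages = group_relative_advantages([2.0, 1.0, 1.0, 0.0])
```

Since the baseline is the group's own mean, no separate value network is needed. A group in which all responses earn the same reward yields zero advantages and therefore contributes no learning signal.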

Applications

MedVLM-R1 is aimed at medical visual question answering and radiology decision support, where clinicians and researchers benefit from seeing a model's reasoning rather than only its final answer. Its compact size makes it practical to deploy in resource-constrained settings, and its data-efficient RL recipe offers a template for building interpretable diagnostic assistants in specialties where large labeled reasoning datasets are scarce. Released openly, it is well suited to research on trustworthy medical AI, reasoning evaluation, and modality transfer.
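
In a decision-support setting, the reasoning trace has to be separated from the final answer before it is shown to a reader. The sketch below shows one way to do that. It assumes the model emits `<think>...</think>` followed by `<answer>...</answer>`; the helper names and the fallback messages are hypothetical and not part of the released code.

```python
import re
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    reasoning: Optional[str]  # text inside <think>, if present
    answer: Optional[str]     # text inside <answer>, if present


def parse_output(text: str) -> ParsedOutput:
    """Split raw model output into reasoning and answer (assumed tag format)."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return ParsedOutput(
        reasoning=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
    )


def render_for_review(text: str) -> str:
    """Format the output for a human reviewer, flagging unparseable cases."""
    parsed = parse_output(text)
    if parsed.answer is None:
        return "Unparseable output; route to manual review."
    reasoning = parsed.reasoning or "(no reasoning emitted)"
    return f"Answer: {parsed.answer}\nModel reasoning: {reasoning}"
```

Showing the reasoning next to the answer lets a reviewer spot superficial or contradictory rationales, which matters because a well-formed trace does not by itself guarantee a correct answer.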

Impact

MedVLM-R1 is an early demonstration that DeepSeek-R1-style reinforcement learning, which elicited emergent reasoning in text language models, transfers to small medical vision-language models. By showing that a 2B-parameter model trained on hundreds (not millions) of samples can both outperform larger supervised baselines and produce interpretable reasoning, it highlights RL as a data-efficient path toward transparent clinical AI. Its open weights and code, and acceptance at MICCAI 2025, have made it a reference point for subsequent work on reasoning-centric medical VLMs, while the authors' candid analysis of unfaithful reasoning underscores that interpretable output remains an open challenge.

Citations

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Preprint

Pan, J., et al. (2025) MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning. International Conference on Medical Image Computing and Computer-Assisted Intervention.

DOI: 10.48550/arXiv.2502.19634

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Pan, J., et al. (2025) MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning. Medical Image Computing and Computer Assisted Intervention – MICCAI 2025.

DOI: 10.1007/978-3-032-04981-0_32

Recent citations

Papers that recently cited this model.

Beyond textual rationales: Anatomy-grounded chain-of-thought for traceable radiology reasoning
Shengzhi Wang, Kai Wu, Jun Yang, et al.
Knowledge-Based Systems · Sep 2026
Citations: 0

The Path to Self-Evolving Clinical Systems: Scaling Medical Agents from Assistance to Autonomy
Chunzheng Zhu, Lei Tian, Bohan Tan, et al.
Jul 2026
Citations: 0 · Influential

Policy-Driven CT-Agent: Modeling Phase-Aware Diagnostic Control for Clinically Consistent CT Reasoning
Yanmeng Dong, Han Li, Yujia Li, et al.
Jul 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Shenzhi Wang, Le Yu, Chang Gao, et al.
arXiv.org · Jun 2025
Citations: 471

From System 1 to System 2: A Survey of Reasoning Large Language Models
Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence · Feb 2025
Citations: 265

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yaoting Wang, Shengqiong Wu, Yuechen Zhang, et al.
arXiv.org · Mar 2025
Citations: 183

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan, Yuheng Ji, Xiaoshuai Hao, et al.
arXiv.org · 2025
Citations: 143

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai, Jike Zhong, Ming Li, et al.
IEEE Transactions on Medical Imaging · Mar 2025
Citations: 126

Citations

Total Citations: 191

Influential: 24

References: 35

GitHub

Stars: 32

Forks: 1

Open Issues: 6

Contributors: 0

Last Push: 10mo ago

Language: Jupyter Notebook

HuggingFace

Downloads: 1.2K

Likes: 17

Last Modified: 9mo ago

Fields of citing research

Computer Science: 99%
Medicine: 72%
Engineering: 16%
Linguistics: 2%
Mathematics: 2%
Biology: 2%
Business: 1%
Psychology: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Overall score: 83 (Open)

Usability (can I run it?): 100

Reproducibility (can I retrain it?): 60

Model Openness Framework: Unclassified

No formal model card / data card

Resources

GitHub Repository · Research Paper · HuggingFace Model


