bio.rodeo

MedVLM-R1

Technical University of Munich / Imperial College London / University of Oxford

A 2B-parameter medical vision-language model that uses reinforcement learning (GRPO) to produce explicit, human-interpretable reasoning for radiology visual question answering.

Released: February 2025
Parameters: 2 Billion

MedVLM-R1 is a compact medical vision-language model (VLM) that generates explicit, natural-language reasoning alongside its answers to questions about radiology images. It targets a central trust problem in medical AI: most diagnostic models output a final answer without showing how they arrived at it, which limits clinical confidence. Rather than relying on supervised fine-tuning over chains of reasoning, MedVLM-R1 uses a reinforcement learning (RL) framework that rewards the model for discovering human-interpretable reasoning paths on its own, without any reasoning references in the training data.

The model was introduced in February 2025 by Jiazhen Pan, Che Liu, Daniel Rueckert and colleagues at the Technical University of Munich, Imperial College London, and the University of Oxford, and the work was subsequently accepted at MICCAI 2025. It applies the DeepSeek-R1-style "incentivized reasoning" recipe, popularized for text language models, to multimodal medical imaging, where interpretable, verifiable reasoning is especially valuable.

MedVLM-R1 sits at the intersection of medical imaging analysis and reasoning language models. Its central finding is that a small 2B-parameter model, trained with RL on only 600 visual question answering (VQA) samples, can outperform conventionally fine-tuned models trained on more than a million samples, while also producing transparent reasoning traces.

#Key Features

  • RL-induced reasoning: Uses Group Relative Policy Optimization (GRPO) with rule-based rewards for answer correctness and output format, incentivizing the model to emit <think> reasoning followed by a final answer without any supervised reasoning labels.
  • Interpretable outputs: Generates a human-readable rationale before each answer, improving transparency over black-box classifiers and standard VQA models.
  • Data efficiency: Trained on just 600 VQA samples drawn from the HuatuoGPT-Vision / PubMedVision data, yet generalizes across imaging modalities.
  • Strong out-of-distribution generalization: Trained primarily on MRI questions, it transfers to unseen CT and X-ray benchmarks better than supervised fine-tuned baselines.
  • Compact and open: Built on Qwen2-VL-2B-Instruct, with 2B parameters and Apache-2.0 licensed inference weights released on Hugging Face.
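The rule-based rewards named in the first bullet can be illustrated with a small sketch. This is not the authors' implementation: the `<answer>` tag, the single-letter option format, the equal weighting of the two rewards, and every function name below are assumptions made for illustration. Only the `<think>` block and the two reward criteria (answer correctness and output format) come from the description above.

```python
import re
from typing import Optional

# Assumed output template: <think>...</think> followed by <answer>...</answer>.
# The <answer> tag is a hypothetical choice made for this sketch.
_TEMPLATE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)
# Assumed answer style: one option letter, optionally in parentheses,
# optionally followed by punctuation or the option text.
_CHOICE = re.compile(r"\(?([A-Za-z])\)?(?:[.:\s]|$)")


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.fullmatch(response) else 0.0


def extract_choice(response: str) -> Optional[str]:
    """Return the upper-cased option letter from the answer block, or None."""
    match = _TEMPLATE.fullmatch(response)
    if match is None:
        return None
    choice = _CHOICE.match(match.group(2).strip())
    return choice.group(1).upper() if choice else None


def accuracy_reward(response: str, gold: str) -> float:
    """Return 1.0 if the extracted option letter equals the gold letter, else 0.0."""
    return 1.0 if extract_choice(response) == gold.strip().upper() else 0.0


def total_reward(response: str, gold: str) -> float:
    """Sum of the two rule-based rewards (equal weighting is an assumption)."""
    return accuracy_reward(response, gold) + format_reward(response)
```

In this sketch a response such as `<think>The lesion is bright on T2.</think><answer>B</answer>` scores 2.0 against the gold label `B`, a well-formatted wrong answer scores 1.0, and a bare `B` with no reasoning block scores 0.0. Both checks are pure string rules, so no reasoning annotations are needed to compute them.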

#Technical Details

MedVLM-R1 is built on the Qwen2-VL-2B-Instruct backbone, a vision-language transformer pairing a vision encoder with a 2B-parameter language model. It is trained with GRPO, a reinforcement learning algorithm that samples a group of responses per question, scores each one, and updates the policy toward responses that score above their group's average. The rewards are simple and verifiable: one for matching the correct multiple-choice answer and one for adhering to the required reasoning-then-answer format.

Training used 600 MRI VQA samples from the HuatuoGPT-Vision dataset, with evaluation drawn from OmniMedVQA across MRI, CT, and X-ray. On these benchmarks, MedVLM-R1 raised accuracy from 55.11% to 78.22%, a gain of 23.11 percentage points, and exceeded the performance of much larger VLMs fine-tuned on over one million samples. The authors also report failure cases in which the generated reasoning is superficial or contradictory, noting that reasoning quality does not always track answer correctness.
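The group comparison that GRPO relies on can be sketched in a few lines. The function below follows the commonly published formulation, in which each sampled response's reward is normalized by the mean and standard deviation of its group. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions of this sketch rather than details confirmed for MedVLM-R1's training code.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within one group of responses to the same question.

    Responses scoring above the group mean get a positive advantage and
    those below get a negative one. `eps` (an assumed stabilizer) avoids
    division by zero when every response earns the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question, scored by rule-based rewards.
advantages = group_relative_advantages([2.0, 1.0, 0.0, 1.0])
```

Because the advantage is computed relative to the group, a question on which every sampled response earns the same reward contributes no learning signal (all advantages are zero). The full GRPO objective also includes a clipped policy-ratio term and a KL penalty against a reference model, both omitted here.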

#Applications

MedVLM-R1 is aimed at medical visual question answering and radiology decision support, where clinicians and researchers benefit from seeing a model's reasoning rather than only its final answer. Its compact size makes it practical to deploy in resource-constrained settings, and its data-efficient RL recipe offers a template for building interpretable diagnostic assistants in specialties where large labeled reasoning datasets are scarce. Released openly, it is well suited to research on trustworthy medical AI, reasoning evaluation, and modality transfer.

#Impact

MedVLM-R1 is an early demonstration that DeepSeek-R1-style reinforcement learning, which elicited emergent reasoning in text language models, transfers to small medical vision-language models. By showing that a 2B-parameter model trained on hundreds (not millions) of samples can both outperform larger supervised baselines and produce interpretable reasoning, it highlights RL as a data-efficient path toward transparent clinical AI. Its open weights and code, and acceptance at MICCAI 2025, have made it a reference point for subsequent work on reasoning-centric medical VLMs, while the authors' candid analysis of unfaithful reasoning underscores that interpretable output remains an open challenge.

Citations

Preprint

Pan, J., et al. (2025) MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning. arXiv:2502.19634.

DOI: 10.48550/arXiv.2502.19634

Conference paper

Pan, J., et al. (2025) MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning. Medical Image Computing and Computer Assisted Intervention – MICCAI 2025.

DOI: 10.1007/978-3-032-04981-0_32

Citations

Total citations: 171
Influential: 22
References: 35

GitHub

Stars: 29
Forks: 1
Open issues: 7
Contributors: 0
Last push: 8 months ago
Language: Jupyter Notebook

HuggingFace

Downloads: 455
Likes: 15
Last modified: 8 months ago

Openness

bio.rodeo openness: 83 (Open). Fully open, usable and reproducible.
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 60
Model Openness Framework: Unclassified. No formal model card or data card.

Tags

medical_reasoning, multimodal, radiology, reinforcement_learning, transformer, vision_transformer, visual_question_answering
