bio.rodeo
ModelsOrganizationsLeaderboardAbout
bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

AboutFAQSubmit a modelContact
© 2026 Pulsatance. All rights reserved. ~
Built by Pulsatance
Imaging foundation models
ImagingLanguage model

Med-R1

Emory University / University of Southern California / University of Tokyo / Johns Hopkins University / Georgia Institute of Technology

A reinforcement-learning-trained medical vision-language model for generalizable reasoning across eight imaging modalities and five clinical question types.

Released: March 2025
Parameters: 2 Billion

Med-R1 is a vision-language model (VLM) for medical reasoning that is trained with reinforcement learning rather than the supervised fine-tuning typically used to adapt general VLMs to clinical tasks. Introduced in March 2025 by researchers at Emory University, the University of Southern California, the University of Tokyo, Johns Hopkins University, and the Georgia Institute of Technology, it targets a persistent weakness of medical VLMs: models tuned on one imaging modality or question format often fail to transfer to others, limiting their usefulness across the heterogeneous landscape of clinical imaging.

The central idea is to apply Group Relative Policy Optimization (GRPO) — the reward-guided strategy popularized by DeepSeek-R1 — to a compact open VLM, encouraging it to learn generalizable decision policies instead of memorizing dataset-specific annotations. Built on the 2-billion-parameter Qwen2-VL-2B-Instruct backbone, Med-R1 spans eight imaging modalities (CT, MRI, ultrasound, X-ray, fundus photography, OCT, dermoscopy, and microscopy) and five clinical question types, positioning it as a broad-coverage medical reasoning model rather than a single-task classifier.

A notable empirical finding is that explicit chain-of-thought reasoning is not always beneficial in this setting. The authors report that a "No-Thinking" variant, which omits intermediate reasoning steps, can outperform the reasoning-augmented model, suggesting that in medical VQA the quality and domain alignment of reasoning — not its mere presence — drive performance.

#Key Features

  • Reinforcement learning over SFT: Uses GRPO to optimize answers against rewards rather than supervised labels, improving cross-modality and cross-task generalization.
  • Eight-modality coverage: Reasons across CT, MRI, ultrasound, X-ray, fundus, OCT, dermoscopy, and microscopy within a single framework.
  • Five clinical question types: Handles modality recognition, anatomy identification, disease diagnosis, lesion grading, and biological attribute analysis.
  • Compact and efficient: At 2B parameters, it reportedly surpasses the 36x-larger Qwen2-VL-72B on medical tasks, lowering deployment cost.
  • Open weights and data: Model checkpoints (Apache-2.0) and the OmniMedVQA training data are publicly released.

#Technical Details

Med-R1 fine-tunes Qwen2-VL-2B-Instruct using GRPO, with input images resized to 384x384 pixels. Training and evaluation use the open-access portion of the OmniMedVQA benchmark — roughly 82,000 images and 89,000 visual question-answer pairs spanning the eight modalities and five question types — split 80/20 for training and testing. The release provides separate cross-modality and cross-task checkpoints. Reported results include a 29.94% average accuracy improvement over the Qwen2-VL-2B base model and a 32.06% gain in cross-question-type generalization, with the 2B model outperforming Qwen2-VL-72B on the studied medical reasoning tasks.

#Applications

Med-R1 is aimed at medical visual question answering, where a clinician or downstream system supplies an image and a natural-language question and the model returns an answer, optionally with a reasoning trace. Because it generalizes across modalities and task formats without per-task retraining, it is well suited to building flexible diagnostic-support prototypes, triage assistants, and educational tools that must handle radiology, ophthalmology, dermatology, and pathology imagery side by side. Its small footprint makes it attractive for resource-constrained or on-premise deployments where larger VLMs are impractical.

#Impact

Med-R1 contributes to a fast-growing line of work applying DeepSeek-R1-style reinforcement learning to multimodal medical models, appearing alongside closely related efforts such as MedVLM-R1. Its main contributions are evidence that GRPO can yield strong cross-modality and cross-task generalization in a small open VLM, and the counterintuitive observation that explicit reasoning steps are not universally helpful for medical VQA. As a research artifact with openly released weights and data, it lowers the barrier for studying RL-based medical reasoning. The work remains a preprint, and its evaluation is confined to the OmniMedVQA benchmark, so reported gains should be interpreted as benchmark results rather than validated clinical performance.

Citations

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Preprint

Lai, Y., et al. (2025) Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models. IEEE Transactions on Medical Imaging.

DOI: 10.48550/arXiv.2503.13939

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Lai, Y., et al. (2025) Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models. IEEE Transactions on Medical Imaging.

DOI: 10.1109/TMI.2026.3661001

Recent citations

Papers that recently cited this model.

Not enough citation data yet.

Top citations

The most-cited papers that cite this model.

Not enough citation data yet.

Citations

Total Citations125
Influential17
References45

GitHub

Stars125
Forks12
Open Issues10
Contributors1
Last Push11mo ago
LanguagePython

HuggingFace

Downloads0
Likes13
Last Modified11mo ago
Pipelinevisual-question-answering

Fields of citing research

Not enough data

Openness

bio.rodeo opennessOpen weights · open weights, closed recipe
45Partial
Usability — can I run it?55
Reproducibility — can I retrain it?22
Model Openness Framework
Unclassified
Restrictive license on core components

Tags

disease_diagnosisgenerativehistologyimage_classificationmedical_visual_question_answeringmultimodalradiologyreinforcement_learningtransformervision_transformer

Resources

GitHub RepositoryResearch PaperHuggingFace ModelDataset