QoQ-Med

A multimodal clinical foundation model that reasons jointly over 2D and 3D medical images, ECG time-series, and text reports across nine clinical domains.

Released: May 2025

Parameters: 7 Billion

QoQ-Med is an open generalist clinical foundation model that jointly reasons across heterogeneous medical data: 2D and 3D medical images, time-series physiological signals such as ECG, and free-text clinical reports. Most clinical AI systems are built for one modality or one specialty; QoQ-Med instead uses one model to answer questions, classify findings, and generate reasoning traces across nine clinical domains, making it one of the first openly released multimodal clinical reasoning models.

The model was introduced in the 2025 paper "QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training" by Wei Dai, Peilin Chen, Chanakya Ekbote, and Paul Pu Liang at MIT (Media Lab and EECS); the paper was accepted as an oral presentation at NeurIPS 2025. Its central contribution is Domain-aware Relative Policy Optimization (DRPO), a reinforcement-learning objective that addresses a persistent problem in clinical training data: extreme imbalance across domains and modalities. Common findings in well-represented specialties dominate training, while rare conditions and harder modalities are under-learned.

By hierarchically scaling normalized rewards according to domain rarity and modality difficulty, DRPO counteracts this skew during training, yielding more balanced performance across specialties. QoQ-Med sits at the intersection of medical vision-language modeling and clinical decision support, extending the reasoning-via-reinforcement-learning paradigm (popularized by GRPO) into the multimodal clinical setting.
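To make the reward scaling concrete, here is a minimal Python sketch of a GRPO-style group-normalized advantage multiplied by a domain-rarity weight. It is an illustration under stated assumptions, not the paper's DRPO objective: the inverse-frequency weight in `domain_weights`, the `power` exponent, and the scalar `difficulty` factor are hypothetical stand-ins for the hierarchical scaling, whose exact form is not given here.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one group of
    responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def domain_weights(domain_counts, power=0.5):
    """Hypothetical rarity weight (an assumption, not the paper's formula):
    inverse domain frequency raised to `power`, rescaled to average 1."""
    total = sum(domain_counts.values())
    raw = {d: (total / c) ** power for d, c in domain_counts.items()}
    avg = mean(list(raw.values()))
    return {d: w / avg for d, w in raw.items()}


def scaled_advantages(rewards, domain, weights, difficulty=1.0):
    """Scale the normalized advantages by the domain weight and an
    illustrative modality-difficulty factor."""
    scale = weights[domain] * difficulty
    return [a * scale for a in group_normalized_advantages(rewards)]


# Toy example with made-up sample counts: ECG is nine times rarer.
counts = {"chest_xray": 9000, "ecg": 1000}
weights = domain_weights(counts)
print(weights)
print(scaled_advantages([1.0, 0.0, 0.0, 1.0], "ecg", weights))
print(scaled_advantages([1.0, 0.0, 0.0, 1.0], "chest_xray", weights))
```

With identical rewards, the rare-domain group receives larger-magnitude advantages than the common-domain group, so its examples pull harder on the policy update. That is the rebalancing effect the paragraph above describes.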

Key Features

Cross-modal clinical reasoning: A single checkpoint interprets chest X-rays, CT, MRI, ultrasound, dermatology and ophthalmology images, pathology, mammography, and ECG signals alongside text reports.
Domain-aware reinforcement learning (DRPO): Rewards are hierarchically rescaled by domain rarity and modality difficulty, directly mitigating performance imbalance from skewed clinical data distributions.
Reasoning traces: The model emits explicit chains of reasoning toward diagnoses, and these traces are released to support interpretability and downstream research.
Open weights and pipeline: Model weights, a modular training pipeline, and reasoning traces are publicly released under an MIT license at two scales (7B and 32B).
Strong dense prediction: On segmentation, the authors report IoU roughly 10x higher than other open models, on par with the proprietary OpenAI o4-mini.
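The IoU (intersection over union) metric behind the segmentation claim can be sketched as follows. This minimal Python example assumes predictions and ground truth are binary masks given as nested lists; how QoQ-Med's outputs are converted into regions is not described here, and the function name `mask_iou` is illustrative.

```python
def mask_iou(pred, target):
    """Intersection over union of two equally sized binary masks.

    Returns 1.0 when both masks are empty (a convention chosen for this
    sketch; evaluation suites differ on this case)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0


# Toy 2x3 masks: one overlapping pixel, three pixels covered in total.
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 0, 0]]
print(mask_iou(pred, target))
```

Because IoU divides by the union, a model that predicts a region far larger than the true one is penalized even when it fully covers the target.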

Technical Details

QoQ-Med builds on the Qwen2.5-VL vision-language architecture and is released in two sizes: QoQ-Med-VL-7B (initialized from Qwen2.5-VL-7B-Instruct) and QoQ-Med-VL-32B. It was trained on 2.61 million instruction-tuning pairs spanning nine clinical domains, including cardiology, radiology, dermatology, ophthalmology, pathology, and mammography. Training uses DRPO, a variant of Group Relative Policy Optimization (GRPO) in which the normalized advantage is scaled by domain-rarity and modality-difficulty factors. The authors report that DRPO improves macro-F1 by 43% on average across all visual domains relative to standard GRPO. On internal validation, QoQ-Med-VL-7B reaches 68.6% average accuracy and the 32B model reaches 70.7%. The authors explicitly position the model as a research preview: it is not approved by any regulatory agency and is not intended for clinical deployment without extensive real-world testing.
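Macro-F1 is a natural metric for this setting because it averages per-class F1 scores with equal weight, so a model that ignores rare classes scores poorly even when its accuracy looks high. The sketch below is a plain-Python illustration with made-up labels; it is not the paper's evaluation code, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears
    in either the references or the predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced case: the classifier always predicts the common finding.
y_true = ["common"] * 9 + ["rare"]
y_pred = ["common"] * 10
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

In this toy case accuracy is 9/10, while macro-F1 is 9/19 (below 0.5) because the rare class contributes an F1 of zero.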

Applications

QoQ-Med targets clinical research settings where a single model must interpret diverse inputs, such as triage assistants that combine imaging with ECG and notes, multimodal medical question answering, and benchmark studies of clinical reasoning. Because it spans imaging, physiological signals, and text in one system, it is useful to researchers building or evaluating generalist clinical assistants and to teams studying how reinforcement learning can balance performance across rare conditions and difficult modalities. The released reasoning traces also make it a resource for interpretability and for distilling clinical reasoning into smaller models.

Impact

As one of the first openly released multimodal clinical foundation models with full weights, a training pipeline, and reasoning traces, QoQ-Med lowers the barrier for academic groups to study clinical reasoning across modalities without proprietary systems. Its DRPO method offers a transferable recipe for handling the long-tailed, multi-domain distributions endemic to medical data, a contribution recognized by its oral acceptance at NeurIPS 2025. As a 2025 research preview, its real-world clinical value remains unvalidated: the reported gains come from internal benchmarks, and the authors caution against deployment without extensive prospective testing. Its longer-term influence will depend on independent evaluation and on adoption of domain-aware reinforcement learning in subsequent clinical models.

Citation

QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

Preprint

Dai, W., et al. (2025) QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training. arXiv.org.

DOI: 10.48550/arXiv.2506.00711

Recent citations

Papers that recently cited this model.

Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning
Junha Jung, Minbyul Jeong, Suhyeon Lim, et al.
Jun 2026 · 1 citation

Learning Where to Look: A Reinforcement Learning Framework for Robust Micro-Ultrasound Prostate Cancer Detection
Mohammad Mahdi Abootorabi, Sina Namazi, Armin Saadat, et al.
Jun 2026 · 0 citations

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models
Negin Baghbanzadeh, Pritam Sarkar, Michael Colacci, et al.
Jun 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning
Timothy Ossowski, Sheng Zhang, Qianchu Liu, et al.
arXiv.org · Nov 2025 · 6 citations

MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
Zhengayuan Jiang, Heng Guo, Chen Fang, et al.
Apr 2026 · 5 citations

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Zhuofan Shi, Yufei Shao, Mengyan Dai, et al.
arXiv.org · Jan 2026 · 5 citations

ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation
Jiarui Jin, Haoyu Wang, Xingliang Wu, et al.
arXiv.org · Feb 2026 · 4 citations · Influential

Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning
Ruiqi Wu, Yuang Yao, Tengfei Ma, et al.
arXiv.org · Aug 2025 · 3 citations

Citations

Total Citations: 38

Influential: 5

References: 97

GitHub

Stars: 52

Forks: 4

Open Issues: 4

Contributors: 1

Last Push: 11mo ago

Language: Python

License: MIT

HuggingFace

Downloads: 152

Likes: 6

Last Modified: 9mo ago

Pipeline: image-text-to-text

Fields of citing research

Computer Science: 100%
Medicine: 71%
Engineering: 17%
Biology: 6%
Sociology: 3%
Agricultural and Food Sciences: 3%
Environmental Science: 3%
Philosophy: 3%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Openness score: 74 (Open)

Usability (can I run it?): 100

Reproducibility (can I retrain it?): 34

Model Openness Framework: Unclassified (no formal model card / data card)

Resources

GitHub Repository · Research Paper · HuggingFace Model


