Open multimodal clinical foundation model that jointly reasons over medical images, ECG time-series, and text reports, trained with domain-aware reinforcement learning.
QoQ-Med is an open generalist clinical foundation model that jointly reasons across heterogeneous medical data: 2D and 3D medical images, time-series physiological signals such as ECG, and free-text clinical reports. Where most clinical AI systems are built for one modality or one specialty, QoQ-Med is designed as a single model that can answer questions, classify findings, and generate reasoning traces across nine clinical domains, making it one of the first openly released multimodal clinical reasoning models.
The model was introduced in the 2025 paper "QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training" by Wei Dai, Peilin Chen, Chanakya Ekbote, and Paul Pu Liang at MIT (Media Lab and EECS), and was accepted as an oral presentation at NeurIPS 2025. Its central contribution is Domain-aware Relative Policy Optimization (DRPO), a reinforcement-learning objective that addresses a persistent problem in clinical training data: extreme imbalance across domains and modalities. Common findings in well-represented specialties dominate training, while rare conditions and harder modalities are under-learned.
By hierarchically scaling normalized rewards according to domain rarity and modality difficulty, DRPO counteracts this skew during training, yielding more balanced performance across specialties. QoQ-Med sits at the intersection of medical vision-language modeling and clinical decision support, extending the reasoning-via-reinforcement-learning paradigm (popularized by GRPO) into the multimodal clinical setting.
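The idea of scaling normalized rewards by domain can be illustrated with a minimal sketch. The group-wise normalization follows standard GRPO (each sampled response's reward is standardized against the other responses to the same prompt). The scaling rule is an assumption made for illustration only, not the formula from the paper: the function `domain_scales`, the inverse-square-root rarity term, the `1 - mean_reward` difficulty term, and the example domain statistics are all hypothetical.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO step: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def domain_scales(stats, min_difficulty=0.05):
    """Hypothetical scaling rule (an assumption, not the paper's formula).

    stats maps domain -> (num_training_examples, mean_reward in [0, 1]).
    Rare domains and domains with low mean reward get larger factors.
    Factors are normalized so that their average is 1.
    """
    raw = {}
    for domain, (count, mean_reward) in stats.items():
        rarity = 1.0 / math.sqrt(count)
        difficulty = max(1.0 - mean_reward, min_difficulty)
        raw[domain] = rarity * difficulty
    avg = mean(raw.values())
    return {domain: value / avg for domain, value in raw.items()}


def domain_aware_advantages(rewards, domain, scales):
    """Scale the group-normalized advantages by the domain's factor."""
    return [scales[domain] * a for a in grpo_advantages(rewards)]


# Made-up statistics: a large, easy domain and a small, hard one.
stats = {"chest_xray": (1_000_000, 0.8), "ecg": (10_000, 0.4)}
scales = domain_scales(stats)

group_rewards = [1.0, 0.0, 0.0, 1.0]  # rewards for 4 sampled responses
xray_adv = domain_aware_advantages(group_rewards, "chest_xray", scales)
ecg_adv = domain_aware_advantages(group_rewards, "ecg", scales)
```

With identical group rewards, the under-represented and harder domain produces larger-magnitude advantages under this rule, so its examples contribute more to each policy update.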
QoQ-Med builds on the Qwen2.5-VL vision-language architecture and is released in two sizes, QoQ-Med-VL-7B (initialized from Qwen2.5-VL-7B-Instruct) and QoQ-Med-VL-32B. It was trained on 2.61 million instruction-tuning pairs spanning nine clinical domains including cardiology, radiology, dermatology, ophthalmology, pathology, and mammography. Training uses DRPO, a variant of Group Relative Policy Optimization in which the normalized advantage is scaled by domain-rarity and modality-difficulty factors. The reported results show DRPO delivering a 43% average improvement in macro-F1 across all visual domains relative to standard GRPO. On internal validation, QoQ-Med-VL-7B reaches 68.6% average accuracy and the 32B model reaches 70.7%. The model is explicitly positioned as a research preview not approved by any regulatory agency and not intended for clinical deployment without extensive real-world testing.
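Macro-F1 is a natural headline metric for imbalanced clinical data because it averages per-class F1 scores with equal weight, so a model cannot score well by predicting only the common class. The sketch below uses made-up labels (not data from the paper) to contrast it with plain accuracy.

```python
from statistics import mean


def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that appear."""
    labels = sorted(set(y_true) | set(y_pred))
    return mean(f1_for_label(y_true, y_pred, label) for label in labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data: 9 common cases, 1 rare case, and a model
# that always predicts the common class.
y_true = ["normal"] * 9 + ["rare"]
y_pred = ["normal"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

Here the always-"normal" predictor keeps high accuracy while its macro-F1 falls below 0.5, because the rare class contributes an F1 of zero to the average.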
QoQ-Med targets clinical research settings where a single model must interpret diverse inputs, such as triage assistants that combine imaging with ECG and notes, multimodal medical question answering, and benchmark studies of clinical reasoning. Because it spans imaging, physiological signals, and text in one system, it is useful to researchers building or evaluating generalist clinical assistants and to teams studying how reinforcement learning can balance performance across rare conditions and difficult modalities. The released reasoning traces also make it a resource for interpretability and for distilling clinical reasoning into smaller models.
As one of the first openly released multimodal clinical foundation models with full weights, a training pipeline, and reasoning traces, QoQ-Med lowers the barrier for academic groups to study clinical reasoning across modalities without proprietary systems. Its DRPO method offers a transferable recipe for handling the long-tailed, multi-domain distributions endemic to medical data, a contribution recognized by its oral acceptance at NeurIPS 2025. As a 2025 research preview, its real-world clinical value remains unvalidated: the reported gains come from internal benchmarks, and the authors caution against deployment without extensive prospective testing. Its longer-term influence will depend on independent evaluation and on adoption of domain-aware reinforcement learning in subsequent clinical models.
Dai, W., et al. (2025) QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training. arXiv.org.
DOI: 10.48550/arXiv.2506.00711