bio.rodeo
The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

Imaging foundation models
Imaging · Biosignals · Language model

QoQ-Med

MIT

Open multimodal clinical foundation model that jointly reasons over medical images, ECG time-series, and text reports, trained with domain-aware reinforcement learning.

Released: May 2025
Parameters: 7 billion (a 32B variant is also released)

QoQ-Med is an open generalist clinical foundation model that jointly reasons across heterogeneous medical data: 2D and 3D medical images, time-series physiological signals such as ECG, and free-text clinical reports. Where most clinical AI systems are built for a single modality or a single specialty, QoQ-Med is designed as a single model that can answer questions, classify findings, and generate reasoning traces across nine clinical domains, making it one of the first openly released multimodal clinical reasoning models.

The model was introduced in the 2025 paper "QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training" by Wei Dai, Peilin Chen, Chanakya Ekbote, and Paul Pu Liang at MIT (Media Lab and EECS), and was accepted as an oral presentation at NeurIPS 2025. Its central contribution is Domain-aware Relative Policy Optimization (DRPO), a reinforcement-learning objective that addresses a persistent problem in clinical training data: extreme imbalance across domains and modalities. Common findings in well-represented specialties dominate training, while rare conditions and harder modalities are under-learned.

By hierarchically scaling normalized rewards according to domain rarity and modality difficulty, DRPO counteracts this skew during training, yielding more balanced performance across specialties. QoQ-Med sits at the intersection of medical vision-language modeling and clinical decision support, extending the reasoning-via-reinforcement-learning paradigm (popularized by GRPO) into the multimodal clinical setting.
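The reward-scaling idea can be illustrated with a minimal sketch. It starts from the group-normalized advantage used in GRPO and multiplies it by a per-domain factor. The `domain_scale` function, its inverse-frequency rarity term, and its `1 + (1 - mean reward)` difficulty term are hypothetical stand-ins chosen for illustration; they are not the scaling factors or the hierarchy defined in the paper, and all numbers are made up.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def domain_scale(domain, domain_counts, domain_mean_reward):
    """Hypothetical scaling factor: larger for rarer and harder domains."""
    total = sum(domain_counts.values())
    # Inverse-frequency rarity; equals 1.0 when all domains are the same size.
    rarity = total / (len(domain_counts) * domain_counts[domain])
    # Difficulty in [1, 2], assuming rewards lie in [0, 1].
    difficulty = 1.0 + (1.0 - domain_mean_reward[domain])
    return rarity * difficulty


def scaled_advantages(rewards, domain, domain_counts, domain_mean_reward):
    """Group-normalized advantages multiplied by the domain factor."""
    scale = domain_scale(domain, domain_counts, domain_mean_reward)
    return [scale * a for a in grpo_advantages(rewards)]


# Illustrative numbers only, not taken from the paper.
domain_counts = {"chest_xray": 900, "ecg": 100}
domain_mean_reward = {"chest_xray": 0.8, "ecg": 0.4}
rewards = [1.0, 0.0, 1.0, 0.0]

common = scaled_advantages(rewards, "chest_xray", domain_counts, domain_mean_reward)
rare = scaled_advantages(rewards, "ecg", domain_counts, domain_mean_reward)
print(common)
print(rare)
```

With identical rewards, the rarer and harder domain receives larger-magnitude advantages, so its examples contribute more to the policy update. That is the general effect the scaling is meant to achieve.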

Key Features

  • Cross-modal clinical reasoning: A single checkpoint interprets chest X-rays, CT, MRI, ultrasound, dermatology and ophthalmology images, pathology, mammography, and ECG signals alongside text reports.
  • Domain-aware reinforcement learning (DRPO): Rewards are hierarchically rescaled by domain rarity and modality difficulty, directly mitigating performance imbalance from skewed clinical data distributions.
  • Reasoning traces: The model emits explicit chains of reasoning toward diagnoses, and these traces are released to support interpretability and downstream research.
  • Open weights and pipeline: Model weights, a modular training pipeline, and reasoning traces are publicly released under an MIT license at two scales (7B and 32B).
  • Strong dense prediction: On segmentation, the paper reports IoU roughly 10x higher than other open models, on par with the proprietary OpenAI o4-mini.
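For readers unfamiliar with the IoU metric cited in the last bullet, here is a minimal sketch of intersection-over-union on binary masks. It assumes masks are given as equally sized nested lists of 0/1 values; the paper's evaluation may use a different representation (for example bounding boxes) or a different convention for empty masks, and the `mask_iou` helper is hypothetical.

```python
def mask_iou(pred, target):
    """Intersection-over-union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x2 masks: one overlapping pixel, three pixels in the union.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
print(mask_iou(pred, target))
```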

Technical Details

QoQ-Med builds on the Qwen2.5-VL vision-language architecture and is released in two sizes, QoQ-Med-VL-7B (initialized from Qwen2.5-VL-7B-Instruct) and QoQ-Med-VL-32B. It was trained on 2.61 million instruction-tuning pairs spanning nine clinical domains including cardiology, radiology, dermatology, ophthalmology, pathology, and mammography. Training uses DRPO, a variant of Group Relative Policy Optimization in which the normalized advantage is scaled by domain-rarity and modality-difficulty factors. The reported results show DRPO delivering a 43% average improvement in macro-F1 across all visual domains relative to standard GRPO. On internal validation, QoQ-Med-VL-7B reaches 68.6% average accuracy and the 32B model reaches 70.7%. The model is explicitly positioned as a research preview not approved by any regulatory agency and not intended for clinical deployment without extensive real-world testing.
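The macro-F1 metric behind the reported DRPO gain weights every class equally, which is why it is sensitive to rare findings. The sketch below shows one plain-Python way to compute it, plus a relative-improvement helper. Both function names and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline` (0.43 means 43%)."""
    return (new - baseline) / baseline


# Toy example: a classifier that always predicts the common class.
y_true = ["normal"] * 8 + ["rare"] * 2
y_pred = ["normal"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_f1(y_true, y_pred))
```

In this toy case accuracy looks acceptable while macro-F1 is pulled down by the missed rare class, which is the kind of imbalance that domain-aware training aims to reduce.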

Applications

QoQ-Med targets clinical research settings where a single model must interpret diverse inputs, such as triage assistants that combine imaging with ECG and notes, multimodal medical question answering, and benchmark studies of clinical reasoning. Because it spans imaging, physiological signals, and text in one system, it is useful to researchers building or evaluating generalist clinical assistants and to teams studying how reinforcement learning can balance performance across rare conditions and difficult modalities. The released reasoning traces also make it a resource for interpretability and for distilling clinical reasoning into smaller models.

Impact

As one of the first openly released multimodal clinical foundation models with full weights, a training pipeline, and reasoning traces, QoQ-Med lowers the barrier for academic groups to study clinical reasoning across modalities without proprietary systems. Its DRPO method offers a transferable recipe for handling the long-tailed, multi-domain distributions endemic to medical data, a contribution recognized by its oral acceptance at NeurIPS 2025. As a 2025 research preview, its real-world clinical value remains unvalidated: the reported gains come from internal benchmarks, and the authors caution against deployment without extensive prospective testing. Its longer-term influence will depend on independent evaluation and on adoption of domain-aware reinforcement learning in subsequent clinical models.

Citation

QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

Preprint

Dai, W., et al. (2025) QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training. arXiv.org.

DOI: 10.48550/arXiv.2506.00711

Citations

Total Citations: 32
Influential: 4
References: 97

GitHub

Stars: 52
Forks: 4
Open Issues: 4
Contributors: 1
Last Push: 10mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 393
Likes: 6
Last Modified: 7mo ago
Pipeline: image-text-to-text

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
74 · Open
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 34
Model Openness Framework: Unclassified
No formal model card / data card

Tags

ecg, foundation_model, histology, medical_diagnosis, multimodal, reinforcement_learning, segmentation, transformer, vision_transformer, visual_question_answering

Resources

GitHub Repository · Research Paper · HuggingFace Model