MedDr

Hong Kong University of Science and Technology

Generalist medical vision-language foundation model with 40B parameters, spanning radiology, pathology, dermatology, retinography, and endoscopy.

Released: April 2024

Parameters: 40 Billion

MedDr is a generalist medical vision-language foundation model designed to interpret images and answer clinical questions across a broad range of medical specialties from a single set of weights. Most medical AI systems are trained narrowly for one modality or task, such as a chest X-ray classifier, a dermatology grader, or a pathology tile detector. MedDr instead targets the harder problem of a unified model that reasons over radiology, pathology, dermatology, retinography, and endoscopy, performing visual question answering, report generation, and diagnostic classification within a conversational interface.

The model was developed by the SMART Lab at the Hong Kong University of Science and Technology, led by Hao Chen, and released as a preprint in April 2024 (arXiv:2404.15127). At the time of release the authors described it as the largest open-source generalist foundation model tailored for medicine. MedDr is the centerpiece of a broader framework called GSCo (Generalist–Specialist Collaboration), in which the generalist model is paired with lightweight task-specific specialist models at inference time to improve diagnostic accuracy.

Its central methodological contribution is "diagnosis-guided bootstrapping," a data-construction strategy that converts large repositories of labeled medical images into high-quality image–text instruction data, sidestepping the scarcity of paired image–report corpora that has historically bottlenecked medical vision-language training.
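As a rough illustration of the idea, the sketch below turns a labeled image record into one instruction-tuning sample by building a diagnosis-conditioned prompt and handing it to a report generator. The names `LabeledImage`, `build_prompt`, `bootstrap_sample`, and the stub `template_generator` are hypothetical, as is the prompt wording; the stub ignores the image entirely, whereas a real pipeline would call a vision-language model. MedDr's actual prompts, generator, and filtering steps are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LabeledImage:
    """A classification-style record: an image path plus its diagnostic label."""
    path: str
    modality: str
    diagnosis: str


def build_prompt(record: LabeledImage) -> str:
    """Condition report generation on the known diagnosis (hypothetical wording)."""
    return (
        f"You are given a {record.modality} image. "
        f"The confirmed diagnosis is: {record.diagnosis}. "
        "Describe the findings that support this diagnosis."
    )


def bootstrap_sample(record: LabeledImage, generate: Callable[[str], str]) -> dict:
    """Produce one image-text instruction sample from a labeled image."""
    report = generate(build_prompt(record))
    return {
        "image": record.path,
        "instruction": f"Write a report for this {record.modality} image.",
        "response": report,
        "label": record.diagnosis,  # kept so samples can be audited later
    }


def template_generator(prompt: str) -> str:
    """Stand-in for a vision-language model call; it only echoes the diagnosis."""
    diagnosis = prompt.split("diagnosis is: ")[1].split(".")[0]
    return f"Findings are consistent with {diagnosis}."


sample = bootstrap_sample(
    LabeledImage("img_001.png", "chest X-ray", "pneumonia"),
    template_generator,
)
print(sample["response"])
```

Passing the generator in as a callable keeps the data-construction logic separate from whichever model writes the report, so the stub can be swapped for a real model without touching the rest.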

Key Features

Multi-specialty coverage: A single model handles radiology (X-ray, CT, MRI), pathology, dermatology, retinography, and endoscopy, rather than being confined to one imaging domain.
Diagnosis-guided bootstrapping: The training pipeline exploits both medical images and their diagnostic labels to synthesize comprehensive reports and instruction-tuning examples, expanding the usable training signal beyond hand-written radiology reports.
Retrieval-augmented diagnosis: At inference, MedDr retrieves similar reference cases to ground its predictions, improving generalization to distributions and findings underrepresented in training.
Generalist–specialist collaboration (GSCo): Mixture-of-Expert and retrieval-augmented diagnosis mechanisms let the generalist consult specialist models, combining broad reasoning with task-tuned precision.
Open weights under MIT license: The MedDr_0401 checkpoint and a companion specialist are released on HuggingFace under a permissive MIT license.
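The retrieval and collaboration features above can be pictured with a small sketch: look up the reference cases nearest to a query image in embedding space, then fold their labels and a specialist model's prediction into the context given to the generalist. Everything here is an assumption made for illustration, including the two-dimensional embeddings, the reference bank, the function names, and the prompt wording; it is not GSCo's actual implementation.

```python
import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_labels(query, reference_bank, k=3):
    """Return labels of the k reference cases most similar to the query embedding."""
    ranked = sorted(reference_bank, key=lambda case: cosine(query, case[0]), reverse=True)
    return [label for _, label in ranked[:k]]


def build_context(retrieved, specialist_probs):
    """Summarize retrieved labels and the specialist's top prediction as prompt text."""
    votes = Counter(retrieved).most_common()
    retrieved_text = ", ".join(f"{label} (x{n})" for label, n in votes)
    top = max(specialist_probs, key=specialist_probs.get)
    return (
        f"Similar reference cases: {retrieved_text}. "
        f"Specialist suggestion: {top} (p={specialist_probs[top]:.2f}). "
        "Considering the image and these hints, give the most likely diagnosis."
    )


# Toy reference bank of (embedding, label) pairs; real embeddings would come
# from an image encoder and have far more dimensions.
bank = [
    ([1.0, 0.0], "melanoma"),
    ([0.9, 0.1], "melanoma"),
    ([0.0, 1.0], "nevus"),
    ([0.1, 0.9], "nevus"),
]
query = [0.95, 0.05]
labels = retrieve_labels(query, bank, k=3)
print(build_context(labels, {"melanoma": 0.81, "nevus": 0.19}))
```

In this arrangement the generalist still makes the final call; the retrieved cases and the specialist output are supplied only as hints in its context.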

Technical Details

MedDr is built on the InternVL vision-language architecture (OpenGVLab/InternVL-Chat-V1-2), comprising a vision transformer image encoder coupled to a large language model decoder, with roughly 40 billion parameters total in the released BF16 checkpoint. Training proceeds by first constructing instruction data through diagnosis-guided bootstrapping—generating descriptive reports from medical images and their labels—then integrating these with existing medical vision-language tasks (VQA, captioning, classification) for instruction tuning. The GSCo evaluation spanned 28 datasets and roughly 250,000 images across the supported modalities, assessing report generation, visual question answering, and image-level diagnosis. The authors report that pairing the generalist with specialists and retrieval augmentation improves performance over the generalist alone, particularly on out-of-distribution diagnostic tasks.
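For a sense of scale, the parameter count and BF16 precision mentioned above imply a lower bound on the memory needed just to hold the weights. The short calculation below is a back-of-the-envelope estimate that assumes exactly 40 billion parameters at 2 bytes each; it ignores activations, the key-value cache, and framework overhead, so real requirements are higher. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 40e9  # "roughly 40 billion", treated as exact for the estimate

bf16_gib = weight_memory_gib(NUM_PARAMS, 2)  # BF16 stores 2 bytes per parameter
fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # FP32 shown only for comparison

print(f"BF16 weights: {bf16_gib:.1f} GiB")
print(f"FP32 weights: {fp32_gib:.1f} GiB")
```

By this estimate the checkpoint needs about 74.5 GiB for weights in BF16, so running it typically means splitting the model across several accelerators or using a lower-precision format.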

Applications

MedDr is aimed at researchers building multimodal clinical assistants and at studies probing how far a single generalist model can go across heterogeneous medical imaging. Practical use cases include drafting preliminary radiology and pathology reports, answering image-grounded clinical questions, and serving as a flexible backbone that can be combined with specialist classifiers in a collaborative diagnostic pipeline. Because the weights are openly licensed, it is also a convenient starting point for fine-tuning on institution-specific datasets or new modalities.

Impact

MedDr contributed to a wave of open generalist medical vision-language models that challenged the prevailing one-model-per-task paradigm, and its diagnosis-guided bootstrapping offered a reusable recipe for turning abundant labeled-image archives into vision-language training data. The accompanying GSCo framework articulated a pragmatic middle path—rather than expecting a generalist to dominate every task, it formalized collaboration between broad and narrow models. As with all current medical foundation models, the work remains a research artifact: it is not a cleared clinical device, evaluations rest largely on retrospective benchmarks, and performance varies across modalities, so outputs require expert oversight before any clinical use.

Citation

GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration

Preprint

He, S., et al. (2024) GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration.

DOI: 10.48550/arXiv.2404.15127

Recent citations

Papers that recently cited this model.

Super-Generalist: Towards Comprehensive and Accurate Medical Image Understanding via Generalist-Specialist Synergy
Shaoteng Zhang, Weiwei Cao, Wanxing Chang, et al.
Jul 2026
Citations: 0

A modular generalist-specialist AI framework for ROI selection across spatial profiling workflow
Simon P. Castillo, Tanishq Gautam, Karina B. Pinao Gonzales, et al.
bioRxiv · Jul 2026
Citations: 0

PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA
Chunze Yang, Qidong Liu, Wenjie Zhao, et al.
May 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li, et al.
Conference on Empirical Methods in Natural Language Processing · Jul 2024
Citations: 100

Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks
Jun Hou, Sicen Liu, Yequan Bie, et al.
arXiv.org · Oct 2024
Citations: 41

Large-Scale 3D Medical Image Pre-Training With Geometric Context Priors
Linshan Wu, Jiaxin Zhuang, Hao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence · Oct 2024
Citations: 37

UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
Haiyang Yu, Siyang Yi, Ke Niu, et al.
arXiv.org · Mar 2025
Citations: 14

From Text to Multimodality: Exploring the Evolution and Impact of Large Language Models in Medical Practice
Qian Niu, Keyu Chen, Ming Li, et al.
arXiv.org · Sep 2024
Citations: 13

Citations

Total Citations: 28

Influential: 4

References: 0

GitHub

Stars: 99

Forks: 6

Open Issues: 0

Contributors: 2

Last Push: 2mo ago

Language: Python

License: MIT

HuggingFace

Downloads: 41

Likes: 7

Last Modified: 3mo ago

Pipeline: image-text-to-text

Fields of citing research

Medicine: 96%
Computer Science: 96%
Engineering: 7%
Biology: 4%

Share of papers citing this model.

Openness

bio.rodeo openness (Fully open · usable and reproducible): 69, Partial

Usability (can I run it?): 93

Reproducibility (can I retrain it?): 50

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository, Research Paper, Official Website, HuggingFace Model
