bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Imaging · Language model

MedDr

Hong Kong University of Science and Technology

A 40B-parameter generalist medical vision-language foundation model spanning radiology, pathology, dermatology, retinography, and endoscopy.

Released: April 2024
Parameters: 40 billion

MedDr is a generalist medical vision-language foundation model designed to interpret images and answer clinical questions across a broad range of medical specialties from a single set of weights. Whereas most medical AI systems are trained narrowly for one modality or task—a chest X-ray classifier, a dermatology grader, a pathology tile detector—MedDr targets the harder problem of a unified model that reasons over radiology, pathology, dermatology, retinography, and endoscopy alike, performing visual question answering, report generation, and diagnostic classification within a conversational interface.

The model was developed by the SMART Lab at the Hong Kong University of Science and Technology, led by Hao Chen, and released as a preprint in April 2024 (arXiv:2404.15127). At the time of release the authors described it as the largest open-source generalist foundation model tailored for medicine. MedDr is the centerpiece of a broader framework called GSCo (Generalist–Specialist Collaboration), in which the generalist model is paired with lightweight task-specific specialist models at inference time to improve diagnostic accuracy.

Its central methodological contribution is "diagnosis-guided bootstrapping," a data-construction strategy that converts large repositories of labeled medical images into high-quality image–text instruction data, sidestepping the scarcity of paired image–report corpora that has historically bottlenecked medical vision-language training.
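
The data-construction idea can be illustrated with a minimal sketch. It assumes, for illustration only, that each archive entry carries an image path, a modality name, and a diagnostic label, and that some report generator is available. The names `LabeledImage`, `build_report_prompt`, `bootstrap_record`, and `stub_generator` are hypothetical and the prompt wording is invented; none of this is the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LabeledImage:
    path: str       # location of the image file
    modality: str   # e.g. "chest X-ray"
    label: str      # diagnostic label that ships with the dataset


def build_report_prompt(sample: LabeledImage) -> str:
    """Condition report generation on the known diagnosis (hypothetical wording)."""
    return (
        f"This {sample.modality} image has the confirmed diagnosis "
        f"'{sample.label}'. Describe the findings that support it."
    )


def bootstrap_record(
    sample: LabeledImage,
    generate_report: Callable[[str, str], str],
) -> Dict[str, str]:
    """Turn one labeled image into one image-text instruction example."""
    report = generate_report(sample.path, build_report_prompt(sample))
    return {
        "image": sample.path,
        "question": f"Describe this {sample.modality} image and give a diagnosis.",
        "answer": report,
        "label": sample.label,
    }


def stub_generator(image_path: str, prompt: str) -> str:
    # Stand-in for a real vision-language model call; returns canned text.
    return f"[report for {image_path}] {prompt}"


sample = LabeledImage("cxr_001.png", "chest X-ray", "pneumonia")
record = bootstrap_record(sample, stub_generator)
print(record["answer"])
```

The point of the sketch is that the label steers the generated report, so the resulting text stays consistent with the known diagnosis instead of being written from the image alone.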

#Key Features

  • Multi-specialty coverage: A single model handles radiology (X-ray, CT, MRI), pathology, dermatology, retinography, and endoscopy, rather than being confined to one imaging domain.
  • Diagnosis-guided bootstrapping: The training pipeline exploits both medical images and their diagnostic labels to synthesize comprehensive reports and instruction-tuning examples, expanding the usable training signal beyond hand-written radiology reports.
  • Retrieval-augmented diagnosis: At inference, MedDr retrieves similar reference cases to ground its predictions, improving generalization to distributions and findings underrepresented in training.
  • Generalist–specialist collaboration (GSCo): Mixture-of-Expert diagnosis and retrieval-augmented diagnosis mechanisms let the generalist consult specialist models, combining broad reasoning with task-tuned precision.
  • Open weights under MIT license: The MedDr_0401 checkpoint and a companion specialist are released on HuggingFace under a permissive MIT license.
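
As a rough illustration of the retrieval-augmented diagnosis feature listed above, the sketch below ranks a small reference set by cosine similarity to a query embedding and turns the nearest labels into a prompt hint. The embeddings, labels, and function names (`cosine`, `retrieve`, `reference_hint`) are made up for the example; how MedDr actually embeds and retrieves cases is not specified here.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], references: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k reference cases most similar to the query embedding."""
    ranked = sorted(
        references,
        key=lambda ref: cosine(query, ref["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def reference_hint(neighbours: List[Dict]) -> str:
    """Format retrieved labels as context for the generalist's prompt."""
    labels = ", ".join(ref["label"] for ref in neighbours)
    return f"Similar reference cases were diagnosed as: {labels}."


# Toy reference bank with hand-written 3-dimensional embeddings (assumed data).
references = [
    {"embedding": [1.0, 0.0, 0.0], "label": "pneumonia"},
    {"embedding": [0.9, 0.1, 0.0], "label": "pneumonia"},
    {"embedding": [0.0, 1.0, 0.0], "label": "normal"},
    {"embedding": [0.0, 0.0, 1.0], "label": "effusion"},
]
query = [0.95, 0.05, 0.0]

neighbours = retrieve(query, references, k=2)
hint = reference_hint(neighbours)
print(hint)
```

In a real system the embeddings would come from an image encoder and the reference bank would hold labeled training cases; the hint is then supplied to the generalist alongside the image and question.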

#Technical Details

MedDr is built on the InternVL vision-language architecture (OpenGVLab/InternVL-Chat-V1-2), comprising a vision transformer image encoder coupled to a large language model decoder, with roughly 40 billion parameters total in the released BF16 checkpoint. Training proceeds by first constructing instruction data through diagnosis-guided bootstrapping—generating descriptive reports from medical images and their labels—then integrating these with existing medical vision-language tasks (VQA, captioning, classification) for instruction tuning. The GSCo evaluation spanned 28 datasets and roughly 250,000 images across the supported modalities, assessing report generation, visual question answering, and image-level diagnosis. The authors report that pairing the generalist with specialists and retrieval augmentation improves performance over the generalist alone, particularly on out-of-distribution diagnostic tasks.
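
A quick back-of-the-envelope calculation shows what a 40-billion-parameter BF16 checkpoint implies for storage. The sketch assumes exactly 40 billion parameters (the text says "roughly") and counts weights only, ignoring activations, the KV cache, and framework overhead.

```python
PARAMS = 40_000_000_000   # assumed exact count; the source says "roughly 40 billion"
BYTES_PER_BF16 = 2        # bfloat16 stores each value in 16 bits

weight_bytes = PARAMS * BYTES_PER_BF16
weight_gb = weight_bytes / 1e9        # decimal gigabytes
weight_gib = weight_bytes / 2**30     # binary gibibytes

print(f"Weights alone: {weight_gb:.0f} GB ({weight_gib:.1f} GiB)")
```

Under these assumptions the weights alone occupy about 80 GB, so inference would generally need more than one accelerator or some form of quantization or offloading.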

#Applications

MedDr is aimed at researchers building multimodal clinical assistants and at studies probing how far a single generalist model can go across heterogeneous medical imaging. Practical use cases include drafting preliminary radiology and pathology reports, answering image-grounded clinical questions, and serving as a flexible backbone that can be combined with specialist classifiers in a collaborative diagnostic pipeline. Because the weights are openly licensed, it is also a convenient starting point for fine-tuning on institution-specific datasets or new modalities.
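
One way to picture a collaborative pipeline of this kind is to pass a specialist classifier's top predictions to the generalist as prompt context. The sketch below is an assumption about how such a hand-off could look, not the GSCo implementation; `top_k`, `build_collab_prompt`, the prompt wording, and the probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_k(probs: Dict[str, float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k most probable labels from a specialist's output."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def build_collab_prompt(question: str, specialist_probs: Dict[str, float], k: int = 2) -> str:
    """Attach specialist predictions to the question as a non-binding reference."""
    hints = ", ".join(
        f"{label} ({prob:.2f})" for label, prob in top_k(specialist_probs, k)
    )
    return (
        f"{question}\n"
        f"A specialist classifier suggests: {hints}.\n"
        "Treat this as a reference, not as ground truth."
    )


# Made-up output of a hypothetical dermatology specialist.
specialist_probs = {"melanoma": 0.70, "nevus": 0.25, "keratosis": 0.05}
prompt = build_collab_prompt("What is the most likely diagnosis?", specialist_probs)
print(prompt)
```

Keeping the specialist's output as a hint rather than a final answer leaves the generalist free to disagree when the image evidence points elsewhere.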

#Impact

MedDr contributed to a wave of open generalist medical vision-language models that challenged the prevailing one-model-per-task paradigm, and its diagnosis-guided bootstrapping offered a reusable recipe for turning abundant labeled-image archives into vision-language training data. The accompanying GSCo framework articulated a pragmatic middle path—rather than expecting a generalist to dominate every task, it formalized collaboration between broad and narrow models. As with all current medical foundation models, the work remains a research artifact: it is not a cleared clinical device, evaluations rest largely on retrospective benchmarks, and performance varies across modalities, so outputs require expert oversight before any clinical use.

Citation

He, S., et al. (2024). GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration. Preprint.

DOI: 10.48550/arXiv.2404.15127

Citations

Total Citations: 26
Influential: 4
References: 0

GitHub

Stars: 98
Forks: 6
Open Issues: 1
Contributors: 2
Last Push: 1mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 59
Likes: 7
Last Modified: 1mo ago
Pipeline: image-text-to-text

Openness

bio.rodeo openness (Fully open · usable and reproducible): 69, Partial
Usability (can I run it?): 93
Reproducibility (can I retrain it?): 50
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

foundation_model, histology, instruction_tuning, medical_diagnosis, multimodal, radiology, report_generation, transformer, vision_transformer, visual_question_answering

Resources

GitHub Repository, Research Paper, Official Website, HuggingFace Model