bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Imaging foundation models
Imaging · Language model

RadFM

Shanghai Jiao Tong University / Shanghai AI Laboratory

A generalist radiology foundation model that handles interleaved 2D and 3D medical scans with text for diagnosis, VQA, and report generation.

Released: August 2023
Parameters: 14 billion

RadFM (Radiology Foundation Model) is a generalist multimodal foundation model for radiology that processes arbitrary combinations of 2D and 3D medical scans interleaved with natural language text. Unlike earlier medical vision-language models that were restricted to single 2D images (typically chest X-rays), RadFM accepts multiple images of mixed dimensionality within a single prompt and generates free-form text, enabling it to address modality recognition, disease diagnosis, visual question answering, report generation, and rationale diagnosis within one unified interface.

The model was developed by researchers at Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory. The original preprint was posted in August 2023, and the peer-reviewed version was published in Nature Communications in August 2025. Its central contribution is treating radiology as a visually conditioned text-generation problem at web scale: the model is trained across dozens of modalities and anatomical regions rather than specialized on a single body part or scanner type.

A key enabling artifact is MedMD, a large-scale medical multimodal dataset of roughly 16 million 2D and 3D scans paired with text and covering more than 5,000 diseases. The team contributed four new datasets toward this corpus and released a cleaned, radiology-focused subset (RadMD) for instruction tuning, positioning RadFM as one of the first openly released foundation models built to reason jointly over volumetric and planar medical imaging.

#Key Features

  • Unified 2D and 3D handling: A 3D Vision Transformer with a Perceiver module encodes both planar and volumetric scans into a fixed 32-token representation per image, so CT, MRI, X-ray, and other modalities share one input pathway.
  • Interleaved multi-image prompts: The model accepts several images intermixed with text in a single query, mirroring how radiologists reason across prior studies and multiple views rather than over one image at a time.
  • Generalist task coverage: A single checkpoint performs modality recognition, closed- and open-ended diagnosis, visual question answering, report generation, and rationale generation without task-specific heads.
  • Open weights and data: The 14B-parameter checkpoint is released on HuggingFace under an MIT license, with training data tables and the RadMD subset published to support reproduction and downstream fine-tuning.
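The fixed 32-token representation in the first feature can be illustrated with a minimal, pure-Python sketch of Perceiver-style resampling: 32 latent query vectors cross-attend over however many patch tokens the encoder produced, so a 2D image and a 3D volume both come out as 32 vectors. The toy width `DIM`, the random weights, the patch counts, and the function name `resample` are illustrative assumptions, not RadFM's actual code; a real resampler uses learned projections and multiple layers.

```python
import math
import random

NUM_LATENTS = 32  # per-image token budget stated in the feature list
DIM = 8           # toy embedding width (assumption; the real model is far wider)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(patch_tokens, latents):
    """Single-head cross-attention: latents are queries, patches are keys and values.

    The output always has len(latents) rows, whatever len(patch_tokens) is.
    """
    scale = 1.0 / math.sqrt(len(latents[0]))
    out = []
    for query in latents:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in patch_tokens]
        weights = softmax(scores)
        out.append([
            sum(w * value[i] for w, value in zip(weights, patch_tokens))
            for i in range(len(query))
        ])
    return out


rng = random.Random(0)


def rand_tokens(n):
    """Stand-in for learned or encoded vectors: n random DIM-wide rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


latents = rand_tokens(NUM_LATENTS)
xray_patches = rand_tokens(49)      # hypothetical 7x7 patch grid from a 2D image
ct_patches = rand_tokens(49 * 16)   # hypothetical same grid over 16 depth slabs

xray_out = resample(xray_patches, latents)
ct_out = resample(ct_patches, latents)
print(len(xray_out), len(ct_out))  # both inputs reduce to 32 tokens
```

The point of the design is that the language model sees a constant per-image cost: 49 patches or 784 patches both reduce to the same 32 rows.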

#Technical Details

RadFM couples a 3D ViT vision encoder and Perceiver resampler to a MedLLaMA-13B language backbone (a medical-domain-adapted LLaMA-13B), for roughly 14 billion total parameters. Images are inserted into the text stream via special placeholder tokens whose embeddings are replaced by the visual tokens, so the autoregressive decoder processes visual and text tokens as one sequence. The model is trained by visually conditioned generative pre-training on MedMD (~16M 2D/3D scans across 5,000+ diseases) and then instruction-tuned on RadMD, a cleaned set of about 3 million radiologic visual-language pairs. The authors contributed four new datasets (PMC-Inline, RP3D, PMC-CaseReport, and MPx), adding roughly 13 million 2D images and 615,000 3D scans. On the RadBench benchmark, which spans five radiology task categories, RadFM outperforms previously accessible multimodal models, including OpenFlamingo, MedFlamingo, MedVInT, and GPT-4V, on both automatic metrics and human evaluation.
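The placeholder mechanism described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the released implementation: the placeholder string `<image>`, whitespace tokenization, the helper names `build_sequence` and `splice_embeddings`, and the tuple "embeddings" are all hypothetical. Only the count of 32 visual tokens per image comes from the description of the model.

```python
TOKENS_PER_IMAGE = 32          # per-image visual token count described for RadFM
IMAGE_PLACEHOLDER = "<image>"  # hypothetical placeholder token name


def build_sequence(segments):
    """Flatten interleaved segments into one token list.

    Each segment is ("text", string) or ("image", image_id). Every image
    reserves TOKENS_PER_IMAGE placeholder positions; the start index of each
    reserved run is recorded so it can be overwritten later.
    """
    tokens, image_slots = [], []
    for kind, value in segments:
        if kind == "text":
            tokens.extend(value.split())  # toy whitespace tokenizer (assumption)
        else:
            image_slots.append((value, len(tokens)))
            tokens.extend([IMAGE_PLACEHOLDER] * TOKENS_PER_IMAGE)
    return tokens, image_slots


def splice_embeddings(tokens, image_slots, text_embed, visual_tokens):
    """Embed every token, then overwrite placeholder runs with visual tokens."""
    embeds = [text_embed(tok) for tok in tokens]
    for image_id, start in image_slots:
        vis = visual_tokens[image_id]
        if len(vis) != TOKENS_PER_IMAGE:
            raise ValueError(f"{image_id}: expected {TOKENS_PER_IMAGE} visual tokens")
        embeds[start:start + TOKENS_PER_IMAGE] = vis
    return embeds


# A prompt mixing a prior 2D study and a current 3D study with text.
segments = [
    ("text", "Prior study:"),
    ("image", "cxr_2022"),
    ("text", "Current CT:"),
    ("image", "ct_2023"),
    ("text", "Has the nodule grown?"),
]
tokens, image_slots = build_sequence(segments)

# Stand-ins for real embeddings: tagged tuples instead of float vectors.
visual_tokens = {
    image_id: [("img", image_id, i) for i in range(TOKENS_PER_IMAGE)]
    for image_id, _ in image_slots
}
embeds = splice_embeddings(tokens, image_slots, lambda t: ("txt", t), visual_tokens)

print(len(tokens), tokens.count(IMAGE_PLACEHOLDER))  # 72 64
```

Because each image costs a fixed 32 positions, a prompt with n images spends 32n positions of the decoder's context on vision, regardless of whether each image is 2D or 3D.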

#Applications

RadFM targets radiologists, clinical researchers, and medical-AI developers who need a single model that reasons over heterogeneous imaging studies. Potential uses include drafting preliminary radiology reports, answering clinical questions grounded in a patient's scans, triaging across modalities, and serving as a pretrained backbone for fine-tuning on institution-specific tasks. Because it ingests 3D volumes directly, it is particularly relevant for CT and MRI workflows that earlier 2D-only vision-language models could not address.

#Impact

RadFM was among the first openly released generalist radiology foundation models to natively span 2D and 3D imaging with interleaved image-text prompting, and its publicly available MedMD/RadMD data and MIT-licensed weights have made it a common baseline and starting point for subsequent medical multimodal research. Its peer-reviewed publication in Nature Communications reflects broader adoption of the generalist, instruction-tuned paradigm in medical imaging. As with all such models, outputs are not clinically validated for autonomous diagnosis, performance varies across underrepresented modalities and populations, and human expert oversight remains essential before any clinical use.

Citation

Wu, C., et al. (2025). Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. Nature Communications.

DOI: 10.1038/s41467-025-62385-7

Citations

Total Citations: 227
Influential: 18
References: 72

GitHub

Stars: 554
Forks: 66
Open Issues: 23
Contributors: 2
Last Push: 10 months ago
Language: Python
License: MIT

HuggingFace

Downloads: 0
Likes: 20
Last Modified: 2 years ago

Openness

bio.rodeo openness: Fully open · usable and reproducible
Overall: 84 (Open)
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 62
Model Openness Framework: Class III (Open Model)

Tags

disease_diagnosis, foundation_model, generative, medical_imaging, multimodal, radiology, report_generation, transformer, vision_transformer, visual_question_answering

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model · Dataset