bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance

VisionFM

Chinese University of Hong Kong

A multi-modal, multi-task vision foundation model for generalist ophthalmic AI, pretrained on 3.4M images from 560K+ individuals across 8 imaging modalities.

Released: October 2023

VisionFM is a multi-modal, multi-task vision foundation model built for generalist ophthalmic artificial intelligence. Rather than training a separate narrow model for each eye disease or imaging device, VisionFM learns broadly transferable representations of ocular tissue that can be adapted to a wide spectrum of downstream clinical tasks. It addresses a long-standing bottleneck in ophthalmic AI: most prior systems were single-task and single-modality, requiring large labeled datasets and costly retraining whenever a new disease, modality, or population was introduced.

Developed by the Advanced Biomedical Intelligence Lab (ABILab) at the Chinese University of Hong Kong (CUHK) together with a large collaborating clinical consortium, VisionFM was pretrained on 3.4 million ophthalmic images from 560,457 individuals, spanning a broad range of diseases, imaging devices, and demographics. The work was first released as a preprint in October 2023 and subsequently published in NEJM AI in 2024.

VisionFM sits alongside other retinal and ophthalmic foundation models such as RETFound, EyeFound, and EyeCLIP, but is distinguished by its breadth of imaging modalities and the diversity of clinical tasks it supports from a single pretrained backbone, including screening, diagnosis, prognosis, phenotype subclassification, and systemic biomarker prediction.

Key Features

  • Multi-modal coverage: VisionFM provides modality-specific encoders for eight ophthalmic imaging types: color fundus photography, optical coherence tomography (OCT), fundus fluorescein angiography (FFA), slit-lamp imaging, B-scan ultrasound, external eye imaging, MRI, and ultrasound biomicroscopy (UBM).
  • Multi-task generalization: A single pretrained foundation supports disease screening and diagnosis, prognosis, disease-phenotype subclassification, segmentation, landmark detection, and systemic biomarker and disease prediction.
  • Self-supervised pretraining: The model is pretrained without disease labels using a self-distillation approach, enabling it to learn from large unlabeled image corpora and transfer efficiently to labeled downstream tasks.
  • Synthetic data augmentation: The pretraining corpus is supplemented with generative synthetic ophthalmic images that passed visual Turing tests with practicing ophthalmologists, expanding data diversity.
  • Expert-level diagnosis: In the reported evaluations covering 12 common eye diseases, VisionFM outperformed ophthalmologists with basic and intermediate levels of experience.
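
As a rough illustration of how one encoder per modality and one head per task can fit together, the sketch below routes an image to an encoder by modality and then passes the resulting features to a task head. The `ModalityRouter` class, the short modality keys, and the toy encoder and head are all hypothetical and are not taken from the VisionFM repository; in practice the encoders would be the released pretrained weights.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Short keys for the eight modalities listed above (the keys are made up here).
MODALITIES = (
    "fundus", "oct", "ffa", "slit_lamp",
    "b_scan_ultrasound", "external_eye", "mri", "ubm",
)

Features = List[float]
Encoder = Callable[[List[float]], Features]  # image pixels -> embedding
TaskHead = Callable[[Features], object]      # embedding -> task output


@dataclass
class ModalityRouter:
    """Hypothetical dispatcher: one encoder per modality, one head per task."""
    encoders: Dict[str, Encoder] = field(default_factory=dict)
    heads: Dict[str, TaskHead] = field(default_factory=dict)

    def register_encoder(self, modality: str, encoder: Encoder) -> None:
        # Reject anything outside the eight supported modalities.
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality}")
        self.encoders[modality] = encoder

    def predict(self, modality: str, task: str, image: List[float]):
        # Encode with the modality-specific encoder, then apply the task head.
        features = self.encoders[modality](image)
        return self.heads[task](features)


# Toy stand-ins: a "mean intensity" encoder and a threshold head.
router = ModalityRouter()
router.register_encoder("oct", lambda pixels: [sum(pixels) / len(pixels)])
router.heads["screening"] = lambda feats: "refer" if feats[0] > 0.5 else "routine"
print(router.predict("oct", "screening", [0.9, 0.8, 0.7]))
```

Keeping the encoders and heads in separate tables mirrors the idea that a new task only needs a new head, while a new modality needs a new encoder.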

Technical Details

VisionFM uses a Vision Transformer (ViT) backbone trained with a DINO-style self-supervised self-distillation objective, with separate encoders learned per imaging modality. The pretraining set comprises 3.4 million images from 560,457 individuals, augmented with synthetic data. Downstream task heads are attached and fine-tuned for classification, segmentation, and detection.

Across large-scale benchmarks for diagnosis, segmentation, and detection, VisionFM outperformed baseline deep neural networks and demonstrated strong generalization to new modalities and previously unseen datasets.

The official repository releases modality-specific pretrained weights for all eight modalities, fine-tuning code, fine-tuned weights on eight public multiclass disease-recognition datasets, and synthetic datasets; public downstream datasets such as IDRiD, OCTID, and DRIVE are supported, with private evaluation data available under a signed data-use agreement.
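
To make the self-distillation objective more concrete, here is a minimal pure-Python sketch of a DINO-style loss for one pair of views. It is not VisionFM's training code: the function names are hypothetical, and the temperatures and momentum are the defaults commonly quoted for DINO, used here as assumptions. The teacher output is centered and sharpened, the student is trained to match it via cross-entropy, and the teacher weights follow the student through an exponential moving average.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def self_distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    center: List[float],
    student_temp: float = 0.1,   # assumed value, typical for DINO
    teacher_temp: float = 0.04,  # assumed value, typical for DINO
) -> float:
    """Cross-entropy between the centered, sharpened teacher and the student."""
    # Teacher targets: subtract the running center, then sharpen with a low
    # temperature. In real training no gradient flows through this branch.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(old: List[float], new: List[float], momentum: float = 0.996) -> List[float]:
    """Exponential moving average, usable for teacher weights or the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


# The loss is small when the student agrees with the teacher, large otherwise.
zero_center = [0.0, 0.0, 0.0]
agree = self_distillation_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], zero_center)
disagree = self_distillation_loss([0.0, 2.0, 0.0], [2.0, 0.0, 0.0], zero_center)
print(agree < disagree)
```

Centering and sharpening pull in opposite directions (one discourages a single dominant output, the other discourages a uniform one), which is the usual explanation for why this kind of objective avoids collapse without labels.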

Applications

VisionFM is designed for clinical and translational ophthalmology workflows where labeled data are scarce or where a unified system must handle many diseases and devices. Practical use cases include automated screening and triage for conditions such as diabetic retinopathy, glaucoma, and age-related macular degeneration; segmentation of retinal vessels and anatomical landmarks; disease prognosis; and the prediction of systemic biomarkers and diseases from ocular images. Researchers benefit from a pretrained backbone that can be fine-tuned on modest labeled datasets, lowering the barrier to building new ophthalmic AI applications.
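
One common way to adapt a pretrained backbone to a small labeled dataset is a linear probe: freeze the encoder, extract features once, and fit a lightweight classifier on top. The sketch below shows that idea with plain logistic regression in pure Python. The feature vectors are toy stand-ins for embeddings from a frozen encoder, and `train_linear_probe` and `predict_proba` are hypothetical helpers, not part of the VisionFM fine-tuning code.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_linear_probe(
    features: List[List[float]],
    labels: List[int],
    epochs: int = 200,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit logistic regression on frozen features with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y  # d(loss)/d(logit)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy embeddings: the first dimension separates the two classes.
feats = [[2.0, 0.1], [1.8, -0.2], [-2.0, 0.0], [-1.7, 0.3]]
labels = [1, 1, 0, 0]
w, b = train_linear_probe(feats, labels)
print(predict_proba(w, b, [1.9, 0.0]) > 0.5)
```

Only the small head is trained here, which is why this style of adaptation can work with modest label counts; full fine-tuning of the backbone is the heavier alternative.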

Impact

By demonstrating that a single self-supervised backbone can generalize across eight imaging modalities and a wide range of clinical tasks, VisionFM helped establish the foundation-model paradigm in ophthalmology. Its publication in NEJM AI and the public release of pretrained weights, fine-tuning code, and synthetic data have made it a reference point for subsequent ophthalmic foundation models and comparative studies. Its main limitations are a research-and-education-only license that precludes commercial use, and reliance on partly private evaluation data, which complicates fully independent reproduction of some reported results.

Citation

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

Qiu, J., et al. (2024) VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence. NEJM AI.

DOI: 10.1056/AIoa2300221

Citations

Total Citations: 53
Influential: 4
References: 48

GitHub

Stars: 126
Forks: 19
Open Issues: 6
Contributors: 1
Last Push: 11mo ago
Language: Python

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Overall: 13 (Closed)
Usability (can I run it?): 14
Reproducibility (can I retrain it?): 11
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

biomarker_prediction, disease_screening, foundation_model, fundus_imaging, oct, segmentation, self_supervised, vision_transformer

Resources

GitHub Repository
Research Paper
Research Paper