bio.rodeo

EyeFound

The Hong Kong Polytechnic University / Sun Yat-sen University / National University of Singapore / EPFL

A multimodal generalist foundation model for ophthalmic imaging, self-supervised on 2.78M images across 11 modalities for diagnosis, prognosis, and visual question answering.

Released: May 2024

EyeFound is a multimodal generalist foundation model for ophthalmic imaging, developed to provide a single pretrained backbone that generalizes across the many imaging types used in eye care. Ophthalmology is unusually multimodal: clinicians routinely combine color fundus photographs, optical coherence tomography (OCT), fluorescein and indocyanine green angiography, ultra-widefield imaging, and several other modalities to diagnose and monitor disease. Most prior medical AI models target a single modality and a single task, limiting their reuse. EyeFound instead learns transferable representations from large volumes of unlabeled, heterogeneous ophthalmic images that can then be adapted, with modest labeled data, to a wide range of downstream applications.

The model was introduced in May 2024 by Danli Shi, Weiyi Zhang, Mingguang He and colleagues, led from the School of Optometry at The Hong Kong Polytechnic University, with collaborators at Sun Yat-sen University (Zhongshan Ophthalmic Center), the National University of Singapore, and EPFL. EyeFound builds directly on the lineage of retina-specific foundation models such as RETFound, which was trained primarily on color fundus and OCT images, by extending self-supervised pretraining across a far broader set of ophthalmic modalities.

By covering 11 imaging modalities in one model, EyeFound aims to serve as a shared starting point for ophthalmic AI development, reducing the need to train bespoke models for every imaging device and clinical question.

#Key Features

  • Multimodal coverage: Pretrained across 11 common ophthalmic imaging modalities, letting a single backbone support tasks that span fundus, OCT, angiography, and other image types rather than one modality at a time.
  • Self-supervised pretraining: Uses masked image modeling on unlabeled data, so the model learns from large image collections without requiring expert annotations for pretraining.
  • Generalist downstream adaptation: Fine-tunes efficiently to diverse tasks, including eye disease diagnosis, prediction of systemic disease events, and multimodal visual question answering.
  • Zero-shot VQA: Supports zero-shot multimodal visual question answering over ophthalmic images, pointing toward interactive, report-style clinical assistance.
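The masked image modeling mentioned above can be illustrated with a minimal sketch of MAE-style random patch masking. This is an illustration under stated assumptions, not the authors' code: the 16×16 patch size is assumed (this page does not state it), the 224×224 input size and the roughly 80% mask ratio come from the Technical Details section, and the function name `random_patch_mask` is hypothetical.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    A random permutation of the patch indices is drawn. The first
    num_keep indices stay visible to the encoder; the rest are hidden
    and are the targets the decoder has to reconstruct.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_keep = int(num_patches * (1.0 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


IMAGE_SIZE = 224  # input resolution reported for pretraining
PATCH_SIZE = 16   # assumption: not stated on this page
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 14 x 14 grid

visible, masked = random_patch_mask(NUM_PATCHES, mask_ratio=0.8, seed=0)
print(NUM_PATCHES, len(visible), len(masked))
```

Under these assumptions the encoder processes only about a fifth of the patches per image, which is the main reason high mask ratios keep MAE pretraining comparatively cheap.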

#Technical Details

EyeFound uses a Masked Autoencoder (MAE) framework for self-supervised pretraining. The encoder is a Vision Transformer at ViT-Large scale (24 transformer blocks, embedding dimension 1,024), paired with a lightweight decoder described as ViT-Small (8 blocks, embedding dimension 512). During pretraining, roughly 80% of image patches are masked and the model is trained to reconstruct them.

The pretraining corpus comprises 2.78 million retinal and other ophthalmic images drawn from 227 hospitals, spanning 11 imaging modalities. Images were preprocessed to 256×256, with augmented 224×224 crops used for training. Pretraining ran for 50 epochs, including 15 warmup epochs, with a peak learning rate of 1×10⁻³. Downstream adaptation uses parameter-efficient Low-Rank Adaptation (LoRA).

In the authors' reported evaluations, EyeFound outperformed RETFound on eye disease diagnosis and systemic disease prediction, and it is reported to perform well on zero-shot multimodal VQA.
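To make the schedule figures above concrete (50 epochs, 15 warmup epochs, peak learning rate 1×10⁻³), here is a minimal sketch of a warmup-then-decay schedule. The linear warmup from zero and the half-cycle cosine decay are assumptions borrowed from common MAE training recipes; this page states only the warmup length and the peak value. The function name `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch, peak_lr=1e-3, warmup_epochs=15, total_epochs=50, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch.

    Linear warmup from 0 to peak_lr over warmup_epochs, then half-cycle
    cosine decay down to min_lr at total_epochs. The decay shape is an
    assumption, not something confirmed by the source.
    """
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    progress = min(progress, 1.0)  # clamp if called past the last epoch
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


for e in (0, 5, 15, 30, 50):
    print(f"epoch {e:2d}: lr = {lr_at_epoch(e):.2e}")
```

With these defaults the rate climbs for the first 15 epochs, peaks at epoch 15, and then falls smoothly for the remaining 35.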

#Applications

EyeFound is intended as a reusable backbone for ophthalmic AI research and clinical decision support. Researchers can adapt it to classify and grade conditions such as diabetic retinopathy, glaucoma, and age-related macular degeneration; to predict the incidence of systemic diseases from retinal images (oculomics); and to build question-answering tools that interpret multimodal eye scans. Because it covers many modalities, it is especially useful in settings that operate heterogeneous imaging equipment, and it lowers the labeling burden for groups developing new diagnostic models on limited annotated cohorts.
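The reduced adaptation cost comes largely from the LoRA-style fine-tuning noted under Technical Details. The sketch below shows the general idea on a toy layer: the pretrained weight stays frozen and only a low-rank update is trained. It is a generic illustration of LoRA, not EyeFound's implementation; the rank, the scaling factor `alpha`, the initialization, and the class name `LoRALinear` are all assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_fraction(d_in, d_out, rank):
    """Trainable LoRA parameters as a fraction of the full weight matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)


# Toy layer: 4x4 identity weight, rank 2.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W, rank=2, alpha=4)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))

# A 1,024-wide square projection matches the reported encoder width;
# rank 8 is a hypothetical choice, not a value from the paper.
print(lora_param_fraction(1024, 1024, rank=8))
```

For a 1,024×1,024 projection at the assumed rank of 8, the trainable update holds about 1.6% as many parameters as the full matrix, which is why this style of adaptation suits groups with limited labeled data and compute.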

#Impact

EyeFound contributes to the rapid expansion of medical imaging foundation models by demonstrating that a single self-supervised model can span the breadth of ophthalmic modalities rather than specializing in one. By extending the RETFound concept to 11 modalities and adding multimodal visual question answering, it helps chart a path toward generalist ophthalmic AI assistants. As a preprint, its results await peer review and independent external validation, and code and weights availability should be confirmed before clinical or production use; nonetheless, it is a notable reference point in the emerging landscape of multimodal medical foundation models.

Citation

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

Preprint

Shi, D., et al. (2024) EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging. arXiv.org.

DOI: 10.48550/arXiv.2405.11338

Citations

Total citations: 40
Influential: 4
References: 17

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 4 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

disease_diagnosis, foundation_model, masked_autoencoder, multimodal, ophthalmology, prognosis_prediction, retinal_imaging, self_supervised, vision_transformer, visual_question_answering
