bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
LVM-Med

University of Stuttgart / German Research Center for Artificial Intelligence (DFKI) / Max Planck Institute for Informatics / University of Texas at Austin / University of Bonn / University of California, San Diego / National University of Singapore

Self-supervised vision foundation model pretrained on ~1.3M medical images via second-order graph matching, transferable across 15 medical imaging tasks.

Released: June 2023

LVM-Med is a large-scale, self-supervised vision foundation model for medical imaging, developed by a collaboration led by the University of Stuttgart and the German Research Center for Artificial Intelligence (DFKI), with co-authors from the Max Planck Institute for Informatics, University of Texas at Austin, University of Bonn, UC San Diego, and the National University of Singapore. It was introduced in the paper "LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching" and presented at NeurIPS 2023.

The model addresses a persistent gap in medical AI: general-purpose backbones pretrained on natural images (e.g., ImageNet) transfer poorly to medical modalities, while task-specific medical models fail to generalize across organs and imaging types. LVM-Med tackles this by assembling roughly 1.3 million medical images from 55 publicly available datasets spanning CT, MRI, X-ray, ultrasound, and other modalities, then pretraining a single backbone that can be fine-tuned for diverse downstream tasks.

Its core methodological contribution is reformulating self-supervised contrastive learning as a graph-matching problem. Rather than comparing only individual image pairs, LVM-Med builds graphs over samples and enforces structural consistency through a second-order, combinatorial graph-matching objective, capturing higher-order relationships that conventional contrastive objectives miss.
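
As a rough illustration of what a second-order matching objective looks like, the sketch below matches two augmented "views" of the same three images. It is not the authors' implementation: the function names, the cosine node affinity, the edge-agreement term, and the brute-force search over permutations are all assumptions chosen for readability (a real system would use a dedicated combinatorial solver on much larger graphs).

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_views(view_a, view_b, edge_weight=1.0):
    """Find the assignment a_i -> b_perm[i] that maximizes node + edge agreement.

    Hypothetical toy objective:
      node term (first order): similarity of each matched pair of embeddings
      edge term (second order): agreement between the within-view similarity
      graphs for every pair of matched nodes
    """
    n = len(view_a)
    # Within-view similarity graphs (one weighted edge per pair of samples).
    sim_a = [[cosine(x, y) for y in view_a] for x in view_a]
    sim_b = [[cosine(x, y) for y in view_b] for x in view_b]

    best_perm, best_score = None, -math.inf
    for perm in permutations(range(n)):  # brute force: only viable for tiny n
        node_term = sum(cosine(view_a[i], view_b[perm[i]]) for i in range(n))
        edge_term = -sum(
            abs(sim_a[i][j] - sim_b[perm[i]][perm[j]])
            for i in range(n)
            for j in range(i + 1, n)
        )
        score = node_term + edge_weight * edge_term
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Toy embeddings: view_b holds slightly perturbed copies of view_a in shuffled order.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[0.1, 1.0], [1.0, 0.9], [1.0, 0.1]]
perm, score = match_views(view_a, view_b)
print(perm)  # (2, 0, 1): a_0 -> b_2, a_1 -> b_0, a_2 -> b_1
```

The edge term is what makes the objective "second-order": two candidate assignments with similar node scores are separated by how well they preserve the pairwise structure within each view.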

#Key Features

  • Second-order graph matching: Frames self-supervised pretraining as graph matching that integrates both pair-wise (local and global) image similarities and structural constraints via a combinatorial matching loss, trained end-to-end with gradient estimation through a black-box solver.
  • Large-scale medical pretraining corpus: Trained on ~1.3 million images aggregated from 55 public datasets covering multiple organs and modalities, yielding broadly transferable representations.
  • Released pretrained backbones: Provides ResNet-50 (~25.5M parameters) and ViT-B (~86M parameters) checkpoints, plus a SAM (ViT-B) variant for prompt-based segmentation experiments.
  • Broad task coverage: Validated on 15 downstream tasks including 2D/3D segmentation, image classification, and object detection, in both in-distribution and out-of-distribution settings.
  • Strong transfer performance: Outperforms supervised and self-supervised baselines, with reported gains of 6-7% over vision-language models on challenging tasks such as brain tumor classification and diabetic retinopathy grading.
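
The end-to-end training mentioned in the first feature depends on getting a usable gradient out of a non-differentiable solver. The source does not say which estimator LVM-Med uses, so the sketch below is an assumption: it shows one published scheme of this kind, in which the solver input is perturbed by the incoming gradient and the difference between the two solver outputs serves as the gradient estimate. The brute-force assignment solver, the names `solve_assignment` and `blackbox_grad`, and the value of `lam` are all hypothetical.

```python
from itertools import permutations


def solve_assignment(cost):
    """Black-box solver: return the 0/1 permutation matrix with minimal total cost."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return [[1.0 if best[i] == j else 0.0 for j in range(n)] for i in range(n)]


def blackbox_grad(cost, grad_wrt_matching, lam=1.0):
    """Estimate dLoss/dcost from two solver calls (assumed scheme, see lead-in).

    1. Solve with the original costs.
    2. Solve again with costs shifted by lam * dLoss/dmatching.
    3. Return the scaled difference of the two solutions.
    """
    n = len(cost)
    y = solve_assignment(cost)
    perturbed = [
        [cost[i][j] + lam * grad_wrt_matching[i][j] for j in range(n)]
        for i in range(n)
    ]
    y_lam = solve_assignment(perturbed)
    return [[-(y[i][j] - y_lam[i][j]) / lam for j in range(n)] for i in range(n)]


# The solver currently prefers the identity matching; the target is the swap.
cost = [[1.0, 2.0], [2.0, 1.0]]
target = [[0.0, 1.0], [1.0, 0.0]]
matching = solve_assignment(cost)

# Gradient of the squared error sum((y - t)^2) with respect to the matching y.
grad_y = [[2.0 * (matching[i][j] - target[i][j]) for j in range(2)] for i in range(2)]
grad_cost = blackbox_grad(cost, grad_y, lam=1.0)

# One gradient-descent step on the costs (learning rate 1.0).
updated_cost = [[cost[i][j] - grad_cost[i][j] for j in range(2)] for i in range(2)]
print(grad_cost)                       # [[-1.0, 1.0], [1.0, -1.0]]
print(solve_assignment(updated_cost))  # [[0.0, 1.0], [1.0, 0.0]]
```

In this toy case a single step raises the cost of the identity matching and lowers the cost of the swap, so the solver's output moves to the target.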

#Technical Details

LVM-Med pretrains convolutional (ResNet-50) and transformer (ViT-B) backbones using its graph-matching contrastive objective. The matching formulation couples a similarity term over image embeddings with a combinatorial structural term; because the matching solver is non-differentiable, gradients are estimated through the black-box optimizer to enable end-to-end training. Reported results include a 2D segmentation Dice of 83.05 and 3D IoU of 79.02 for the ResNet-50 backbone, and a Dice of 85.80 and 3D IoU of 80.90 for ViT-B, consistently surpassing ImageNet-supervised and prior self-supervised pretraining. A ViT-H variant further trained on the LIVECell dataset and a Segment Anything Model (SAM) backbone are also provided for prompt-based segmentation. Code and weights are released under a CC BY-NC-ND license, with checkpoints distributed via the project repository.
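
For readers unfamiliar with the two segmentation metrics quoted above, the following minimal sketch computes Dice and IoU for binary masks. It assumes the reported scores are percentages and that masks are flattened to sequences of 0/1 labels; the function name and the convention of returning 1.0 for two empty masks are illustrative choices, not taken from the paper. The 3D IoU uses the same formula with voxels in place of pixels.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_pred = sum(1 for p in pred if p)
    size_target = sum(1 for t in target if t)
    union = size_pred + size_target - intersection
    if union == 0:
        # Both masks are empty; treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (size_pred + size_target)
    iou = intersection / union
    return dice, iou


pred = [1, 1, 1, 0, 0, 0]
target = [1, 1, 0, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(round(100 * dice, 2), round(100 * iou, 2))  # 66.67 50.0
```

For a single pair of masks, Dice is never lower than IoU, so the two columns of scores are not directly comparable with each other.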

#Applications

LVM-Med serves as a drop-in pretrained backbone for medical image analysis, letting researchers and clinical-AI developers fine-tune a single model across segmentation (organs, tumors, cells in 2D and 3D), disease classification (e.g., brain tumor, diabetic retinopathy grading), and object detection (e.g., lesion detection on chest radiographs with Faster R-CNN). Because it is pretrained on heterogeneous modalities, it is particularly useful when labeled data is scarce, providing strong initialization that reduces the annotation burden for new imaging tasks.

#Impact

LVM-Med demonstrated that domain-specific, large-scale self-supervised pretraining substantially outperforms transferring from natural-image models for medical imaging, and that incorporating higher-order structural relationships through graph matching improves representation quality. As one of the early openly released medical-imaging foundation backbones spanning many modalities and tasks, it has been used as a baseline and starting point in subsequent medical foundation-model research (96 citations at the time of this listing). Its main limitations are the CC BY-NC-ND license, which prohibits commercial use and the distribution of derivative works, and a focus on 2D/3D vision tasks rather than multimodal (vision-language) understanding.

Citation

LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

Preprint

Nguyen, D., et al. (2023). LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching. Neural Information Processing Systems.

DOI: 10.48550/arXiv.2306.11925

Citations

Total Citations: 96
Influential: 10
References: 138

GitHub

Stars: 217
Forks: 28
Open Issues: 5
Contributors: 7
Last Push: 1y ago
Language: Python

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 28 (Closed)
Usability (can I run it?): 22
Reproducibility (can I retrain it?): 22
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

classification, contrastive_learning, ct, foundation_model, graph_neural_network, mri, object_detection, radiology, representation_learning, resnet, segmentation, self_supervised, transfer_learning, vision_transformer

Resources

GitHub Repository
Research Paper