LVM-Med

University of Stuttgart / German Research Center for Artificial Intelligence (DFKI) / Max Planck Institute for Informatics / University of Texas at Austin / University of Bonn / University of California, San Diego / National University of Singapore

Self-supervised vision foundation model pretrained on 1.3M medical images via second-order graph matching, for segmentation and classification.

Released: June 2023

LVM-Med is a large-scale, self-supervised vision foundation model for medical imaging, developed by a collaboration led by the University of Stuttgart and the German Research Center for Artificial Intelligence (DFKI), with co-authors from the Max Planck Institute for Informatics, University of Texas at Austin, University of Bonn, UC San Diego, and the National University of Singapore. It was introduced in the paper "LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching" and presented at NeurIPS 2023.

The model addresses a persistent gap in medical AI: general-purpose backbones pretrained on natural images (e.g., ImageNet) transfer poorly to medical modalities, while task-specific medical models fail to generalize across organs and imaging types. LVM-Med tackles this by assembling roughly 1.3 million medical images from 55 publicly available datasets spanning CT, MRI, X-ray, ultrasound, and other modalities, then pretraining a single backbone that can be fine-tuned for diverse downstream tasks.

Its core methodological contribution is reformulating self-supervised contrastive learning as a graph-matching problem. Rather than comparing only individual image pairs, LVM-Med builds graphs over samples and enforces structural consistency through a second-order, combinatorial graph-matching objective, capturing higher-order relationships that conventional contrastive objectives miss.
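To make the distinction between pairwise similarity and second-order structure concrete, the toy sketch below scores a node assignment between two small graphs with a first-order term (feature similarity of matched nodes) plus a second-order term (edges preserved by the assignment). It is an illustration under simplifying assumptions, not the objective or solver from the paper: the names `matching_score` and `best_matching`, the binary edge reward, and the brute-force search are all hypothetical choices made for readability.

```python
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def matching_score(perm, feats_a, feats_b, edges_a, edges_b, edge_weight=1.0):
    """Score the assignment i -> perm[i] from graph A to graph B (hypothetical helper)."""
    # First-order term: similarity of each matched node pair.
    node_term = sum(cosine(feats_a[i], feats_b[j]) for i, j in enumerate(perm))
    # Second-order term: count edges of A that land on an edge of B.
    edge_term = sum(
        1.0 for i, k in edges_a if frozenset((perm[i], perm[k])) in edges_b
    )
    return node_term + edge_weight * edge_term


def best_matching(feats_a, feats_b, edges_a, edges_b, edge_weight=1.0):
    """Exhaustive search over assignments; only feasible for tiny graphs."""
    n = len(feats_a)
    return max(
        permutations(range(n)),
        key=lambda p: matching_score(p, feats_a, feats_b, edges_a, edges_b, edge_weight),
    )


# Two views of the same three samples; nodes 0 and 1 of A have identical features,
# so node similarity alone cannot tell them apart.
feats_a = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
feats_b = [(0.0, 1.0), (1.0, 0.0), (1.0, 0.0)]
edges_a = [(0, 2)]               # in A, node 0 is linked to node 2
edges_b = {frozenset((0, 1))}    # in B, node 1 is linked to node 0

consistent = (1, 2, 0)     # maps A's edge (0, 2) onto B's edge {1, 0}
inconsistent = (2, 1, 0)   # same node similarities, but breaks the edge

print(matching_score(consistent, feats_a, feats_b, edges_a, edges_b))
print(matching_score(inconsistent, feats_a, feats_b, edges_a, edges_b))
print(best_matching(feats_a, feats_b, edges_a, edges_b))
```

Both candidate assignments have the same first-order score, so only the edge term separates them. That is the kind of ambiguity a purely pairwise contrastive objective cannot resolve.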

Key Features

Second-order graph matching: Frames self-supervised pretraining as a graph-matching problem that combines pairwise (local and global) image similarities with structural constraints in a single combinatorial matching loss, trained end-to-end by estimating gradients through a black-box solver.
Large-scale medical pretraining corpus: Trained on ~1.3 million images aggregated from 55 public datasets covering multiple organs and modalities, yielding broadly transferable representations.
Released pretrained backbones: Provides ResNet-50 (~25.5M parameters) and ViT-B (~86M parameters) checkpoints, plus a SAM (ViT-B) variant for prompt-based segmentation experiments.
Broad task coverage: Validated on 15 downstream tasks including 2D/3D segmentation, image classification, and object detection, in both in-distribution and out-of-distribution settings.
Strong transfer performance: Outperforms supervised and self-supervised baselines, with reported gains of 6-7% over vision-language models on challenging tasks such as brain tumor classification and diabetic retinopathy grading.
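The phrase "gradient estimation through a black-box solver" in the first feature can be illustrated with a small sketch. The text here does not say which estimator LVM-Med uses, so the code below assumes one generic perturbation-based scheme from the black-box differentiation literature: solve once, nudge the costs in the direction of the incoming loss gradient, solve again, and use the difference of the two solutions as the gradient. The brute-force `solve_assignment`, the step size `lr`, and the interpolation strength `lam` are hypothetical choices for this toy problem.

```python
from itertools import permutations


def solve_assignment(cost):
    """Black-box solver: brute-force minimum-cost assignment as a 0/1 matrix."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return [[1.0 if best[i] == j else 0.0 for j in range(n)] for i in range(n)]


def blackbox_gradient(cost, grad_wrt_matching, lam):
    """Estimate dLoss/dCost from two solver calls (perturbation-based sketch)."""
    n = len(cost)
    y = solve_assignment(cost)
    # Perturb the costs along the gradient of the loss w.r.t. the solver output.
    perturbed = [
        [cost[i][j] + lam * grad_wrt_matching[i][j] for j in range(n)]
        for i in range(n)
    ]
    y_lam = solve_assignment(perturbed)
    return [[(y_lam[i][j] - y[i][j]) / lam for j in range(n)] for i in range(n)]


# Costs that initially favour the identity matching.
cost = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
# Desired matching: swap the first two nodes, keep the third.
target = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
# Loss = number of selected entries that disagree with the target,
# so its gradient w.r.t. the matching matrix is (1 - target).
loss_grad = [[1.0 - t for t in row] for row in target]

lr, lam = 1.0, 2.0
for _ in range(3):
    grad = blackbox_gradient(cost, loss_grad, lam)
    cost = [[cost[i][j] - lr * grad[i][j] for j in range(3)] for i in range(3)]

print(solve_assignment(cost) == target)
```

In a real pipeline the costs would come from a neural network's embeddings and the estimated gradient would be passed further back through that network; here the costs are updated directly to keep the example self-contained.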

Technical Details

LVM-Med pretrains convolutional (ResNet-50) and transformer (ViT-B) backbones using its graph-matching contrastive objective. The matching formulation couples a similarity term over image embeddings with a combinatorial structural term; because the matching solver is non-differentiable, gradients are estimated through the black-box optimizer to enable end-to-end training. Reported results include a 2D segmentation Dice of 83.05 and 3D IoU of 79.02 for the ResNet-50 backbone, and a Dice of 85.80 and 3D IoU of 80.90 for ViT-B, consistently surpassing ImageNet-supervised and prior self-supervised pretraining. A ViT-H variant further trained on the LIVECell dataset and a Segment Anything Model (SAM) backbone are also provided for prompt-based segmentation. Code and weights are released under a CC BY-NC-ND license, with checkpoints distributed via the project repository.
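For readers unfamiliar with the two segmentation metrics quoted above, the following minimal sketch computes Dice and IoU (Jaccard) for binary masks using their standard definitions. It is not the project's evaluation code: the function names and the convention of returning 1.0 when both masks are empty are assumptions, and reported scores such as 83.05 correspond to these values multiplied by 100.

```python
def flatten(volume):
    """Flatten an arbitrarily nested list (e.g. a 3D volume) into a flat list."""
    if isinstance(volume, (list, tuple)):
        return [v for sub in volume for v in flatten(sub)]
    return [volume]


def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks (flat sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# A tiny 2 x 2 x 2 volume; the same function covers 2D masks once flattened.
pred_vol = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
target_vol = [[[1, 0], [0, 1]], [[1, 0], [0, 0]]]

dice, iou = dice_and_iou(flatten(pred_vol), flatten(target_vol))
print(f"Dice = {100 * dice:.2f}, IoU = {100 * iou:.2f}")
```

The two metrics are monotonically related for a single mask pair (Dice = 2·IoU / (1 + IoU)), so Dice is never lower than IoU for the same prediction. Keep this in mind when comparing the 2D Dice and 3D IoU figures.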

Applications

LVM-Med serves as a drop-in pretrained backbone for medical image analysis, letting researchers and clinical-AI developers fine-tune a single model across segmentation (organs, tumors, cells in 2D and 3D), disease classification (e.g., brain tumor, diabetic retinopathy grading), and object detection (e.g., lesion detection on chest radiographs with Faster R-CNN). Because it is pretrained on heterogeneous modalities, it is particularly useful when labeled data is scarce, providing strong initialization that reduces the annotation burden for new imaging tasks.

Impact

LVM-Med demonstrated that domain-specific, large-scale self-supervised pretraining substantially outperforms transferring from natural-image models for medical imaging, and that incorporating higher-order structural relationships through graph matching improves representation quality. As one of the early openly released medical-imaging foundation backbones spanning many modalities and tasks, it has been widely cited and adopted as a baseline and starting point in subsequent medical foundation-model research. Its main limitations are the non-commercial CC BY-NC-ND license, which restricts downstream reuse, and a focus on 2D/3D vision tasks rather than multimodal (vision-language) understanding.

Citation

LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

Preprint

Nguyen, D., et al. (2023) LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching. Neural Information Processing Systems.

DOI: 10.48550/arXiv.2306.11925

Recent citations

Papers that recently cited this model.

Topology-Driven Transferability Estimation for 3D Medical Vision Foundation Models. Jiaqi Tang, Shaoyang Zhang, Fandong Zhang, et al. Jul 2026. Citations: 0
Machine Learning-Driven Combinatorial Optimization: A Systematic Review. Xinwei Wang, Xinyong Yu, Zhaopan Wang, et al. Archives of Computational Methods in Engineering · Jun 2026. Citations: 0
Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models. Yuting He, Chenyu You, Shuo Li. May 2026. Citations: 0

Top citations

The most-cited papers that cite this model.

Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions. Yuting He, Fuxiang Huang, Xinrui Jiang, et al. IEEE Reviews in Biomedical Engineering · Apr 2024. Citations: 134
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision. Bobby Azad, Reza Azad, Sania Eskandari, et al. arXiv.org · Oct 2023. Citations: 125
Domain Generalization for Medical Image Analysis: A Review. Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, et al. Proceedings of the IEEE · Oct 2023. Citations: 91
Foundation models and intelligent decision-making: Progress, challenges, and perspectives. Jincai Huang, Yongjun Xu, Qi Wang, et al. Innovation (Cambridge (Mass.)) · May 2025. Citations: 83
Vision-Language Models in medical image analysis: From simple fusion to general large models. Xiang Li, Like Li, Yuchen Jiang, et al. Information Fusion · Feb 2025. Citations: 68

Citations

Total Citations: 100

Influential: 12

References: 138

GitHub

Stars: 217

Forks: 28

Open Issues: 5

Contributors: 7

Last Push: 1y ago

Language: Python

Fields of citing research

Computer Science: 98%
Medicine: 87%
Engineering: 30%
Biology: 2%
Mathematics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 28 (Closed)

Usability (can I run it?): 22

Reproducibility (can I retrain it?): 22

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository
Research Paper
