University of Stuttgart / German Research Center for Artificial Intelligence (DFKI) / Max Planck Institute for Informatics / University of Texas at Austin / University of Bonn / University of California, San Diego / National University of Singapore
Self-supervised vision foundation model pretrained on ~1.3M medical images via second-order graph matching, transferable across 15 medical imaging tasks.
LVM-Med is a large-scale, self-supervised vision foundation model for medical imaging, developed by a collaboration led by the University of Stuttgart and the German Research Center for Artificial Intelligence (DFKI), with co-authors from the Max Planck Institute for Informatics, University of Texas at Austin, University of Bonn, UC San Diego, and the National University of Singapore. It was introduced in the paper "LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching" and presented at NeurIPS 2023.
The model addresses a persistent gap in medical AI: general-purpose backbones pretrained on natural images (e.g., ImageNet) transfer poorly to medical modalities, while task-specific medical models fail to generalize across organs and imaging types. LVM-Med tackles this by assembling roughly 1.3 million medical images from 55 publicly available datasets spanning CT, MRI, X-ray, ultrasound, and other modalities, then pretraining a single backbone that can be fine-tuned for diverse downstream tasks.
Its core methodological contribution is reformulating self-supervised contrastive learning as a graph-matching problem. Rather than comparing only individual image pairs, LVM-Med builds graphs over samples and enforces structural consistency through a second-order, combinatorial graph-matching objective, capturing higher-order relationships that conventional contrastive objectives miss.
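The difference between pairwise comparison and second-order matching can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the k-nearest-neighbour graph construction, the weight `alpha`, and the brute-force search over permutations are all assumptions chosen for readability. It scores a candidate matching between two views of the same samples by adding a first-order term (node similarity) to a second-order term (how many edges of one graph land on edges of the other).

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_adjacency(embs, k):
    """Symmetric 0/1 adjacency matrix linking each node to its k most similar nodes."""
    n = len(embs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(embs[i], embs[j]), reverse=True)
        for j in ranked[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj

def matching_score(perm, emb_a, emb_b, adj_a, adj_b, alpha):
    """Score the matching that sends node i of graph A to node perm[i] of graph B."""
    n = len(perm)
    # First-order term: similarity of each matched pair of embeddings.
    node_term = sum(cosine(emb_a[i], emb_b[perm[i]]) for i in range(n))
    # Second-order term: edges of A that are mapped onto edges of B.
    edge_term = sum(adj_a[i][j] * adj_b[perm[i]][perm[j]]
                    for i in range(n) for j in range(i + 1, n))
    return node_term + alpha * edge_term

def best_matching(emb_a, emb_b, k=1, alpha=0.5):
    """Exhaustive search over permutations; only feasible for a handful of nodes."""
    adj_a, adj_b = knn_adjacency(emb_a, k), knn_adjacency(emb_b, k)
    return max(permutations(range(len(emb_a))),
               key=lambda p: matching_score(p, emb_a, emb_b, adj_a, adj_b, alpha))

# Hypothetical embeddings: view B is a rescaled, slightly shifted, shuffled copy of view A.
view_a = [[1.0, 0.0, 0.0], [0.9, 0.4, 0.0], [0.0, 1.0, 0.0], [0.0, 0.3, 1.0]]
order = [2, 0, 3, 1]  # view_b[j] is derived from view_a[order[j]]
view_b = [[1.05 * x + 0.01 for x in view_a[i]] for i in order]

match = best_matching(view_a, view_b, k=1)
print(match)
```

In this toy setting the highest-scoring matching undoes the shuffle, pairing each sample with its own second view. A real system would replace the exhaustive search with a dedicated combinatorial solver and would compute the graphs from learned features.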
LVM-Med pretrains convolutional (ResNet-50) and transformer (ViT-B) backbones with its graph-matching contrastive objective. The matching formulation couples a similarity term over image embeddings with a combinatorial structural term. Because the matching solver is non-differentiable, gradients are estimated through the black-box optimizer so that the whole pipeline can be trained end to end. Reported results include a 2D segmentation Dice of 83.05 and a 3D IoU of 79.02 for the ResNet-50 backbone, and a Dice of 85.80 and a 3D IoU of 80.90 for ViT-B; both backbones are reported to surpass ImageNet-supervised and prior self-supervised pretraining. A ViT-H variant further trained on the LIVECell dataset and a Segment Anything Model (SAM) backbone are also provided for prompt-based segmentation. Code and weights are released under a CC BY-NC-ND license, with checkpoints distributed via the project repository.
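To make the idea of estimating gradients through a non-differentiable solver concrete, the sketch below shows one generic perturb-and-resolve scheme in the style of blackbox differentiation of combinatorial solvers. It is an assumption that this resembles the estimator used for LVM-Med; the description above does not specify it. The brute-force assignment solver, the function names, and the interpolation strength `lam` are all illustrative. The solver is called once on the original costs and once on costs shifted by the incoming loss gradient, and the difference between the two solutions serves as the gradient estimate.

```python
from itertools import permutations

def solve_assignment(cost):
    """Brute-force minimum-cost assignment, returned as a 0/1 permutation matrix."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return [[1.0 if best[i] == j else 0.0 for j in range(n)] for i in range(n)]

def blackbox_grad(cost, grad_y, lam):
    """Estimate dL/dcost given dL/dy, treating the solver as a black box."""
    n = len(cost)
    y = solve_assignment(cost)
    # Shift the costs in the direction of the loss gradient and solve again.
    shifted = [[cost[i][j] + lam * grad_y[i][j] for j in range(n)] for i in range(n)]
    y_lam = solve_assignment(shifted)
    # The finite difference between the two solutions is the gradient estimate.
    return [[-(y[i][j] - y_lam[i][j]) / lam for j in range(n)] for i in range(n)]

# Hypothetical 2x2 example: the solver prefers the identity matching,
# while the desired (target) matching is the swap.
cost = [[1.0, 2.0], [2.0, 1.5]]
target = [[0.0, 1.0], [1.0, 0.0]]
y = solve_assignment(cost)
# Gradient of the loss 0.5 * sum((y - target) ** 2) with respect to y.
grad_y = [[y[i][j] - target[i][j] for j in range(2)] for i in range(2)]

grad = blackbox_grad(cost, grad_y, lam=2.0)
print(grad)
```

A gradient-descent step on `cost` with this estimate raises the diagonal costs and lowers the off-diagonal ones, which moves the solver toward the target matching. If `lam` is too small for the shifted problem to change the solution, the estimate is zero, so the value acts as a trade-off between informativeness and faithfulness to the original problem.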
LVM-Med serves as a drop-in pretrained backbone for medical image analysis, letting researchers and clinical-AI developers fine-tune a single model across segmentation (organs, tumors, cells in 2D and 3D), disease classification (e.g., brain tumor, diabetic retinopathy grading), and object detection (e.g., lesion detection on chest radiographs with Faster R-CNN). Because it is pretrained on heterogeneous modalities, it is particularly useful when labeled data is scarce, providing strong initialization that reduces the annotation burden for new imaging tasks.
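Segmentation quality on such tasks is typically summarized with overlap metrics such as Dice and IoU. The sketch below computes both for binary masks stored as flat lists of 0/1 values. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not details taken from the LVM-Med evaluation code.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou

# Hypothetical six-pixel masks: two pixels overlap, each mask has three foreground pixels.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred_mask, true_mask)
print(dice, iou)
```

Benchmark tables usually report these ratios multiplied by 100, so a Dice of 0.8305 would appear as 83.05. For any pair of masks Dice is at least as large as IoU, which is worth remembering when comparing numbers across the two metrics.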
LVM-Med demonstrated that domain-specific, large-scale self-supervised pretraining substantially outperforms transfer from natural-image models for medical imaging, and that incorporating higher-order structural relationships through graph matching improves representation quality. As one of the early openly released medical-imaging foundation backbones spanning many modalities and tasks, it has been cited and used as a baseline and starting point in subsequent medical foundation-model research. Its main limitations are the non-commercial CC BY-NC-ND license, which restricts downstream reuse, and a focus on 2D/3D vision tasks rather than multimodal (vision-language) understanding.
Nguyen, D., et al. (2023) LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching. Neural Information Processing Systems.
DOI: 10.48550/arXiv.2306.11925