Southeast University / Western University / Case Western Reserve University
Self-supervised pre-training method for 3D medical images that embeds topological invariance into inter-image similarity to learn transferable representations.
Geometric Visual Similarity Learning (GVSL) is a self-supervised pre-training method for 3D medical images, introduced at CVPR 2023 by Yuting He, Guanyu Yang and colleagues at Southeast University, with collaborators at Western University and Case Western Reserve University. It targets a core obstacle in medical image representation learning: how to measure whether two unlabeled scans depict the same anatomical structures when there are no labels to anchor that comparison.
The central idea is that human anatomy is topologically stable across individuals — the same organs appear in roughly the same spatial relationships — so a good similarity measure for medical images should respect this topological invariance. GVSL embeds that prior directly into the similarity metric used during pre-training. Rather than treating two scans as a single global match or non-match, it learns correspondences between semantically equivalent regions, encouraging the network to cluster anatomically corresponding voxels even across different patients.
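The difference between comparing two scans as wholes and comparing their corresponding regions can be illustrated with a dependency-free toy. The sketch below is not GVSL's implementation. It assumes 1D intensity profiles in place of 3D volumes and normalized cross-correlation (NCC) in place of learned features, and the names `ncc`, `best_match`, `scan_a`, and `scan_b` are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0  # a flat window carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(patch, signal):
    """Return (start, score) of the window in `signal` most similar to `patch`."""
    width = len(patch)
    scores = [
        (ncc(patch, signal[start:start + width]), start)
        for start in range(len(signal) - width + 1)
    ]
    score, start = max(scores)
    return start, score


# Toy profiles standing in for two patients: the same structure appears at a
# different position and with rescaled intensities in the second scan.
organ = [1.0, 4.0, 9.0, 4.0, 1.0]
scan_a = [0.0] * 3 + organ + [0.0] * 8                        # structure starts at index 3
scan_b = [0.0] * 7 + [2 * v + 1 for v in organ] + [0.0] * 4   # structure starts at index 7

# Position-by-position comparison of the whole scans.
whole_scan = ncc(scan_a, scan_b)

# Region-level comparison: locate scan_a's structure inside scan_b.
offset, local = best_match(scan_a[3:8], scan_b)

print(f"whole-scan NCC: {whole_scan:.3f}")
print(f"best local match at index {offset} with NCC {local:.3f}")
```

In this toy the whole-scan score is low because the structure sits at different offsets, while region matching recovers the offset and a local score close to 1.0 despite the intensity rescaling.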
Because labeled 3D medical data is scarce and expensive to annotate, transferable pre-trained encoders are especially valuable in this domain. GVSL produces a reusable pretrained checkpoint that can be fine-tuned for downstream tasks such as segmentation and registration. It sits alongside other self-supervised approaches for volumetric medical imaging and is distinguished from them by its geometry-aware similarity formulation.
GVSL is built on convolutional encoder-decoder backbones standard in 3D medical image analysis and is implemented in PyTorch. The method combines registration-style geometric alignment with similarity learning: a Z-matching head jointly optimizes global and local feature similarity so that the network learns representations consistent with the underlying anatomical topology. During pre-training, the model learns correspondences between semantically shared regions across volumes, improving what the authors describe as inner-scene, inter-scene, and global-local transferring ability. The original work evaluates the pretrained representations on four challenging 3D medical image tasks and reports improved transfer performance over prior self-supervised pre-training baselines. The paper appears in the CVPR 2023 proceedings (pages 9538–9547).
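Registration-style alignment paired with a similarity term can be sketched as an objective over a displacement field. This is an illustrative sketch under stated assumptions, not the authors' loss. The source does not give the formulation, so NCC as the similarity measure, the neighbour-difference smoothness penalty, the `weight` value, and all function names are assumptions, and 1D lists stand in for 3D volumes.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def warp(moving, displacement):
    """Resample `moving` at positions i + displacement[i] (linear interpolation)."""
    last = len(moving) - 1
    warped = []
    for i, d in enumerate(displacement):
        pos = min(max(i + d, 0.0), float(last))  # clamp to the valid index range
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        warped.append((1.0 - frac) * moving[lo] + frac * moving[hi])
    return warped


def smoothness(displacement):
    """Mean squared difference between neighbouring displacements."""
    diffs = [(b - a) ** 2 for a, b in zip(displacement, displacement[1:])]
    return sum(diffs) / len(diffs)


def alignment_loss(fixed, moving, displacement, weight=0.1):
    """Dissimilarity after warping plus a penalty on irregular displacement."""
    similarity = ncc(fixed, warp(moving, displacement))
    return (1.0 - similarity) + weight * smoothness(displacement)


fixed = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
moving = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0]  # same structure, 3 positions later

identity = [0.0] * len(fixed)  # no alignment
shift = [3.0] * len(fixed)     # constant displacement that undoes the offset

loss_identity = alignment_loss(fixed, moving, identity)
loss_shift = alignment_loss(fixed, moving, shift)

print(f"loss without alignment: {loss_identity:.3f}")
print(f"loss with alignment:    {loss_shift:.3f}")
```

A constant displacement plays the role of a global alignment, whereas letting each position carry its own displacement corresponds to local alignment; the smoothness term penalizes displacement fields that vary abruptly between neighbours. In a learned setting the displacement would be predicted by a network from feature maps rather than fixed by hand.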
GVSL is intended as a pre-training step for researchers and developers building 3D medical image analysis pipelines, particularly when labeled data is limited. The resulting encoder can be fine-tuned for downstream tasks including anatomical segmentation and image registration across modalities such as CT and MRI. By providing a strong initialization, it can reduce the volume of annotations needed to reach a target accuracy, benefiting medical imaging research groups and clinical AI developers who need data-efficient transfer learning.
GVSL contributed to the line of self-supervised pre-training research for volumetric medical imaging by reframing inter-image similarity through a geometry- and topology-aware lens rather than treating scans as holistic instances. Its acceptance at CVPR 2023 and the public release of code and pretrained weights have supported adoption and follow-on work in medical image self-supervised learning. As with many self-supervised methods, the practical benefit depends on how closely the pre-training data distribution matches the downstream one. The released model zoo also notes that pretrained parameters were being published incrementally.
He, Y., et al. (2023). Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-training. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9538–9547.
DOI: 10.1109/CVPR52729.2023.00920