GVSL (Geometric Visual Similarity Learning)

Southeast University / Western University / Case Western Reserve University

Self-supervised pretraining for 3D medical images that learns anatomical correspondences between scans, producing encoders that transfer to downstream tasks such as segmentation and registration.

Released: March 2023

Geometric Visual Similarity Learning (GVSL) is a self-supervised pre-training method for 3D medical images, introduced at CVPR 2023 by Yuting He, Guanyu Yang and colleagues at Southeast University, with collaborators at Western University and Case Western Reserve University. It targets a core obstacle in medical image representation learning: how to measure whether two scans depict the same anatomical structures when no labels are available to anchor the comparison.

The central idea is that human anatomy is topologically stable across individuals: the same organs appear in roughly the same spatial relationships, so a good similarity measure for medical images should respect this topological invariance. GVSL embeds that prior directly into the similarity metric used during pre-training. Rather than scoring two scans as a single global match or non-match, it learns correspondences between semantically equivalent regions, encouraging the network to cluster anatomically corresponding voxels even across different patients.

Because labeled 3D medical data is scarce and expensive to annotate, transferable pre-trained encoders are especially valuable in this domain. GVSL produces a reusable pretrained checkpoint that can be fine-tuned for downstream tasks such as segmentation and registration, positioning it alongside other self-supervised approaches for volumetric medical imaging while differentiating itself through its geometry-aware similarity formulation.

Key Features

Topological invariance prior: Encodes the assumption that anatomical structures are consistently arranged across individuals, embedding this geometric prior into the inter-image similarity measurement used for self-supervision.
Z-matching head: A geometric matching head that collaboratively learns both global and local semantic similarity, capturing whole-scene context and fine-grained regional correspondence in a single objective.
Label-free pre-training: Learns from unlabeled 3D scans, addressing the scarcity of annotated volumetric medical data and reducing reliance on costly expert labeling.
Transferable checkpoint: Produces a reusable pretrained encoder that improves downstream transfer across multiple 3D medical image tasks, released through a pre-trained model zoo in the official repository.

Technical Details

GVSL is built on convolutional encoder-decoder backbones standard in 3D medical image analysis and is implemented in PyTorch. The method combines image registration-style geometric alignment with similarity learning: the Z-matching head jointly optimizes global and local feature similarity so that the network learns representations consistent with the underlying anatomical topology. During pre-training, the model learns correspondences between semantically shared regions across volumes, improving what the authors describe as inner-scene, inter-scene, and global-local transferring ability. The original work evaluates the pretrained representations on four challenging 3D medical image tasks, reporting improved transfer performance over prior self-supervised pre-training baselines; the paper appears in the CVPR 2023 proceedings (pages 9538–9547).
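To make the phrase "registration-style geometric alignment" more concrete, the sketch below scores how well one volume matches another using normalized cross-correlation (NCC), a similarity measure commonly used in image registration. This is an illustration only: the text above does not state which similarity function GVSL optimizes, and the names `ncc` and `alignment_loss` are hypothetical. NumPy is used here instead of PyTorch to keep the example small.

```python
import numpy as np


def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation between two equally shaped volumes.

    Returns a value near 1 when the volumes share the same structure,
    regardless of a positive linear change in intensity.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("volumes must have the same shape")
    a = a - a.mean()  # remove intensity offset
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)


def alignment_loss(warped, target):
    """Hypothetical loss: small when the warped volume matches the target."""
    return 1.0 - ncc(warped, target)


# Synthetic stand-ins for 3D scans (assumption: real inputs would be CT/MRI).
rng = np.random.default_rng(0)
target = rng.random((8, 8, 8))

aligned = 2.0 * target + 0.5                    # same structure, new intensities
misaligned = np.roll(target, shift=3, axis=0)   # structure shifted along one axis

loss_aligned = alignment_loss(aligned, target)
loss_misaligned = alignment_loss(misaligned, target)
print(f"aligned: {loss_aligned:.6f}  misaligned: {loss_misaligned:.6f}")
```

In a registration-style objective of this kind, a network would be trained to predict a transformation that lowers such a loss between a warped scan and a target scan, so the features it learns have to encode where corresponding structures lie.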

Applications

GVSL is intended as a pre-training step for researchers and developers building 3D medical image analysis pipelines, particularly when labeled data is limited. The resulting encoder can be fine-tuned for downstream tasks including anatomical segmentation and image registration across modalities such as CT and MRI. By providing a strong initialization, it can reduce the volume of annotations needed to reach a target accuracy, benefiting medical imaging research groups and clinical AI developers who need data-efficient transfer learning.
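A typical first step when fine-tuning from a pre-trained checkpoint is to keep only the encoder parameters and drop the pre-training head. The sketch below shows that filtering on a plain dictionary. The key names (`encoder.`, `matching_head.`) and the helper `select_encoder_weights` are hypothetical; the actual layout of the released GVSL checkpoint is not described here and should be inspected before use.

```python
def select_encoder_weights(checkpoint, prefix="encoder."):
    """Keep entries whose key starts with `prefix`, with the prefix stripped.

    `checkpoint` is any mapping from parameter name to value, for example a
    state dict loaded from disk.
    """
    selected = {}
    for name, value in checkpoint.items():
        if name.startswith(prefix):
            selected[name[len(prefix):]] = value
    if not selected:
        raise KeyError(f"no parameters found under prefix {prefix!r}")
    return selected


# Hypothetical checkpoint contents, with short lists standing in for tensors.
checkpoint = {
    "encoder.conv1.weight": [0.1, 0.2],
    "encoder.conv1.bias": [0.0],
    "matching_head.fc.weight": [0.3],  # pre-training head, not needed downstream
}

encoder_weights = select_encoder_weights(checkpoint)
print(sorted(encoder_weights))
```

With PyTorch, the mapping would usually come from `torch.load(path, map_location="cpu")`, and the filtered result could be passed to the downstream model's `load_state_dict(encoder_weights, strict=False)` so that task-specific layers absent from the checkpoint keep their fresh initialization.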

Impact

GVSL contributed to the line of self-supervised pre-training research for volumetric medical imaging by reframing inter-image similarity through a geometry- and topology-aware lens rather than treating scans as holistic instances. Its acceptance at CVPR 2023 and the public release of code and pretrained weights have supported adoption and follow-on work in medical image self-supervised learning. As with many self-supervised methods, the practical benefit depends on the alignment between pre-training and downstream data distributions, and the released model zoo notes that pretrained parameters were being made available incrementally.

Citation

He, Y., et al. (2023). Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-training. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9538–9547.

DOI: 10.1109/CVPR52729.2023.00920

Recent citations

Papers that recently cited this model.

Visualization methods for explainable medical imaging diagnosis: A survey
Longzhen Yang, Yihang Liu, Lianghua He, et al.
Computer Science Review · 2026
Citations: 0
Nexus: Neuro-guided expert-routed pre-training for brain representation learning from sMRI
Hu Yu, Yiyu Zhang, Si-Yue Fu, et al.
Expert Systems with Applications · Jun 2026
Citations: 0
ASAP: Advancing Medical Volumetric Representation Learning with Anatomy-aware Semantically-adaptive Pre-training
Rongsheng Wang, Fenghe Tang, Zihang Jiang, et al.
May 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions
Yuting He, Fuxiang Huang, Xinrui Jiang, et al.
IEEE Reviews in Biomedical Engineering · Apr 2024
Citations: 134 · Influential
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang, Junkai Chen, Beier Zhu, et al.
Computer Vision and Pattern Recognition · Nov 2024
Citations: 121
VoCo: A Simple-Yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Linshan Wu, Jiaxin Zhuang, Hao Chen
Computer Vision and Pattern Recognition · Feb 2024
Citations: 109 · Influential
UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner
Yiwen Ye, Yutong Xie, Jianpeng Zhang, et al.
International Conference on Medical Image Computing and Computer-Assisted Intervention · Apr 2023
Citations: 75
Large-Scale 3D Medical Image Pre-Training With Geometric Context Priors
Linshan Wu, Jiaxin Zhuang, Hao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence · Oct 2024
Citations: 37 · Influential

Citations

Total Citations: 64
Influential: 8
References: 59

GitHub

Stars: 70
Forks: 5
Open Issues: 3
Contributors: 0
Last Push: 2y ago
Language: Python

Fields of citing research

Computer Science: 100%
Medicine: 87%
Engineering: 31%
Psychology: 3%
Environmental Science: 2%
Business: 2%
Biology: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Overall: 17 (Closed)
Usability (can I run it?): 22
Reproducibility (can I retrain it?): 4

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository Research Paper
