A universal ultrasound foundation model pretrained on 2M+ multi-organ images via spatial-frequency masked image modeling, enabling label-efficient segmentation, classification, and enhancement.
USFM (Universal Ultrasound Foundation Model) is a self-supervised foundation model for medical ultrasound image analysis, developed by the Laboratory of Medical Imaging and Artificial Intelligence at Fudan University and published in Medical Image Analysis in 2024. Ultrasound is among the most widely used clinical imaging modalities, but deep-learning models for it have historically been narrow: trained organ-by-organ and task-by-task, requiring large annotated datasets that are costly to acquire because ultrasound interpretation depends on specialist expertise. USFM aims to break this bottleneck with a single pretrained backbone that transfers across organs, diseases, and task types.
The model addresses two challenges that make ultrasound harder to model than natural images or other medical scans. First, ultrasound images are noisy and low-contrast, with speckle and operator-dependent acquisition that obscure anatomical structure. Second, a model generalizes only if its pretraining data span many organs and devices. To handle these, USFM is pretrained on a large multi-organ, multi-center, multi-device database of over two million ultrasound images using a novel spatial-frequency dual masked image modeling objective designed to learn robust features despite degraded image quality.
By learning general-purpose ultrasound representations once and fine-tuning them on small labeled datasets, USFM positions itself as a label-efficient backbone for the full spectrum of downstream ultrasound tasks rather than a single-purpose classifier or segmenter.
USFM uses a Vision Transformer (ViT) backbone pretrained with a self-supervised
spatial-frequency dual masked image modeling scheme. The spatial branch follows
masked image modeling with a noise addition-and-recovery formulation suited to
ultrasound speckle, while the frequency branch applies band-stop masking so the
model must recover suppressed frequency components. Pretraining draws on a
curated database of more than two million ultrasound images spanning multiple
organs, clinical centers, and ultrasound devices, with organ-balanced sampling to
promote generalizability. For downstream evaluation, the pretrained encoder is
paired with standard task heads (for example SegViT or UperNet for segmentation),
and the authors report that USFM matches or exceeds competing approaches across
segmentation, classification, and image-enhancement benchmarks while using
substantially fewer labeled examples and fewer training epochs. Released weights
(USFM_latest.pth) are distributed under a CC-BY-NC 4.0 license.
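The two corruption steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the patch size, the masking ratio, the use of Gaussian noise, and the band radii are all assumptions chosen for illustration. The sketch shows only how a clean image might be corrupted in the spatial and frequency domains; in pretraining, the encoder would be trained to reconstruct the clean image from such inputs.

```python
import numpy as np


def spatial_noise_mask(image, patch=16, mask_ratio=0.5, noise_std=0.2, rng=None):
    """Add noise to a random subset of non-overlapping patches.

    Hypothetical helper: Gaussian noise and all default values are assumptions.
    Returns the corrupted image and a patch-level mask (True = corrupted).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    gh, gw = h // patch, w // patch
    n_masked = int(round(mask_ratio * gh * gw))
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=n_masked, replace=False)] = True
    patch_mask = flat.reshape(gh, gw)
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(patch, axis=0).repeat(patch, axis=1)
    noisy = image + rng.normal(0.0, noise_std, size=image.shape)
    # Noise replaces the signal only inside the selected patches.
    corrupted = np.where(pixel_mask, noisy, image)
    return corrupted, patch_mask


def band_stop_mask(image, r_low=0.1, r_high=0.3):
    """Zero out an annular band of spatial frequencies.

    Hypothetical helper: radii are in cycles per pixel (np.fft.fftfreq units)
    and the default band is an arbitrary choice.
    Returns the filtered image and the boolean keep-mask over frequency bins.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    keep = ~((radius >= r_low) & (radius < r_high))
    spectrum = np.fft.fft2(image)
    # The keep-mask is radially symmetric, so the inverse transform is real
    # up to floating-point error.
    filtered = np.fft.ifft2(spectrum * keep).real
    return filtered, keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((64, 64))  # stand-in for a normalized grayscale frame
    corrupted, patch_mask = spatial_noise_mask(clean, rng=rng)
    filtered, keep = band_stop_mask(clean)
    print(patch_mask.sum(), "of", patch_mask.size, "patches corrupted")
    print(f"{(~keep).mean():.1%} of frequency bins suppressed")
```

In this sketch the reconstruction target for both branches would be the clean image, so a loss such as mean squared error between the model output and the clean image is one plausible choice; the source does not specify the loss.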
USFM serves as a transferable backbone for clinical and research ultrasound analysis: segmenting lesions and anatomical structures, classifying disease (such as benign-versus-malignant assessment), and enhancing low-quality scans. Researchers and clinical AI developers benefit most, because the pretrained model lets them build accurate task-specific systems from small annotated datasets, lowering the barrier for ultrasound applications across organs—breast, thyroid, liver, cardiac, obstetric, and others—where assembling large expert-labeled corpora is impractical.
USFM is among the first general-purpose foundation models targeted specifically at medical ultrasound, a modality long underserved relative to CT, MRI, and histopathology in foundation-model research. By demonstrating that a single self-supervised backbone can generalize across organs and tasks while cutting annotation requirements, it provides a practical template for label-efficient ultrasound AI and has been incorporated into the OpenMedLab ecosystem of open medical foundation models. Its main limitations are those common to the class: the pretraining corpus, while large and diverse, is not fully described publicly; the weights are restricted to non-commercial use; and downstream performance still depends on the quality of the fine-tuning data for each clinical target.
Jiao, J., et al. (2024). USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. Medical Image Analysis.
DOI: 10.1016/j.media.2024.103202