University of Electronic Science and Technology of China / Shanghai AI Laboratory / SenseTime / Sichuan University
A self-supervised foundation model for 3D medical image segmentation, pretrained on ~110k unannotated CT volumes via Volume Fusion.
MIS-FM addresses a central bottleneck in 3D medical image segmentation: deep networks for volumetric organ and structure delineation require large amounts of voxel-level annotation, which is expensive and time-consuming for radiologists to produce. The model demonstrates that powerful, transferable segmentation backbones can instead be pretrained on large pools of unannotated CT scans and then fine-tuned on small labeled downstream datasets, reducing the annotation burden while improving accuracy.
Introduced in June 2023 by researchers at the University of Electronic Science and Technology of China, Shanghai AI Laboratory, SenseTime Research, and Sichuan University, MIS-FM is built around a self-supervised pretext task called Volume Fusion (VolF). Rather than relying on contrastive learning or masked image modeling, VolF fuses several random patches from one unlabeled sub-volume (the foreground) into another (the background) using a predefined set of discrete fusion coefficients, and trains the network to predict the fusion coefficient of every voxel. Pretraining therefore becomes a supervised-style segmentation problem whose labels are generated automatically, with no manual annotation.
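The Volume Fusion pretext task can be sketched in NumPy to show how one pair of unlabeled sub-volumes yields both an input and a per-voxel target. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a linear blend, fused = α·foreground + (1 − α)·background, with α restricted to the K + 1 levels {0, 1/K, ..., 1}. The function name `volume_fusion` and the defaults (K = 4 levels, 10 box-shaped patches, maximum patch size) are hypothetical choices.

```python
import numpy as np


def volume_fusion(background, foreground, num_levels=4, num_patches=10,
                  max_patch=(16, 32, 32), rng=None):
    """Build one Volume Fusion-style training pair (hypothetical helper).

    background, foreground: arrays of identical shape (D, H, W), e.g. two
    random sub-volumes cropped from unlabeled CT scans.
    Returns (fused, label), where label[v] = k in {0, ..., num_levels} and
    fused[v] = (k / num_levels) * foreground[v] + (1 - k / num_levels) * background[v].
    Settings are illustrative only, not the MIS-FM training configuration.
    """
    rng = np.random.default_rng() if rng is None else rng
    if background.shape != foreground.shape:
        raise ValueError("sub-volumes must have the same shape")

    # Class 0 means "pure background" (alpha = 0).
    label = np.zeros(background.shape, dtype=np.int64)
    for _ in range(num_patches):
        # Random box size and position, clipped to the volume extent.
        size = [int(rng.integers(1, min(m, s) + 1))
                for m, s in zip(max_patch, background.shape)]
        start = [int(rng.integers(0, s - z + 1))
                 for s, z in zip(background.shape, size)]
        region = tuple(slice(b, b + z) for b, z in zip(start, size))
        # Each patch gets one nonzero coefficient level; later patches overwrite earlier ones.
        label[region] = rng.integers(1, num_levels + 1)

    # Convert class indices to fusion coefficients and blend voxel-wise.
    alpha = label.astype(np.float64) / num_levels
    fused = alpha * foreground + (1.0 - alpha) * background
    return fused, label
```

The integer map `label` can be fed to an ordinary (K + 1)-class segmentation loss, which is why the same network and loss used for downstream segmentation can also drive pretraining.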
MIS-FM sits within OpenMEDLab's family of medical foundation models and is notable for its focus on dense 3D prediction rather than classification or representation-only objectives, making the pretrained weights directly reusable as segmentation encoders.
The segmentation backbone, PCT-Net, is a hybrid network whose blocks combine convolution and Transformer components in parallel (PCT blocks). It uses a multi-scale feature embedding module followed by a pyramid of PCT blocks arranged in an encoder-decoder structure, with channel widths of 24, 48, 128, 256, and 512 across five resolution levels. Pretraining optimizes the Volume Fusion objective on PData-110k, the pool of roughly 110k unannotated CT volumes. After downstream fine-tuning, the pretrained model consistently improves Dice scores over training from scratch: 82.74% vs. 81.41% on the MICCAI 2015 Head-Neck task, 89.56% vs. 87.58% on SegTHOR (thoracic organs), and 89.11% vs. 87.97% on the Synapse multi-organ abdominal benchmark. The authors also report that VolF outperforms several state-of-the-art self-supervised pretraining methods on these tasks.
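The figures above are Dice similarity coefficients, Dice = 2|P ∩ G| / (|P| + |G|), where P and G are the predicted and ground-truth voxel sets for one structure, reported as percentages averaged over structures. A minimal NumPy sketch follows. Averaging over foreground classes within a single volume and scoring a class absent from both masks as 1.0 are assumptions; the benchmarks' exact evaluation protocol is not described here, and the function names are hypothetical.

```python
import numpy as np


def dice_per_class(pred, target, num_classes):
    """Dice coefficient for each foreground class 1..num_classes-1.

    pred, target: integer label arrays of the same shape (0 = background).
    """
    scores = []
    for c in range(1, num_classes):
        p, g = pred == c, target == c
        denom = int(p.sum() + g.sum())
        if denom == 0:
            # Class absent from both masks: treated as a perfect match (one common convention).
            scores.append(1.0)
        else:
            scores.append(2.0 * int(np.logical_and(p, g).sum()) / denom)
    return np.array(scores)


def mean_dice_percent(pred, target, num_classes):
    """Mean foreground Dice expressed as a percentage, as in the results above."""
    return 100.0 * dice_per_class(pred, target, num_classes).mean()
```

On this percentage scale, the reported gains from pretraining work out to 1.33 (Head-Neck), 1.98 (SegTHOR), and 1.14 (Synapse) Dice points.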
MIS-FM is aimed at medical imaging researchers and developers building 3D segmentation pipelines for CT, including delineation of head-and-neck organs at risk, thoracic and abdominal organs, and other volumetric structures. The released weights serve as a strong initialization for fine-tuning on small annotated datasets, which is valuable in radiotherapy planning, organ volumetry, and clinical research settings where labeled data is scarce.
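When fine-tuning from pretrained weights, the final prediction layer often cannot be reused: a pretext head predicts fusion-coefficient classes, while a downstream head predicts organ labels, so their output shapes usually differ. A common approach is to copy only the parameters whose name and shape match the new model. The sketch below assumes state dicts that map parameter names to objects with a `.shape` attribute. The helper `select_transferable` and the parameter names `encoder.stem.weight` and `head.weight` are hypothetical; this is not the loading code used in the MIS-FM repository or PyMIC.

```python
import numpy as np


def select_transferable(pretrained, target):
    """Split a pretrained state dict into reusable and skipped entries.

    An entry is reusable if `target` has a parameter with the same name and
    shape. Values only need a `.shape` attribute (NumPy arrays here; PyTorch
    tensors expose `.shape` too). Returns (matched, skipped_keys).
    """
    matched, skipped = {}, []
    for name, value in pretrained.items():
        if name in target and tuple(target[name].shape) == tuple(value.shape):
            matched[name] = value
        else:
            skipped.append(name)
    return matched, skipped


# Hypothetical example: an encoder layer is shared, while the output head
# changes from 5 pretext classes (an assumption) to 8 downstream organ classes.
pretrained = {
    "encoder.stem.weight": np.zeros((24, 1, 3, 3, 3)),
    "head.weight": np.zeros((5, 24, 1, 1, 1)),
}
target = {
    "encoder.stem.weight": np.ones((24, 1, 3, 3, 3)),
    "head.weight": np.ones((8, 24, 1, 1, 1)),
}
matched, skipped = select_transferable(pretrained, target)
```

In PyTorch, the matched entries could then be loaded with `model.load_state_dict(matched, strict=False)`. The filtering step matters because `strict=False` tolerates missing keys but still raises an error on shape mismatches.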
By reframing self-supervised pretraining as a synthetic segmentation task, MIS-FM offered an alternative to contrastive and masked-reconstruction approaches that were dominant for 3D medical imaging at the time. As part of OpenMEDLab, its openly released, Apache-2.0 licensed weights and integration with PyMIC lowered the barrier to applying foundation-model pretraining in segmentation workflows. The work remains a reference point for label-efficient 3D medical image segmentation, though its pretraining is CT-specific and transfer to other modalities such as MRI is not the primary focus.
Wang, G., et al. (2023). MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset. arXiv preprint.
DOI: 10.48550/arXiv.2306.16925