MIS-FM

University of Electronic Science and Technology of China / Shanghai AI Laboratory / SenseTime / Sichuan University

Self-supervised foundation model for 3D medical image segmentation, pretrained on roughly 110,000 unannotated CT volumes via Volume Fusion.

Released: June 2023

MIS-FM addresses a central bottleneck in 3D medical image segmentation: deep networks for volumetric organ and structure delineation require large amounts of voxel-level annotation, which is expensive and time-consuming for radiologists to produce. The model demonstrates that powerful, transferable segmentation backbones can instead be pretrained on large pools of unannotated CT scans and then fine-tuned on small labeled downstream datasets, reducing the annotation burden while improving accuracy.

Introduced in June 2023 by researchers at the University of Electronic Science and Technology of China, Shanghai AI Laboratory, SenseTime Research, and Sichuan University, MIS-FM is built around a self-supervised pretext task called Volume Fusion (VolF). Rather than relying on contrastive learning or masked image modeling, VolF synthesizes pseudo-segmentation targets directly from unlabeled volumes, framing pretraining as a supervised-style segmentation problem without any manual labels.

MIS-FM sits within OpenMEDLab's family of medical foundation models and is notable for its focus on dense 3D prediction rather than classification or representation-only objectives, making the pretrained weights directly reusable as segmentation encoders.

Key Features

Volume Fusion pretext task: VolF fuses random patches from a foreground sub-volume into a background sub-volume using discrete fusion coefficients, then trains the network to predict those coefficients as a voxel-wise classification target, creating a self-supervised segmentation task with no human annotation.
Hybrid PCT-Net architecture: The Parallel Convolution and Transformer Network combines convolutional branches with self-attention in parallel PCT blocks across a pyramid encoder-decoder, capturing both local texture and long-range context.
Large-scale CT pretraining: Pretrained on PData-110k, roughly 110,000 unannotated 3D CT volumes assembled from public datasets and a private collection of lung CT scans from nine hospitals with diverse imaging protocols.
Released pretrained weights: Apache-2.0 licensed checkpoints (including PCT-Net and an FMUNet variant) are distributed for transfer learning to new segmentation tasks via the PyMIC library.
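
The Volume Fusion idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the blending rule (fused = alpha * foreground + (1 - alpha) * background with alpha = k / K), the box-shaped patches, and every name and default (volume_fusion, num_levels, num_patches, max_patch) are assumptions made for the example.

```python
import numpy as np

def volume_fusion(foreground, background, num_levels=4, num_patches=8,
                  max_patch=8, rng=None):
    """Fuse random foreground patches into a background sub-volume.

    Returns (fused, label). label[z, y, x] = k means the voxel was blended
    with coefficient alpha = k / num_levels; k = 0 keeps the background.
    The blending rule and all defaults are assumptions for illustration.
    """
    if foreground.shape != background.shape:
        raise ValueError("sub-volumes must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    label = np.zeros(background.shape, dtype=np.int64)
    for _ in range(num_patches):
        # Random box size and corner along each axis.
        size = [int(rng.integers(1, min(max_patch, s) + 1))
                for s in background.shape]
        start = [int(rng.integers(0, s - p + 1))
                 for s, p in zip(background.shape, size)]
        region = tuple(slice(a, a + p) for a, p in zip(start, size))
        # One discrete coefficient index per patch, drawn from 1..num_levels.
        label[region] = rng.integers(1, num_levels + 1)
    alpha = label.astype(np.float32) / num_levels
    fused = alpha * foreground + (1.0 - alpha) * background
    return fused.astype(np.float32), label

# Toy usage with random arrays standing in for CT sub-volumes.
rng = np.random.default_rng(0)
fg = rng.normal(size=(16, 32, 32)).astype(np.float32)
bg = rng.normal(size=(16, 32, 32)).astype(np.float32)
fused, label = volume_fusion(fg, bg, rng=rng)
```

A segmentation network would then take fused as input and be trained with a voxel-wise classification loss against label, which has num_levels + 1 classes in this sketch.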

Technical Details

PCT-Net uses a multi-scale feature embedding module followed by a pyramid of PCT blocks organized in an encoder-decoder structure, with channel widths of 24, 48, 128, 256, and 512 across five resolution levels. Pretraining optimizes the Volume Fusion objective over PData-110k. In downstream fine-tuning, the pretrained model improves Dice scores over training from scratch on all three reported benchmarks: 82.74% vs. 81.41% on the MICCAI 2015 Head-Neck task (+1.33 points), 89.56% vs. 87.58% on SegTHOR thoracic organs (+1.98), and 89.11% vs. 87.97% on the Synapse multi-organ abdominal benchmark (+1.14). The authors also report that VolF outperforms several state-of-the-art self-supervised pretraining methods across these tasks.
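
For readers unfamiliar with the metric, the sketch below shows how a per-class Dice score is commonly computed from two label maps, and derives the pretrained-vs-scratch margins from the figures quoted above. The function name dice_score, the toy arrays, and the convention of returning 1.0 when a class is absent from both maps are assumptions for illustration; the benchmark numbers are copied from the text, not recomputed.

```python
import numpy as np

def dice_score(pred, target, label):
    """Dice overlap for one class between two integer label maps."""
    pred_mask = np.asarray(pred) == label
    target_mask = np.asarray(target) == label
    denom = pred_mask.sum() + target_mask.sum()
    if denom == 0:
        return 1.0  # convention chosen here: class absent from both maps
    overlap = np.logical_and(pred_mask, target_mask).sum()
    return float(2.0 * overlap / denom)

# Toy 2D example: 3 predicted voxels, 3 reference voxels, 2 in common.
pred = np.array([[0, 1, 1], [0, 1, 0]])
target = np.array([[0, 1, 0], [0, 1, 1]])
toy_dice = dice_score(pred, target, label=1)

# Reported mean Dice (%) as (pretrained, from scratch), copied from the text.
reported = {
    "MICCAI 2015 Head-Neck": (82.74, 81.41),
    "SegTHOR": (89.56, 87.58),
    "Synapse": (89.11, 87.97),
}
gains = {task: round(p - s, 2) for task, (p, s) in reported.items()}
```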

Applications

MIS-FM is aimed at medical imaging researchers and developers building 3D segmentation pipelines for CT, including delineation of head-and-neck organs at risk, thoracic and abdominal organs, and other volumetric structures. The released weights serve as a strong initialization for fine-tuning on small annotated datasets, which is valuable in radiotherapy planning, organ volumetry, and clinical research settings where labeled data is scarce.

Impact

By reframing self-supervised pretraining as a synthetic segmentation task, MIS-FM offered an alternative to contrastive and masked-reconstruction approaches that were dominant for 3D medical imaging at the time. As part of OpenMEDLab, its openly released, Apache-2.0 licensed weights and integration with PyMIC lowered the barrier to applying foundation-model pretraining in segmentation workflows. The work remains a reference point for label-efficient 3D medical image segmentation, though its pretraining is CT-specific and transfer to other modalities such as MRI is not the primary focus.

Citation

MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset

Preprint

Wang, G., et al. (2023) MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset. arXiv.org.

DOI: 10.48550/arXiv.2306.16925

Recent citations

Papers that recently cited this model.

MorphologyFM: A Foundation Model for Morphology-Aware Representation Learning from ECG and Pulse Oximetry Waveforms
Saiyang Feng, Yuanyu Zhang, Shi Li
Jul 2026
Citations: 0

Knowledge Transfer Scaling Laws for 3D Medical Imaging
Ho Hin Lee, D. Du, Chu Wang, et al.
May 2026
Citations: 0

Foundation Model for Medical Imaging: A Comprehensive Review
Licheng Jiao, Jingyi Yang, Ruiyang Li, et al.
IEEE Transactions on Artificial Intelligence · May 2026
Citations: 1

Top citations

The most-cited papers that cite this model.

Medical Image Analysis
Zongwei Zhou, V. Sodha, Jiaxuan Pang, et al.
Citations: 458

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
Jie Liu, Yixiao Zhang, Jieneng Chen, et al.
IEEE International Conference on Computer Vision · Jan 2023
Citations: 351

On the Challenges and Perspectives of Foundation Models for Medical Image Analysis
Shaoting Zhang, Dimitris N. Metaxas
Medical Image Anal. · Jun 2023
Citations: 304

A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan, Seowung Leem, Kyle B. See, et al.
IEEE Reviews in Biomedical Engineering · Jun 2024
Citations: 115

USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis
Jing Jiao, Jin Zhou, Xiaokang Li, et al.
Medical Image Anal. · Dec 2023
Citations: 100

Citations

Total Citations: 50
Influential: 1
References: 54

GitHub

Stars: 250
Forks: 9
Open Issues: 6
Contributors: 1
Last Push: 9mo ago
Language: Python
License: Apache-2.0

Fields of citing research

Computer Science: 98%
Medicine: 94%
Engineering: 42%
Environmental Science: 2%
Physics: 2%
Biology: 2%
Art: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Overall score: 73 (Open)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 48

Model Openness Framework: Class III (Open Model)

Resources

GitHub Repository Research Paper

