bio.rodeo


MIS-FM

University of Electronic Science and Technology of China / Shanghai AI Laboratory / SenseTime / Sichuan University

A self-supervised foundation model for 3D medical image segmentation, pretrained on ~110k unannotated CT volumes via Volume Fusion.

Released: June 2023

MIS-FM addresses a central bottleneck in 3D medical image segmentation: deep networks for volumetric organ and structure delineation require large amounts of voxel-level annotation, which is expensive and time-consuming for radiologists to produce. The model demonstrates that powerful, transferable segmentation backbones can instead be pretrained on large pools of unannotated CT scans and then fine-tuned on small labeled downstream datasets, reducing the annotation burden while improving accuracy.

Introduced in June 2023 by researchers at the University of Electronic Science and Technology of China, Shanghai AI Laboratory, SenseTime Research, and Sichuan University, MIS-FM is built around a self-supervised pretext task called Volume Fusion (VolF). Rather than relying on contrastive learning or masked image modeling, VolF synthesizes pseudo-segmentation targets directly from unlabeled volumes, framing pretraining as a supervised-style segmentation problem without any manual labels.

MIS-FM sits within OpenMEDLab's family of medical foundation models and is notable for its focus on dense 3D prediction rather than classification or representation-only objectives, making the pretrained weights directly reusable as segmentation encoders.

Key Features

  • Volume Fusion pretext task: VolF fuses random patches from a foreground sub-volume into a background sub-volume using discrete fusion coefficients, then trains the network to predict those coefficients as a voxel-wise classification target, creating a self-supervised segmentation task with no human annotation.
  • Hybrid PCT-Net architecture: The Parallel Convolution and Transformer Network combines convolutional branches with self-attention in parallel PCT blocks across a pyramid encoder-decoder, capturing both local texture and long-range context.
  • Large-scale CT pretraining: Pretrained on PData-110k, roughly 110,000 unannotated 3D CT volumes assembled from public datasets and a private collection of lung CT scans from nine hospitals with diverse imaging protocols.
  • Released pretrained weights: Apache-2.0 licensed checkpoints (including PCT-Net and an FMUNet variant) are distributed for transfer learning to new segmentation tasks via the PyMIC library.
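The Volume Fusion idea described above can be illustrated with a minimal NumPy sketch. It assumes the fusion is a voxel-wise linear blend, `fused = alpha * foreground + (1 - alpha) * background`, with `alpha` drawn from the discrete set {0, 1/K, ..., 1}, so that the class index `k = alpha * K` becomes the segmentation target. The function name, patch count, patch sizes, and the choice of K are hypothetical values chosen for illustration, not the authors' settings.

```python
import numpy as np


def volume_fusion(foreground, background, num_classes=5, num_patches=8,
                  max_patch=(8, 8, 8), rng=None):
    """Illustrative Volume Fusion sketch (assumed linear blend, not the official code).

    Returns (fused, label). `label` holds a class k in [0, K] per voxel,
    with K = num_classes - 1, and the blend weight is alpha = k / K.
    """
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    if num_classes < 2:
        raise ValueError("need at least two classes (background plus one fused level)")
    rng = np.random.default_rng() if rng is None else rng
    K = num_classes - 1

    # Class 0 means "pure background" (alpha = 0).
    label = np.zeros(background.shape, dtype=np.int64)
    for _ in range(num_patches):
        # Random patch size per axis, clamped to the volume size.
        size = [int(rng.integers(1, min(m, d) + 1))
                for m, d in zip(max_patch, background.shape)]
        start = [int(rng.integers(0, d - s + 1))
                 for d, s in zip(background.shape, size)]
        region = tuple(slice(a, a + s) for a, s in zip(start, size))
        # Each patch gets one non-zero class; later patches overwrite earlier ones.
        label[region] = int(rng.integers(1, K + 1))

    alpha = label.astype(np.float32) / K
    fused = alpha * foreground + (1.0 - alpha) * background
    return fused.astype(np.float32), label


# Toy usage on random data standing in for two CT sub-volumes.
rng = np.random.default_rng(0)
fg = rng.random((16, 16, 16)).astype(np.float32)
bg = rng.random((16, 16, 16)).astype(np.float32)
fused, label = volume_fusion(fg, bg, num_classes=5, rng=rng)
```

A segmentation network would then be trained to predict `label` from `fused` with an ordinary voxel-wise classification loss, which is what makes the pretext task look like supervised segmentation despite using no manual labels.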

Technical Details

PCT-Net uses a multi-scale feature embedding module followed by a pyramid of PCT blocks organized in an encoder-decoder structure, with channel widths of 24, 48, 128, 256, and 512 across five resolution levels. Pretraining optimizes the Volume Fusion objective over PData-110k. In downstream fine-tuning, the pretrained model improves Dice over training from scratch on all three reported tasks: 82.74% vs. 81.41% on the MICCAI 2015 Head-Neck task (+1.33 points), 89.56% vs. 87.58% on SegTHOR thoracic organs (+1.98 points), and 89.11% vs. 87.97% on the Synapse multi-organ abdominal benchmark (+1.14 points). The authors also report that VolF outperforms several state-of-the-art self-supervised pretraining methods on these tasks.
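For readers unfamiliar with the metric behind these numbers, the Dice score for one structure is 2|P ∩ T| / (|P| + |T|), where P and T are the predicted and reference voxel sets. The sketch below is a generic NumPy implementation, not the evaluation code used by the authors. The function name and the convention of returning 1.0 when a structure is absent from both masks are assumptions.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice overlap for one label in two integer label maps of equal shape."""
    p = (pred == label)
    t = (target == label)
    denom = int(p.sum()) + int(t.sum())
    if denom == 0:
        # Assumed convention: structure absent in both maps counts as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(p, t).sum() / denom)


# Toy example: label 1 is predicted at two voxels, present at one, overlapping at one.
pred = np.array([0, 1, 1, 0])
target = np.array([0, 1, 0, 0])
score = dice_score(pred, target, label=1)  # 2 * 1 / (2 + 1)
```

Benchmark figures such as those above are typically averages of this per-structure score over organs and test cases, reported as percentages.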

Applications

MIS-FM is aimed at medical imaging researchers and developers building 3D segmentation pipelines for CT, including delineation of head-and-neck organs at risk, thoracic and abdominal organs, and other volumetric structures. The released weights serve as a strong initialization for fine-tuning on small annotated datasets, which is valuable in radiotherapy planning, organ volumetry, and clinical research settings where labeled data is scarce.

Impact

By reframing self-supervised pretraining as a synthetic segmentation task, MIS-FM offered an alternative to contrastive and masked-reconstruction approaches that were dominant for 3D medical imaging at the time. As part of OpenMEDLab, its openly released, Apache-2.0 licensed weights and integration with PyMIC lowered the barrier to applying foundation-model pretraining in segmentation workflows. The work remains a reference point for label-efficient 3D medical image segmentation, though its pretraining is CT-specific and transfer to other modalities such as MRI is not the primary focus.

Citation

MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset

Preprint

Wang, G., et al. (2023). MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset. arXiv preprint arXiv:2306.16925.

DOI: 10.48550/arXiv.2306.16925

Citations

Total Citations: 48
Influential: 2
References: 54

GitHub

Stars: 247
Forks: 8
Open Issues: 6
Contributors: 1
Last Push: 7mo ago
Language: Python
License: Apache-2.0

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Overall score: 73 (Open)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 48
Model Openness Framework: Class III (Open Model)

Tags

cnn, ct, foundation_model, segmentation, self_supervised, vision_transformer

Resources

GitHub Repository · Research Paper