M4oE

Hong Kong Baptist University / Johns Hopkins University

Mixture-of-Experts foundation model for medical image segmentation that generalizes across imaging modalities and clinical centers.

Released: May 2024

M4oE (Medical Multimodal Mixture of Experts) is a foundation model for medical image segmentation designed to handle the heterogeneity that arises when imaging data are drawn from different modalities and different clinical centers. A persistent obstacle in medical imaging is that a model trained on one modality or one institution's acquisition protocol often degrades sharply when applied elsewhere, because anatomical appearance, contrast, and noise characteristics vary substantially across CT, MRI, and other scanners. M4oE tackles this by dedicating modality-specific experts that each capture domain knowledge for a particular data source, while a learned gating network dynamically weights their contributions.

The model was introduced by Yufeng Jiang (Hong Kong Baptist University) and Yiqing Shen (Johns Hopkins University) in a paper first posted to arXiv in May 2024 and subsequently accepted to MICCAI 2024, one of the principal venues for medical image computing. It sits within the recent wave of generalist medical segmentation models—alongside efforts such as STU-Net, MED3D, and SAM-Med2D—but distinguishes itself by using a Mixture-of-Experts (MoE) formulation to achieve cross-modality and cross-center generalization rather than relying on a single monolithic backbone.

By routing each input through the most relevant experts, M4oE aims to deliver strong segmentation accuracy across diverse datasets while keeping the active parameter footprint small, an attractive property for clinical deployment where compute and annotation budgets are constrained.

Key Features

Modality-specific experts: Each expert is separately initialized to encode the domain knowledge of a particular imaging modality, allowing the model to absorb heterogeneous data without one modality overwriting another's learned features.
Dynamic gating network: A gating module modulates expert contributions on a per-input basis during fine-tuning and inference, so the model adaptively emphasizes the experts most relevant to the incoming scan and clinical center.
SwinUNet backbone: The architecture builds on the SwinUNet design, pairing a hierarchical Swin Transformer encoder with a U-Net-style decoder well suited to dense segmentation of anatomical structures.
Parameter and training efficiency: M4oE reaches competitive accuracy using roughly 30% of the parameter count of compared methods and cuts training time by about 7 hours, lowering the cost of adapting to new modalities.
Cross-center generalization: By treating clinical-center variability as a source of heterogeneity the experts learn to absorb, the framework is built to transfer across institutions rather than overfit to a single site.
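The expert-and-gate pattern described in the features above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes dense softmax gating over all experts and uses plain linear maps as stand-in experts, whereas M4oE places its experts inside a SwinUNet. The class name `ToyMoELayer` and all parameters are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyMoELayer:
    """Hypothetical dense Mixture-of-Experts layer (illustration only)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        # One independently initialized weight matrix per expert.
        self.experts = [init(dim, dim) for _ in range(num_experts)]
        # The gate maps an input vector to one score per expert.
        self.gate = init(num_experts, dim)

    def gate_weights(self, x):
        # Assumption: softmax gating, so the weights are positive and sum to 1.
        return softmax(linear(self.gate, x))

    def forward(self, x):
        weights = self.gate_weights(x)
        outputs = [linear(expert, x) for expert in self.experts]
        # Weighted sum of expert outputs, one value per feature.
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(outputs[0]))]


layer = ToyMoELayer(dim=4, num_experts=3)
token = [0.2, -1.0, 0.5, 0.3]
weights = layer.gate_weights(token)
mixed = layer.forward(token)
```

Because the gate weights depend on the input, two scans with different feature statistics receive different expert mixtures, which is the mechanism the feature list attributes to the gating network.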

Technical Details

M4oE adopts a Mixture-of-Experts framework on top of a SwinUNet (Swin Transformer encoder with a U-Net decoder). Modality-specific experts are initialized independently to learn features that encode the domain characteristics of their respective modalities, and a gating network produces weights that combine expert outputs dynamically during fine-tuning.

The authors evaluate the model across three modalities using three public abdominal and lesion segmentation datasets: FLARE22, AMOS2022, and ATLAS2023. On these benchmarks M4oE reports improvements of approximately 3.45% over STU-Net-L, 5.11% over MED3D, and 11.93% over SAM-Med2D, while using only about 30% of the parameters of comparison methods and reducing training duration by roughly 7 hours.

The full codebase (architecture, training, and inference scripts) is public on GitHub, but the repository ships no LICENSE file, so the code is all-rights-reserved by default even though the paper itself is CC-BY-4.0. The only released weights are a third-party pretrained Swin Transformer initialization (linked via Google Drive), not trained M4oE model weights, so users must train the model themselves, with the option to pretrain on custom datasets via masked autoencoding (MAE).
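For readers unfamiliar with masked autoencoding, the core data-preparation step is hiding a random subset of image patches and training the network to reconstruct them. The sketch below shows only that masking step, under stated assumptions: the function name `random_patch_mask` is hypothetical, the 75% mask ratio is the default from the original MAE paper rather than a value taken from the M4oE repository, and the 196-patch grid is purely illustrative.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets.

    Assumption: uniform random masking, as in the original MAE recipe.
    The M4oE repository may use a different ratio or masking scheme.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Illustrative grid of 14 x 14 = 196 patches.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
```

During pretraining, only the visible patches would be fed to the encoder, and the reconstruction loss would be computed on the masked ones.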

Applications

M4oE targets multi-organ and lesion segmentation tasks in clinical and research radiology, where labeled data are scarce and acquisition protocols vary widely across hospitals. Its expert-routing design is well suited to settings that must process several imaging modalities—such as multi-phase abdominal CT and MRI—under a single deployable model. Researchers building generalist medical segmentation pipelines benefit from the reduced parameter and training cost, and clinical teams gain a framework intended to remain robust when moved between institutions without retraining a separate model for each site.

Impact

As a MICCAI 2024 contribution, M4oE adds to the growing body of work showing that conditional computation and Mixture-of-Experts routing can address the modality and domain-shift problems that limit conventional medical segmentation networks. Its emphasis on efficiency, matching or exceeding larger baselines with a fraction of the parameters, reflects a broader push toward practical, deployable medical foundation models rather than parameter-heavy generalists. The model is relatively new and has been evaluated on only three datasets, so its generalization to additional modalities and larger multi-center cohorts remains to be established. The public code repository lets others reproduce the architecture and training recipe for follow-up work extending the expert-routing approach to new clinical settings, though reusers should note that no trained M4oE weights are distributed and the code carries no open-source license.

Citation

M4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts

Jiang, Y. & Shen, Y. (2024) M4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts. International Conference on Medical Image Computing and Computer-Assisted Intervention.

DOI: 10.1007/978-3-031-72390-2_58

Recent citations

Papers that recently cited this model.

On the effectiveness of MoE-enhanced transformer for accurate and generalizable mask-based semantic segmentation
Dahye Jung, Youji Sohn, Sang In Lee, et al.
Machine Vision and Applications · Jun 2026
Citations: 0

Mixture of experts for radiology report generation
Xiangkang Song, Zhi Liu, Xiaodi Hou, et al.
Engineering Applications of Artificial Intelligence · Jun 2026
Citations: 0

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey
L. Zheng, Wei Zhang, Olaf Maennel, et al.
May 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Multimodal Large Language Models in Medical Imaging: Current State and Future Directions
Yoojin Nam, Dong Yeong Kim, Sunggu Kyung, et al.
Korean Journal of Radiology · Aug 2025
Citations: 60

Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions
Kai Sun, Siyan Xue, Fuchun Sun, et al.
Artif. Intell. Medicine · Dec 2024
Citations: 39

Mixture of Experts (MoE): A Big Data Perspective
Wensheng Gan, Zhenyao Ning, Zhenlian Qi, et al.
Information Fusion · Jan 2025
Citations: 36

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
Xiang Xu, Lingdong Kong, Hui Shuai, et al.
Computer Vision and Pattern Recognition · Jan 2025
Citations: 24

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
Wei Dai, Peilin Chen, Malinda Lu, et al.
International Conference on Machine Learning · Mar 2025
Citations: 17

Citations

Total Citations: 42

Influential: 1

References: 20

GitHub

Stars: 55

Forks: 2

Open Issues: 6

Contributors: 1

Last Push: 1y ago

Language: Python

Fields of citing research

Computer Science: 93%
Medicine: 74%
Engineering: 43%
Environmental Science: 5%
Biology: 2%
Education: 2%
Psychology: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall: 28 · Closed

Usability (can I run it?): 23

Reproducibility (can I retrain it?): 18

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository Research Paper
