bio.rodeo

M4oE

Hong Kong Baptist University / Johns Hopkins University

A Mixture-of-Experts foundation model for medical multimodal image segmentation that generalizes across imaging modalities and clinical centers.

Released: May 2024

M4oE (Medical Multimodal Mixture of Experts) is a foundation model for medical image segmentation designed to handle the heterogeneity that arises when imaging data are drawn from different modalities and different clinical centers. A persistent obstacle in medical imaging is that a model trained on one modality or one institution's acquisition protocol often degrades sharply when applied elsewhere, because anatomical appearance, contrast, and noise characteristics vary substantially across CT, MRI, and other modalities, and across the scanners that acquire them. M4oE tackles this with modality-specific experts, each of which captures domain knowledge for a particular data source, while a learned gating network dynamically weights their contributions.
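The expert-plus-gate idea can be illustrated with a toy soft mixture. This is a minimal sketch of a generic Mixture-of-Experts layer, not the M4oE implementation: the linear gate, the two toy experts, and all names (`mixture_output`, `ct_expert`, `mri_expert`) are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over the gate logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(features: Vector, gate_weights: List[Vector]) -> Vector:
    """One linear score per expert: dot(gate_weights[i], features)."""
    return [sum(w * x for w, x in zip(row, features)) for row in gate_weights]


def mixture_output(
    features: Vector, experts: Sequence[Expert], gate_weights: List[Vector]
) -> Tuple[Vector, Vector]:
    """Weight every expert's output by its softmax gate weight and sum them."""
    weights = softmax(gate_logits(features, gate_weights))
    outputs = [expert(features) for expert in experts]
    width = len(outputs[0])
    mixed = [sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(width)]
    return mixed, weights


# Hypothetical stand-ins for modality-specific experts.
def ct_expert(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def mri_expert(x: Vector) -> Vector:
    return [-1.0 * v for v in x]


features = [1.0, 0.0]                    # toy feature vector for one input
gate_weights = [[5.0, 0.0], [0.0, 5.0]]  # toy gate that favours the first expert here
mixed, weights = mixture_output(features, [ct_expert, mri_expert], gate_weights)
print("gate weights:", weights)
print("mixed output:", mixed)
```

In this toy the gate puts almost all of its weight on the first expert, so the mixed output stays close to that expert's output. In a real model the gate and the experts would be learned jointly.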

The model was introduced by Yufeng Jiang (Hong Kong Baptist University) and Yiqing Shen (Johns Hopkins University) in a paper first posted to arXiv in May 2024 and subsequently accepted to MICCAI 2024, one of the principal venues for medical image computing. It sits within the recent wave of generalist medical segmentation models—alongside efforts such as STU-Net, MED3D, and SAM-Med2D—but distinguishes itself by using a Mixture-of-Experts (MoE) formulation to achieve cross-modality and cross-center generalization rather than relying on a single monolithic backbone.

By routing each input through the most relevant experts, M4oE aims to deliver strong segmentation accuracy across diverse datasets while keeping the active parameter footprint small, an attractive property for clinical deployment where compute and annotation budgets are constrained.
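One common way to keep the active parameter footprint small is sparse top-k routing, where only the k highest-scoring experts run for a given input. The sketch below assumes that scheme purely for illustration; the description above does not say whether M4oE routes sparsely or blends all experts, and the parameter counts are made-up units rather than figures from the paper.

```python
from typing import List, Sequence


def top_k_experts(gate_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring experts, best first."""
    if not 1 <= k <= len(gate_scores):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]


def renormalize(gate_scores: Sequence[float], chosen: Sequence[int]) -> List[float]:
    """Rescale the chosen experts' (positive) scores so they sum to 1."""
    total = sum(gate_scores[i] for i in chosen)
    return [gate_scores[i] / total for i in chosen]


def active_parameters(shared: int, per_expert: Sequence[int], chosen: Sequence[int]) -> int:
    """Parameters actually used for one input: shared backbone plus chosen experts."""
    return shared + sum(per_expert[i] for i in chosen)


# Hypothetical numbers: three experts, two of them active per input.
scores = [0.1, 0.6, 0.3]       # softmax-style gate scores for one scan
expert_sizes = [4, 4, 4]       # parameters per expert, arbitrary units
chosen = top_k_experts(scores, k=2)
print("chosen experts:", chosen)
print("routing weights:", renormalize(scores, chosen))
print("active vs total:", active_parameters(10, expert_sizes, chosen), "vs", 10 + sum(expert_sizes))
```

The total parameter count grows with every added expert, while the active count depends only on k. That is why routing is attractive when compute at inference time is limited.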

#Key Features

  • Modality-specific experts: Each expert is separately initialized to encode the domain knowledge of a particular imaging modality, allowing the model to absorb heterogeneous data without one modality overwriting another's learned features.
  • Dynamic gating network: A gating module modulates expert contributions on a per-input basis during fine-tuning and inference, so the model adaptively emphasizes the experts most relevant to the incoming scan and clinical center.
  • SwinUNet backbone: The architecture builds on the SwinUNet design, pairing a hierarchical Swin Transformer encoder with a U-Net-style decoder well suited to dense segmentation of anatomical structures.
  • Parameter and training efficiency: M4oE reaches competitive accuracy with roughly 30% of the parameter count of the methods it is compared against and about 7 hours less training time, lowering the cost of adapting to new modalities.
  • Cross-center generalization: By treating clinical-center variability as a source of heterogeneity the experts learn to absorb, the framework is built to transfer across institutions rather than overfit to a single site.
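Cross-center generalization of the kind described in the last bullet is typically checked by scoring each site separately rather than pooling all cases. The sketch below uses the Dice coefficient, a standard overlap measure for segmentation; this page does not name the metric behind the reported numbers, so treat the choice of Dice, the site labels, and the toy masks as assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask, 1 = voxel belongs to the structure


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if size == 0 else 2.0 * overlap / size


def dice_by_center(cases: Iterable[Tuple[str, Mask, Mask]]) -> Dict[str, float]:
    """Mean Dice per clinical center from (center, prediction, ground truth) triples."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for center, pred, truth in cases:
        scores[center].append(dice(pred, truth))
    return {center: sum(vals) / len(vals) for center, vals in scores.items()}


# Hypothetical cases from two sites.
cases = [
    ("site_a", [1, 1, 0, 0], [1, 1, 0, 0]),
    ("site_b", [1, 1, 0, 0], [1, 0, 0, 0]),
]
per_center = dice_by_center(cases)
print(per_center)
```

A large gap between sites in such a table is the usual sign that a model has overfit to one institution's acquisition protocol.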

#Technical Details

M4oE adopts a Mixture-of-Experts framework on top of a SwinUNet (Swin Transformer encoder with a U-Net decoder). Modality-specific experts are initialized independently to learn features that encode the domain characteristics of their respective modalities, and a gating network produces weights that combine expert outputs dynamically during fine-tuning.

The authors evaluate the model across three modalities using three public abdominal and lesion segmentation datasets: FLARE22, AMOS2022, and ATLAS2023. On these benchmarks M4oE reports improvements of approximately 3.45% over STU-Net-L, 5.11% over MED3D, and 11.93% over SAM-Med2D, while using only about 30% of the parameters of comparison methods and reducing training duration by roughly 7 hours.

The full codebase (architecture, training, and inference scripts) is public on GitHub, but the repository ships no LICENSE file, so the code is all-rights-reserved by default even though the paper itself is CC-BY-4.0. The only released weights are a third-party pretrained Swin Transformer initialization (linked via Google Drive), not trained M4oE model weights, so users must train the model themselves, with the option to pretrain on custom datasets via masked autoencoding (MAE).
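For readers unfamiliar with the MAE pretraining option, the core mechanic is to hide a random subset of image patches and score the reconstruction only on the hidden ones. The sketch below shows that mechanic in plain Python. The 0.75 mask ratio is the default from the original MAE paper, not a value confirmed for the M4oE repository, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_patches(num_patches: int, mask_ratio: float, seed: int) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices into (visible, masked) lists."""
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)          # seeded for reproducibility
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(pred: Sequence[float], target: Sequence[float], masked: Sequence[int]) -> float:
    """Mean squared error over masked patches only (one scalar per patch in this toy)."""
    if not masked:
        raise ValueError("at least one patch must be masked")
    return sum((pred[i] - target[i]) ** 2 for i in masked) / len(masked)


# Toy example: 16 patches, each summarized by a single intensity value.
visible, masked = split_patches(num_patches=16, mask_ratio=0.75, seed=0)
target = [1.0] * 16
pred = [0.0] * 16                      # an untrained "decoder" that predicts zeros
print("visible patches:", visible)
print("masked patches:", masked)
print("loss on masked patches:", masked_mse(pred, target, masked))
```

In an actual MAE setup the encoder sees only the visible patches and a lightweight decoder predicts pixel values for the masked ones. The toy loss above stands in for that reconstruction objective.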

#Applications

M4oE targets multi-organ and lesion segmentation tasks in clinical and research radiology, where labeled data are scarce and acquisition protocols vary widely across hospitals. Its expert-routing design is well suited to settings that must process several imaging modalities—such as multi-phase abdominal CT and MRI—under a single deployable model. Researchers building generalist medical segmentation pipelines benefit from the reduced parameter and training cost, and clinical teams gain a framework intended to remain robust when moved between institutions without retraining a separate model for each site.

#Impact

As a MICCAI 2024 contribution, M4oE adds to the growing body of work showing that conditional computation and Mixture-of-Experts routing can address the modality and domain-shift problems that limit conventional medical segmentation networks. Its emphasis on efficiency, matching or exceeding larger baselines with a fraction of the parameters, reflects a broader push toward practical, deployable medical foundation models rather than parameter-heavy generalists. The model is relatively new and has been evaluated on only three datasets, so its generalization to additional modalities and larger multi-center cohorts remains to be established. The public code repository lets others reproduce the architecture and training recipe for follow-up work extending the expert-routing approach to new clinical settings, though reusers should note that no trained M4oE weights are distributed and the code carries no open-source license.

Citation

Jiang, Y. & Shen, Y. (2024) M4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts. International Conference on Medical Image Computing and Computer-Assisted Intervention.

DOI: 10.1007/978-3-031-72390-2_58

Citations

Total Citations: 41
Influential: 1
References: 20

GitHub

Stars: 54
Forks: 2
Open Issues: 6
Contributors: 1
Last Push: 1y ago
Language: Python

Openness

bio.rodeo openness: 28 (Closed), low usability and reproducibility
Usability (can I run it?): 23
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

foundation_model, medical_imaging, mixture_of_experts, segmentation, transfer_learning, vision_transformer

Resources

GitHub Repository
Research Paper