DenseFormer-MoE

Brain MRI foundation model pairing DenseNet and Vision Transformer backbones with mixture of experts for disease diagnosis and brain age prediction.

Released: October 2025

DenseFormer-MoE is a foundation model for structural brain MRI analysis that learns transferable representations from large collections of unlabeled T1-weighted scans and adapts them to several clinically relevant tasks. Developed by Rizhi Ding, Hui Lu, and Manhua Liu at the MoE (Ministry of Education) Key Lab of Artificial Intelligence and the AI Institute, Shanghai Jiao Tong University, it was published in IEEE Transactions on Medical Imaging in October 2025.

The central problem the model addresses is the scarcity of labeled neuroimaging data for individual brain disorders, which limits the performance of task-specific deep networks. By pretraining a single backbone in a self-supervised fashion and then sharing it across tasks, DenseFormer-MoE aims to deliver strong, generalizable features for diagnosing multiple brain diseases and for estimating brain age from a common representation. It joins a growing class of medical-imaging foundation models that move neuroimaging away from narrow, single-task pipelines toward reusable pretrained backbones.

Its key methodological contribution is twofold: a hybrid DenseFormer backbone that couples dense convolutional networks with Vision Transformers to capture both local and global structure, and a Mixture of Experts (MoE) head that dynamically routes each task to specialized experts to reduce the optimization conflicts that arise in multi-task learning.
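The "dense" half of the backbone relies on DenseNet-style connectivity: each layer in a dense block receives the concatenation of the block input and every earlier layer's output, so the input width grows linearly with depth. The sketch below is a generic illustration of that connectivity pattern only, not the DenseFormer layer configuration (which is not described here). Plain Python lists stand in for the channel axis of a 3D feature map, and the names `dense_block` and `input_widths` are hypothetical.

```python
from typing import Callable, List

# Plain lists stand in for the channel axis of a 3D feature map (simplification).
Features = List[float]
Layer = Callable[[Features], Features]


def dense_block(x: Features, layers: List[Layer]) -> Features:
    """DenseNet-style block: every layer receives the concatenation of the
    block input and the outputs of all earlier layers in the block."""
    features = list(x)
    for layer in layers:
        new_maps = layer(features)      # layer sees everything produced so far
        features = features + new_maps  # concatenate; earlier maps are reused, not overwritten
    return features


def input_widths(c0: int, growth: int, n_layers: int) -> List[int]:
    """Input width of each layer in a dense block: c0 + l * growth for l = 0..n_layers-1."""
    return [c0 + l * growth for l in range(n_layers)]


if __name__ == "__main__":
    # Hypothetical toy layer with growth rate 2: emits the sum and max of its input.
    toy_layers = [lambda f: [sum(f), max(f)]] * 3
    print(dense_block([1.0, 2.0], toy_layers))  # 2 + 3 * 2 = 8 values
    print(input_widths(c0=2, growth=2, n_layers=3))
```

In a real network each layer would be a normalization, activation, and 3D convolution, and the concatenation would happen along the channel dimension; the linear width growth is what lets later layers reuse low-level texture features directly.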

Key Features

Hybrid DenseFormer backbone: Combines a DenseNet convolutional stem with a Vision Transformer so the model progressively consolidates fine-grained local texture and long-range global context from 3D brain MRI.
Masked Autoencoder pretraining: The backbone is pretrained self-supervised with a Masked Autoencoder (MAE) objective, which improves the generalization of the learned feature representations without requiring task labels.
Mixture of Experts routing: Task-specific experts are dynamically selected to address conflicting gradients across tasks, allowing a shared model to serve multiple diagnostic and predictive objectives.
Multi-task capability: A single pretrained model supports diagnosis of multiple brain disorders and continuous brain age estimation rather than a separate network per task.
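Masked Autoencoder pretraining hides a large random subset of image patches and trains the network to reconstruct them, scoring the loss only on the hidden patches. The following is a minimal sketch of that masking and loss bookkeeping for a 3D volume, assuming a cubic patch size that divides the volume evenly. The 96³ volume, 16³ patches, and 0.75 mask ratio (the default in the original image MAE) are placeholders, not values reported for DenseFormer-MoE, and `patch_count`, `random_mask`, and `masked_mse` are hypothetical helpers.

```python
import random
from typing import List, Tuple


def patch_count(shape: Tuple[int, int, int], patch: int) -> int:
    """Number of non-overlapping cubic patches in a (D, H, W) volume."""
    d, h, w = shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by the patch size")
    return (d // patch) * (h // patch) * (w // patch)


def random_mask(n_patches: int, mask_ratio: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """MAE-style random masking: return (visible, masked) patch indices."""
    rng = random.Random(seed)
    order = list(range(n_patches))
    rng.shuffle(order)
    n_keep = int(round(n_patches * (1.0 - mask_ratio)))
    return sorted(order[:n_keep]), sorted(order[n_keep:])


def masked_mse(pred: List[List[float]], target: List[List[float]], masked: List[int]) -> float:
    """Mean squared reconstruction error computed over masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    n = patch_count((96, 96, 96), 16)         # 6 * 6 * 6 = 216 patches
    visible, masked = random_mask(n, 0.75)
    print(n, len(visible), len(masked))        # 216 54 162
```

Only the visible patches would be passed to the encoder, which is what makes MAE pretraining comparatively cheap on large 3D volumes.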

Technical Details

The architecture integrates dense convolutional networks, a Vision Transformer, and a Mixture of Experts module operating on T1-weighted structural MRI. The backbone is first pretrained with a Masked Autoencoder, reconstructing masked image content to learn self-supervised representations, after which the MoE module dynamically assigns task-specific experts during multi-task fine-tuning. The authors evaluated the model on three large public neuroimaging cohorts: the UK Biobank, the Alzheimer's Disease Neuroimaging Initiative (ADNI), and the Parkinson's Progression Markers Initiative (PPMI). Downstream tasks include classification of Alzheimer's disease and mild cognitive impairment, Parkinson's disease diagnosis, and regression-based brain age prediction, with the paper reporting competitive performance relative to task-specific baselines.
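The routing step can be pictured as a task-conditioned gate: each task produces a score per expert, the top-k experts are kept, their scores are renormalized with a softmax, and the expert outputs are blended by those weights. This is a generic top-k gating sketch, not the paper's exact MoE design; the number of experts, the value of k, and the per-task gate logits in `TASK_GATE_LOGITS` are all hypothetical. In a trained model the logits would come from a learned gating network rather than fixed constants.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(gate_logits: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in ranked])
    return dict(zip(ranked, weights))


def moe_forward(x: Vector, experts: List[Expert], gate_logits: Sequence[float], k: int) -> Vector:
    """Weighted sum of the selected experts' outputs (assumes experts preserve width)."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(gate_logits, k).items():
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Hypothetical toy experts and per-task gate logits.
    experts = [lambda v: [2.0 * a for a in v],
               lambda v: [a + 1.0 for a in v],
               lambda v: [0.0 for _ in v]]
    TASK_GATE_LOGITS = {"brain_age": [2.0, 2.0, -5.0], "pd_diagnosis": [-5.0, 0.0, 3.0]}
    for task, logits in TASK_GATE_LOGITS.items():
        print(task, top_k_route(logits, k=2), moe_forward([1.0, 2.0], experts, logits, k=2))
```

Because each task can favor a different subset of experts, gradients from one task mostly flow into that task's preferred experts, which is the intuition behind using MoE to ease multi-task conflicts.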

Applications

DenseFormer-MoE is aimed at computational neuroimaging researchers and clinical AI developers who work with structural brain MRI. By providing a pretrained, multi-task backbone, it can support early screening and diagnosis of neurodegenerative conditions such as Alzheimer's and Parkinson's disease, staging of mild cognitive impairment, and brain-age estimation as a biomarker of accelerated aging. Because the representations are learned through self-supervision, the model should be particularly useful where labeled scans for a specific disorder are limited, letting groups fine-tune a shared backbone rather than train each diagnostic model from scratch.
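Brain age is typically used as a biomarker through the brain-age gap: predicted age minus chronological age, where a positive gap suggests a brain that appears older than expected. A minimal sketch of the gap and of mean absolute error, a metric commonly reported for brain-age regression, is shown below. The source does not state which metric the paper uses, the helper names are hypothetical, and the ages in the usage line are made-up illustrative numbers.

```python
from statistics import mean
from typing import List, Sequence


def brain_age_gap(predicted: Sequence[float], chronological: Sequence[float]) -> List[float]:
    """Per-subject brain-age gap: predicted age minus chronological age (years)."""
    if len(predicted) != len(chronological):
        raise ValueError("predicted and chronological ages must have the same length")
    return [p - c for p, c in zip(predicted, chronological)]


def mean_absolute_error(predicted: Sequence[float], chronological: Sequence[float]) -> float:
    """Mean absolute error of brain-age predictions (years)."""
    return mean(abs(g) for g in brain_age_gap(predicted, chronological))


if __name__ == "__main__":
    pred, true = [72.0, 65.5, 80.0], [70.0, 66.0, 75.0]  # illustrative values only
    print(brain_age_gap(pred, true))       # [2.0, -0.5, 5.0]
    print(mean_absolute_error(pred, true)) # 2.5
```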

Impact

DenseFormer-MoE illustrates how foundation-model strategies (self-supervised pretraining plus a shared, reusable backbone) can be transferred from natural-image and language domains to 3D medical neuroimaging. Its pairing of a hybrid convolutional–transformer backbone with a Mixture of Experts head offers a concrete recipe for mitigating multi-task conflicts in brain image analysis, a recurring challenge as the field consolidates many narrow models into general-purpose ones. Published in a leading medical-imaging venue and validated across UK Biobank, ADNI, and PPMI, it contributes to the emerging landscape of brain MRI foundation models. A notable limitation is that no code or pretrained weights were publicly released at publication, which constrains independent reproduction and downstream adoption.

Citation

DenseFormer-MoE: A Dense Transformer Foundation Model With Mixture of Experts for Multi-Task Brain Image Analysis

Ding, R., Lu, H., & Liu, M. (2025). DenseFormer-MoE: A Dense Transformer Foundation Model With Mixture of Experts for Multi-Task Brain Image Analysis. IEEE Transactions on Medical Imaging.

DOI: 10.1109/TMI.2025.3551514

Recent citations

Papers that recently cited this model.

Toward brain magnetic resonance imaging analysis intelligence: A review of federated learning and visual foundation models
Zhen Yu, Yang Liu, Qingchao Chen
Engineering Applications of Artificial Intelligence · Aug 2026 · 0 citations

Vision Foundation Models in Radiology: A Scoping Review of Data, Methodology, Evaluation and Clinical Translation
A. Vergara-Richart, Xavier Rafael-Palou, A. Fuster-Matanzo, et al.
Jul 2026 · 0 citations · Influential

MultiScaleSegNet: A novel framework for multi-modal brain tumor segmentation
Syed Fakhar Bilal, Jianqiang Li, Jun Qian, et al.
Biomedical Signal Processing and Control · Jun 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

Transformer attention-based neural network for cognitive score estimation from sMRI data
Songheng Li, Yanteng Zhang, Congyu Zou, et al.
Comput. Biol. Medicine · Jul 2025 · 6 citations

A MoE-LLM-based multisensor flexible fusion fault diagnosis method for rotating machinery
Tantao Lin, Zhijun Ren, Kai Huang, et al.
Advanced Engineering Informatics · Jan 2026 · 4 citations

EPSO-net: A multi-objective evolutionary neural architecture search with PSO-guided mutation fusion for explainable brain tumor segmentation
Farhana Yasmin, Yu Xue, M. Hasan, et al.
Information Fusion · Jan 2026 · 2 citations

A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing
Chengkai Xu, Jiaqi Liu, Yicheng Guo, et al.
arXiv.org · Sep 2025 · 2 citations

M³AD: Multi-task Multi-gate Mixture of Experts for Alzheimer's Disease Diagnosis with Conversion Pattern Modeling
Yufeng Jiang, Hexiao Ding, Hongzhao Chen, et al.
Aug 2025 · 2 citations

Citations

Total citations: 27

Influential citations: 2

References: 45

Fields of citing research

Share of papers citing this model.

Computer Science: 92%
Medicine: 69%
Engineering: 42%
Physics: 8%
Environmental Science: 8%
Psychology: 4%
Biology: 4%
Mathematics: 4%

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall score: 8 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 6

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper
