A self-supervised foundation model for structural brain MRI combining DenseNet and Vision Transformer with Mixture of Experts for multi-task brain disease diagnosis and brain age prediction.
DenseFormer-MoE is a foundation model for structural brain MRI analysis that learns transferable representations from large collections of unlabeled T1-weighted scans and adapts them to several clinically relevant tasks. Developed by Rizhi Ding, Hui Lu, and Manhua Liu at the MoE Key Lab of Artificial Intelligence and AI Institute, Shanghai Jiao Tong University (in the lab name, MoE stands for Ministry of Education, not Mixture of Experts), it was published in IEEE Transactions on Medical Imaging in October 2025.
The central problem the model addresses is the scarcity of labeled neuroimaging data for individual brain disorders, which limits the performance of task-specific deep networks. By pretraining a single backbone in a self-supervised fashion and then sharing it across tasks, DenseFormer-MoE aims to deliver strong, generalizable features for diagnosing multiple brain diseases and for estimating brain age from a common representation. It joins a growing class of medical-imaging foundation models that move neuroimaging away from narrow, single-task pipelines toward reusable pretrained backbones.
Its key methodological contribution is twofold: a hybrid DenseFormer backbone that couples dense convolutional networks with Vision Transformers to capture both local and global structure, and a Mixture of Experts (MoE) head that dynamically routes each task to specialized experts to reduce the optimization conflicts that arise in multi-task learning.
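To make the routing idea concrete, the following is a minimal sketch of a task-conditioned Mixture of Experts layer, not the authors' implementation. It assumes a fixed vector of gate logits per task, top-k expert selection, and a softmax over the selected experts. The names `top_k_gate`, `moe_forward`, `EXPERTS`, and `TASK_GATE_LOGITS`, the toy experts, and the task labels are all hypothetical; in a trained model the experts would be learned sub-networks and the gate would typically be learned as well.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k):
    """Select the k highest-scoring experts and renormalize their weights.

    Returns a dict mapping expert index -> mixing weight (weights sum to 1).
    """
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_forward(features, experts, gate_logits, k=2):
    """Combine the outputs of the selected experts for one task.

    Assumes every expert returns a vector of the same length as `features`.
    Experts that are not selected are never evaluated.
    """
    gates = top_k_gate(gate_logits, k)
    output = [0.0] * len(features)
    for index, weight in gates.items():
        expert_out = experts[index](features)
        output = [acc + weight * value
                  for acc, value in zip(output, expert_out)]
    return output, gates


# Hypothetical toy experts standing in for learned sub-networks.
EXPERTS = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]

# Hypothetical per-task gate logits: each task prefers different experts.
TASK_GATE_LOGITS = {
    "ad_vs_cn": [2.0, 2.0, -1.0],
    "brain_age": [-1.0, 0.0, 3.0],
}

shared_features = [1.0, 2.0]  # stand-in for the shared backbone output
for task, logits in TASK_GATE_LOGITS.items():
    out, gates = moe_forward(shared_features, EXPERTS, logits, k=2)
    print(task, sorted(gates), out)
```

Because each task mixes a different subset of experts, gradients from one task update only the experts it selects, which is the intuition behind using routing to reduce interference between tasks that share a backbone.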
The architecture integrates dense convolutional networks, a Vision Transformer, and a Mixture of Experts module operating on T1-weighted structural MRI. The backbone is first pretrained with a Masked Autoencoder, reconstructing masked image content to learn self-supervised representations, after which the MoE module dynamically assigns task-specific experts during multi-task fine-tuning. The authors evaluated the model on three large public neuroimaging cohorts: the UK Biobank, the Alzheimer's Disease Neuroimaging Initiative (ADNI), and the Parkinson's Progression Markers Initiative (PPMI). Downstream tasks include classification of Alzheimer's disease and mild cognitive impairment, Parkinson's disease diagnosis, and regression-based brain age prediction, with the paper reporting competitive performance relative to task-specific baselines.
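The masked-reconstruction pretraining step can be sketched as follows. This illustrates the general Masked Autoencoder recipe (hide a random subset of patches, reconstruct them, score only the hidden ones) and is not taken from the paper. The volume shape, the patch size of 16, the 0.75 mask ratio, and the function names `patch_grid`, `random_patch_mask`, and `masked_mse` are assumptions chosen for illustration.

```python
import random


def patch_grid(volume_shape, patch_size):
    """Return the number of non-overlapping cubic patches along each axis."""
    if any(dim % patch_size for dim in volume_shape):
        raise ValueError("each volume dimension must be divisible by patch_size")
    return tuple(dim // patch_size for dim in volume_shape)


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Randomly split patch indices into visible and masked sets."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(target_patches, predicted_patches, masked_ids):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for patch_id in masked_ids:
        for target, predicted in zip(target_patches[patch_id],
                                     predicted_patches[patch_id]):
            total += (target - predicted) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical T1 volume of 96 x 112 x 96 voxels cut into 16^3 patches.
grid = patch_grid((96, 112, 96), 16)
num_patches = grid[0] * grid[1] * grid[2]
visible_ids, masked_ids = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)
print(grid, len(visible_ids), len(masked_ids))
```

In a full pipeline the encoder would process only the patches in `visible_ids`, a lightweight decoder would predict the voxels of the patches in `masked_ids`, and a loss such as `masked_mse` would drive pretraining without any diagnostic labels.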
DenseFormer-MoE is aimed at computational neuroimaging researchers and clinical AI developers who work with structural brain MRI. By providing a pretrained, multi-task backbone, it can support early screening and diagnosis of neurodegenerative conditions such as Alzheimer's and Parkinson's disease, staging of mild cognitive impairment, and brain-age estimation as a biomarker of accelerated aging. Because the representations are learned self-supervised, the model is particularly useful in settings where labeled scans for a specific disorder are limited, letting groups fine-tune a shared backbone rather than train each diagnostic model from scratch.
DenseFormer-MoE illustrates how foundation-model strategies (self-supervised pretraining plus a shared, reusable backbone) can be transferred from natural-image and language domains to 3D medical neuroimaging. Its pairing of a hybrid convolutional–transformer backbone with a Mixture of Experts module offers a concrete recipe for mitigating multi-task conflicts in brain image analysis, a recurring challenge as the field consolidates many narrow models into general-purpose ones. Published in a leading medical-imaging venue and validated across UK Biobank, ADNI, and PPMI, it contributes to the emerging landscape of brain MRI foundation models. A notable limitation is that no code or pretrained weights were publicly released at publication, which constrains independent reproduction and downstream adoption.
Ding, R., et al. (2025) DenseFormer-MoE: A Dense Transformer Foundation Model With Mixture of Experts for Multi-Task Brain Image Analysis. IEEE Transactions on Medical Imaging.
DOI: 10.1109/TMI.2025.3551514