Shanghai Jiao Tong University / University of Science and Technology of China / Shanghai AI Laboratory / Shanghai Sixth People's Hospital
Hierarchical knowledge-enhanced vision-language pre-training model for universal brain MRI diagnosis across 10+ diseases from multi-modal scans and reports.
UniBrain is a vision-language pre-training framework for universal brain MRI diagnosis, developed by researchers at Shanghai Jiao Tong University, the University of Science and Technology of China, Shanghai AI Laboratory, and Shanghai Sixth People's Hospital. First released as a preprint in September 2023 and later published in Computerized Medical Imaging and Graphics in 2025, it targets a central limitation of deep learning for brain MRI: most models are trained narrowly on a single disease or modality and fail to generalize across the wide spectrum of conditions encountered in routine clinical practice.
Rather than relying on costly per-disease manual annotation, UniBrain learns directly from 24,770 routinely collected imaging-report pairs, each combining four transverse MRI modalities (T1WI, T2WI, T2FLAIR, and DWI) with the study's free-text radiology report. A hierarchical alignment scheme bridges the gap between unstructured clinical prose and structured visual features by linking images and reports at multiple levels of granularity, enabling diagnosis of more than ten common brain diseases within a single framework.
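To make the shape of one training sample concrete, the following is a minimal sketch of an imaging-report pair as described above. The class name, field names, and the path-based layout are illustrative assumptions, not the structure used in the UniBrain codebase; only the four modality names come from the text.

```python
from dataclasses import dataclass
from typing import Dict

# The four transverse modalities named in the text.
MODALITIES = ("T1WI", "T2WI", "T2FLAIR", "DWI")


@dataclass(frozen=True)
class ImagingReportPair:
    """One sample: four MRI volumes plus the free-text report.

    Hypothetical container; field names are assumptions for illustration.
    """

    study_id: str
    volume_paths: Dict[str, str]  # modality name -> path to a volume file
    report_text: str

    def __post_init__(self) -> None:
        # Reject samples that lack any of the four expected modalities.
        missing = [m for m in MODALITIES if m not in self.volume_paths]
        if missing:
            raise ValueError(f"study {self.study_id} is missing modalities: {missing}")
        # Reject samples whose report carries no text to learn from.
        if not self.report_text.strip():
            raise ValueError(f"study {self.study_id} has an empty report")
```

Validating at construction time means an incomplete study fails early, before it reaches a data loader.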
The work sits within the broader trend of medical vision-language foundation models (alongside efforts like CheXzero and MedCLIP in chest imaging), but is distinctive in tackling volumetric, multi-modal brain MRI and in deriving diagnostic supervision automatically from radiology reports.
UniBrain combines a convolutional image encoder for volumetric MRI with a transformer-based text encoder, trained with a hierarchical contrastive vision-language objective over 24,770 imaging-report pairs drawn from routine diagnostics. The hierarchical alignment first matches modality-wise imaging-report features, then projects concatenated multi-modal features into a shared vision-language semantic space, and finally aligns global imaging-report representations. An Automatic Report Decomposition module structures the free-text reports into diagnosis-relevant knowledge used as supervision. In in-domain evaluation the authors report an average AUC of roughly 90.7% across the target diseases. The model is additionally validated on out-of-domain datasets, where it consistently surpasses prior state-of-the-art diagnostic baselines and reaches radiologist-level performance on certain disease categories.
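The three alignment levels described above can be illustrated with a generic symmetric InfoNCE loss applied at each level. This is a sketch under stated assumptions, not the authors' implementation: the function names, the equal weighting of the three terms, the temperature of 0.07, and the use of precomputed embeddings (in place of the learned encoders and projection) are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Batch = Sequence[Vector]


def _normalize(v: Vector) -> List[float]:
    # Scale to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def info_nce(images: Batch, texts: Batch, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of matched (image, text) embeddings."""
    img = [_normalize(v) for v in images]
    txt = [_normalize(v) for v in texts]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


def hierarchical_loss(
    per_modality: Dict[str, Tuple[Batch, Batch]],  # level 1: one pair per modality
    fused: Tuple[Batch, Batch],                    # level 2: projected multi-modal features
    global_pair: Tuple[Batch, Batch],              # level 3: global representations
    temperature: float = 0.07,
) -> float:
    """Sum of the three alignment terms (equal weights are an assumption)."""
    modality_term = sum(
        info_nce(i, t, temperature) for i, t in per_modality.values()
    ) / len(per_modality)
    fused_term = info_nce(fused[0], fused[1], temperature)
    global_term = info_nce(global_pair[0], global_pair[1], temperature)
    return modality_term + fused_term + global_term
```

In this sketch each level contributes one contrastive term, so a batch whose image and report embeddings agree at every level yields a lower total than one where they are mismatched.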
UniBrain is aimed at automated brain MRI screening and decision support, where a single model can flag multiple candidate diagnoses from a standard multi-sequence study and its accompanying report. Because it learns from routinely generated reports rather than bespoke annotations, it is well suited to institutions with large unlabeled MRI archives, and the released inference SDK lets researchers and clinical informatics teams run predictions on NIfTI inputs across the four supported modalities. Its out-of-domain validation suggests utility as a transferable backbone for downstream neuroimaging tasks.
By demonstrating that report-supervised, hierarchically aligned pre-training can deliver universal, multi-disease brain MRI diagnosis at radiologist-comparable accuracy on some conditions, UniBrain contributes to the growing body of medical vision-language foundation models and offers a template for exploiting routine clinical reports as supervision. Its public weights and SDK lower the barrier for follow-on neuroimaging research. Key limitations include reliance on the four specific transverse modalities and on report quality, and validation confined to the disease set and cohorts represented in the training and test data, so generalization to rarer conditions and other scanner protocols remains to be established.
Lei, J., et al. (2025) UniBrain: Universal Brain MRI diagnosis with hierarchical knowledge-enhanced pre-training. Computerized Medical Imaging and Graphics.
DOI: 10.1016/j.compmedimag.2025.102516