A parameter-efficient (5.6M) multimodal foundation model that fuses fMRI time series with diffusion-MRI structural connectivity for brain dynamics analysis.
BrainSymphony is a multimodal foundation model for human brain dynamics that jointly represents functional MRI (fMRI) time series and diffusion-MRI-derived structural connectivity in a unified region-of-interest (ROI) embedding space. It was developed by Moein Khajehnejad, Forough Habibollahi, Devon Stoliker, and Adeel Razi at Monash University's Turner Institute for Brain and Mental Health, with the preprint first posted to arXiv in June 2025.
The central problem the model addresses is the data and compute cost of existing brain foundation models. Whereas systems such as BrainLM (111M parameters) and Brain-JEPA (85M parameters) require large pretraining corpora, BrainSymphony is deliberately lightweight, totalling roughly 5.6 million parameters, and is designed to perform competitively when labelled neuroimaging data are scarce. It offers plug-and-play integration of functional and structural modalities: it can be trained and deployed in unimodal or multimodal form without architectural changes, making it adaptable to datasets that contain only one imaging type.
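To put the size gap in concrete terms, the following minimal sketch computes the ratios from the parameter counts quoted above. The dictionary and function names are illustrative only and are not part of any released code.

```python
# Parameter counts in millions, taken from the figures quoted in the text.
PARAMS_M = {"BrainLM": 111.0, "Brain-JEPA": 85.0, "BrainSymphony": 5.6}


def size_ratio(model: str, reference: str = "BrainSymphony") -> float:
    """Return how many times larger `model` is than `reference`, by parameter count."""
    return PARAMS_M[model] / PARAMS_M[reference]


for name in ("BrainLM", "Brain-JEPA"):
    print(f"{name}: {size_ratio(name):.1f}x the parameters of BrainSymphony")
# BrainLM: 19.8x the parameters of BrainSymphony
# Brain-JEPA: 15.2x the parameters of BrainSymphony
```

On these figures BrainSymphony is roughly 15 to 20 times smaller than the two comparison models.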
By combining temporal brain signals with anatomical wiring, BrainSymphony sits at the intersection of medical imaging and biosignal modelling, contributing to a growing class of neuro foundation models aimed at decoding individual differences, cognitive states, and clinical phenotypes from non-invasive brain recordings.
The model operates over a 450-ROI parcellation comprising 400 Schaefer cortical regions plus 50 Tian Scale-III subcortical regions. Pretraining used the Human Connectome Project Young Adult cohort (967 participants, aged 22-35) and HCP-Aging (262 participants for pretraining, 394 held out for testing), with external validation on the PsiConnect psilocybin dataset (54 participants after quality control). Reported results on HCP-Aging include 94.04% accuracy (F1 = 0.933) on sex classification and an age-prediction correlation of rho = 0.841 (MSE = 0.363). In unsupervised functional-network recovery the embeddings reach 84.44% classification accuracy, outperforming raw time series, VAE, and GCN baselines, and BOLD reconstruction achieves a mean R-squared of 0.438 across ROIs.
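As an illustration of the reconstruction metric, the sketch below computes a coefficient of determination per ROI over time and then averages across the 450 ROIs. It assumes R-squared is taken per ROI along the time axis; the source does not spell out the exact procedure, and the function name, array shapes, and synthetic data are hypothetical.

```python
import numpy as np

N_CORTICAL = 400      # Schaefer cortical regions (from the text)
N_SUBCORTICAL = 50    # Tian Scale-III subcortical regions (from the text)
N_ROIS = N_CORTICAL + N_SUBCORTICAL  # 450-ROI parcellation


def mean_r2_across_rois(bold, recon):
    """Mean and per-ROI R-squared for arrays of shape (n_rois, n_timepoints).

    Assumption: R-squared is computed for each ROI over time, then averaged.
    ROIs with constant signal (zero variance) would need special handling.
    """
    bold = np.asarray(bold, dtype=float)
    recon = np.asarray(recon, dtype=float)
    ss_res = np.sum((bold - recon) ** 2, axis=1)
    ss_tot = np.sum((bold - bold.mean(axis=1, keepdims=True)) ** 2, axis=1)
    r2 = 1.0 - ss_res / ss_tot
    return float(r2.mean()), r2


# Synthetic stand-in data; not real BOLD signals or model output.
rng = np.random.default_rng(0)
bold = rng.standard_normal((N_ROIS, 200))
recon = bold + 0.5 * rng.standard_normal(bold.shape)

mean_r2, per_roi_r2 = mean_r2_across_rois(bold, recon)
print(f"mean R^2 over {per_roi_r2.shape[0]} ROIs: {mean_r2:.3f}")
```

A perfect reconstruction yields 1.0 for every ROI, while predicting each ROI's temporal mean yields 0.0, which gives a scale for reading the reported 0.438.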
BrainSymphony targets computational neuroscience and clinical neuroimaging researchers who need expressive brain representations from limited data. Its embeddings support phenotype prediction (age, sex), disease and state classification, and unsupervised discovery of functional networks, and the model's pharmacological validation on psilocybin data suggests utility for studying drug-induced brain-state changes. Because it runs in unimodal or multimodal configurations, it can be applied to legacy datasets containing only fMRI or only diffusion MRI, lowering the barrier for smaller labs to adopt foundation-model workflows.
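The unimodal-or-multimodal behaviour described above can be pictured with a toy fusion step that falls back to whichever modality is available. This is a hypothetical sketch, not the authors' adaptive-fusion mechanism or the released API: the function name, the scalar sigmoid gate, and the embedding width of 16 are all assumptions made for illustration.

```python
import numpy as np


def fuse_roi_embeddings(func_emb=None, struct_emb=None, gate_logit=0.0):
    """Combine per-ROI embeddings from one or two modalities.

    Hypothetical illustration: a single scalar gate blends the functional and
    structural embeddings; with only one modality, it is passed through as-is.
    """
    if func_emb is None and struct_emb is None:
        raise ValueError("at least one modality is required")
    if struct_emb is None:
        return np.asarray(func_emb, dtype=float)
    if func_emb is None:
        return np.asarray(struct_emb, dtype=float)

    f = np.asarray(func_emb, dtype=float)
    s = np.asarray(struct_emb, dtype=float)
    if f.shape != s.shape:
        raise ValueError("functional and structural embeddings must share a shape")

    gate = 1.0 / (1.0 + np.exp(-gate_logit))  # sigmoid, value in (0, 1)
    return gate * f + (1.0 - gate) * s


# Synthetic embeddings for 450 ROIs; the width of 16 is arbitrary.
rng = np.random.default_rng(0)
func = rng.standard_normal((450, 16))
struct = rng.standard_normal((450, 16))

multimodal = fuse_roi_embeddings(func, struct)    # gate of 0.5: equal blend
fmri_only = fuse_roi_embeddings(func_emb=func)    # legacy fMRI-only dataset
print(multimodal.shape, fmri_only.shape)
```

The point of the sketch is only that downstream code sees the same ROI-by-feature array regardless of which modalities were supplied.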
BrainSymphony contributes to the argument that brain foundation models need not be large to be effective, demonstrating that careful architectural design (Perceiver distillation, signed graph attention, adaptive fusion) can rival far larger models on neuroimaging benchmarks. By releasing code and pretrained checkpoints under an Apache-2.0 license, the authors make a reproducible, extensible baseline available to the neuro-AI community. As a recent preprint (2025-2026), its adoption is still at an early stage, and because its evaluation centers on HCP-derived cohorts, generalization to clinical populations and other scanners remains to be established.
Khajehnejad, M., et al. (2025) BrainSymphony: A parameter-efficient multimodal foundation model for brain dynamics with limited data.
DOI: 10.48550/arXiv.2506.18314