National University of Singapore
An fMRI brain-dynamics foundation model that adapts the Joint-Embedding Predictive Architecture with brain gradient positioning and spatiotemporal masking.
Brain-JEPA is a foundation model for brain dynamics that learns transferable representations of resting-state functional MRI (fMRI) signals through self-supervised pretraining. Developed by Zijian Dong, Ruilin Li, Yilei Wu, and colleagues in Juan Helen Zhou's lab at the National University of Singapore, it was published as a Spotlight paper at NeurIPS 2024 (arXiv:2409.19407, September 2024). The model adapts Meta AI's Joint-Embedding Predictive Architecture (JEPA) — originally developed for images — to the spatiotemporal structure of brain activity time-series.
The central problem Brain-JEPA addresses is that fMRI data is high-dimensional, noisy, and scarce relative to the demographic and clinical questions researchers want to answer. Rather than predicting raw signal values (as masked-autoencoder approaches like BrainLM do), Brain-JEPA predicts masked regions in a learned latent space, which encourages the encoder to capture semantically meaningful brain-activity structure rather than fitting noise. Inputs are parcellated into 450 regions of interest (ROIs) — 400 cortical from the Schaefer atlas and 50 subcortical from the Tian Scale III atlas — and represented as ROI-by-time patches.
Two innovations distinguish the model from generic time-series transformers: Brain Gradient Positioning, a functional coordinate system that encodes each ROI by its position along principal connectivity gradients rather than by arbitrary index, and Spatiotemporal Masking, a pretraining mask tailored to the heterogeneous spatial and temporal axes of fMRI patches.
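To make the idea of a functional coordinate system concrete, the sketch below derives a small coordinate vector for each ROI from a functional connectivity matrix. It is an illustration only and rests on assumptions not stated above: PCA (via SVD) is swapped in as a simple stand-in for the embedding step, whereas published gradient analyses often use nonlinear methods such as diffusion maps; the function name `gradient_coordinates`, the choice of three gradients, and the random toy data are all hypothetical.

```python
import numpy as np

def gradient_coordinates(timeseries, n_gradients=3):
    """Return an (n_rois, n_gradients) array of connectivity-based coordinates.

    timeseries: array of shape (n_rois, n_timesteps).
    PCA is a simplifying assumption here, not the authors' stated procedure.
    """
    # ROI-by-ROI functional connectivity (Pearson correlation between rows).
    fc = np.corrcoef(timeseries)
    # Centre each column so the SVD below is equivalent to PCA.
    fc_centered = fc - fc.mean(axis=0, keepdims=True)
    # Left singular vectors scaled by singular values give the PCA scores.
    u, s, _ = np.linalg.svd(fc_centered, full_matrices=False)
    return u[:, :n_gradients] * s[:n_gradients]

# Toy data: 20 ROIs and 160 timesteps of random noise (hypothetical).
rng = np.random.default_rng(0)
toy_timeseries = rng.standard_normal((20, 160))
coords = gradient_coordinates(toy_timeseries)
print(coords.shape)
```

Each row of `coords` can then play the role that an integer index would play in a conventional positional embedding, so that ROIs with similar connectivity profiles receive nearby positions regardless of their order in the atlas.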
Brain-JEPA uses a Vision Transformer encoder, with the main results reported for ViT-Base (~86M parameters); ViT-Small (~22M) and ViT-Large (~307M) variants are also provided. Inputs span 160 timesteps across 450 ROIs, patched along the time axis with a patch size of 16. Pretraining used resting-state fMRI from roughly 32,000 UK Biobank participants (80% of a 40,162-subject cohort aged 44–83). Against BrainLM, the prior large-scale fMRI model, Brain-JEPA improved internal UK Biobank age-prediction MSE from 0.612 to 0.501 and sex-classification accuracy from 86.47% to 88.17%, with a larger margin on external transfer (81.52% vs. 74.39% sex accuracy on HCP-Aging). Downstream evaluation spanned HCP-Aging (656 subjects), ADNI (normal vs. MCI and amyloid classification), the MACC Asian cohort (539 subjects), plus OASIS-3 and CamCAN. Pretrained and example fine-tuned checkpoints are released via the official codebase.
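The input dimensions above imply a fixed patch grid, and a short sketch can show both the arithmetic and how a mask might treat the ROI and time axes differently. The grid numbers come from the text; everything about the mask is an assumption for illustration: the function `partition_grid`, the region names, and the block boundaries are hypothetical, and contiguous ROI index ranges are used only for simplicity. It is not the authors' sampling procedure.

```python
# Numbers taken from the text: 450 ROIs, 160 timesteps, temporal patch size 16.
N_ROIS, N_TIMESTEPS, PATCH_SIZE = 450, 160, 16

patches_per_roi = N_TIMESTEPS // PATCH_SIZE   # time patches per ROI
n_tokens = N_ROIS * patches_per_roi           # total patches in the grid

def partition_grid(roi_range, time_range):
    """Label each (roi, time_patch) cell relative to one observed block.

    roi_range and time_range are half-open (start, stop) pairs.
    Region names are hypothetical labels chosen for this sketch.
    """
    r0, r1 = roi_range
    t0, t1 = time_range
    regions = {"observed": [], "cross_roi": [], "cross_time": [], "double_cross": []}
    for roi in range(N_ROIS):
        for t in range(patches_per_roi):
            in_roi = r0 <= roi < r1
            in_time = t0 <= t < t1
            if in_roi and in_time:
                key = "observed"        # visible to the encoder
            elif in_time:
                key = "cross_roi"       # same time patches, other ROIs
            elif in_roi:
                key = "cross_time"      # same ROIs, other time patches
            else:
                key = "double_cross"    # differs on both axes
            regions[key].append((roi, t))
    return regions

regions = partition_grid(roi_range=(0, 270), time_range=(0, 6))
print(patches_per_roi, n_tokens)
print({name: len(cells) for name, cells in regions.items()})
```

Separating the targets by axis makes it explicit that predicting other ROIs at the same time and predicting the same ROIs at other times are different tasks, which is the kind of heterogeneity a spatiotemporal mask has to account for.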
Brain-JEPA serves neuroscience and clinical-imaging researchers who work with resting-state fMRI but lack the large labeled datasets needed to train task-specific models from scratch. Pretrained embeddings can be fine-tuned or linearly probed for demographic estimation (brain age, sex), neurodegenerative disease diagnosis and prognosis (mild cognitive impairment, amyloid status in ADNI and MACC cohorts), and cognitive-trait prediction. Because it generalizes across ethnically distinct cohorts, it is particularly useful for groups studying under-represented populations where large in-house datasets are unavailable, and it provides a reusable backbone for biomarker discovery and connectome-based prediction pipelines.
Brain-JEPA demonstrated that joint-embedding predictive learning, rather than masked reconstruction, produces stronger, more generalizable representations of brain dynamics, establishing a competitive alternative to BrainLM as an fMRI foundation backbone. Its NeurIPS 2024 Spotlight recognition and released code and checkpoints have made it a reference point for subsequent fMRI foundation-model work, especially around functional positional encoding and cross-population generalization. A key limitation is that the model operates on parcellated ROI time series under a fixed atlas and a 160-timestep window, so applying it to data with different parcellations, acquisition parameters, or substantially longer scans requires adaptation. Its clinical validation also remains at the research-cohort stage rather than prospective deployment.
Dong, Z., et al. (2024) Brain-JEPA: Brain Dynamics Foundation Model with Gradient Positioning and Spatiotemporal Masking. Neural Information Processing Systems.
DOI: 10.48550/arXiv.2409.19407