Yale University / Baylor College of Medicine / Princeton University
A foundation model for fMRI brain activity recordings, pretrained with masked autoencoding on ~6,700 hours of data and applied to clinical prediction and zero-shot network discovery.
BrainLM (Brain Language Model) is a foundation model for functional magnetic resonance imaging (fMRI) recordings, designed to learn a general-purpose representation of human brain activity dynamics. Rather than training a bespoke model for each neuroimaging task, BrainLM follows the self-supervised pretraining paradigm that transformed natural language and protein modeling: it is trained once on a large corpus of unlabeled brain recordings and then adapted, through fine-tuning or zero-shot inference, to a range of downstream problems. It was developed by researchers in David van Dijk's lab at Yale University, with collaborators at Baylor College of Medicine and Princeton University, and presented at ICLR 2024.
The model addresses a long-standing bottleneck in computational neuroscience: fMRI datasets are individually small and heterogeneous, making it difficult to train deep models that generalize across cohorts, scanners, and tasks. By pretraining on roughly 6,700 hours of fMRI from large population studies, BrainLM learns spatiotemporal structure that transfers to new datasets it never saw during training, including external clinical cohorts.
BrainLM is notable as one of the first large-scale foundation models built directly on whole-brain fMRI time series rather than on task-specific labels or static connectivity matrices. It demonstrates that brain recordings, like text or protein sequences, contain enough self-supervisory signal to support a single reusable backbone for neuroscience.
BrainLM uses a Vision Transformer masked autoencoder (ViTMAE) architecture applied to fMRI time series parcellated with the AAL-424 atlas, yielding 424 regional signals sampled at roughly 1 Hz. The model is trained with a mean-squared-error reconstruction objective over masked spatiotemporal patches. Pretraining used approximately 6,700 hours of recordings: about 6,450 hours (76,296 recordings) from the UK Biobank and about 250 hours (1,002 recordings) from the Human Connectome Project. Recordings were preprocessed with motion correction, normalization, temporal filtering, and ICA denoising, then split 80/10/10 into train, validation, and test sets. Two checkpoints are released, with 111 million and 650 million parameters; the larger model uses flash attention. Training ran for 100 epochs with a batch size of 512 using the Adam optimizer.
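The masked reconstruction objective described above can be illustrated with a minimal NumPy sketch. This is not the BrainLM implementation: the window length (200 timepoints), patch length (20 timepoints), and mask ratio (0.75) are illustrative assumptions, the helper names `patchify`, `random_patch_mask`, and `masked_mse` are hypothetical, and an all-zeros array stands in for the decoder output.

```python
import numpy as np


def patchify(signals, patch_len):
    """Split a (regions, timepoints) array into (regions, n_patches, patch_len)."""
    n_regions, n_time = signals.shape
    n_patches = n_time // patch_len
    # Drop any trailing timepoints that do not fill a whole patch.
    trimmed = signals[:, : n_patches * patch_len]
    return trimmed.reshape(n_regions, n_patches, patch_len)


def random_patch_mask(grid_shape, mask_ratio, rng):
    """Boolean (regions, n_patches) mask; True marks a patch hidden from the encoder."""
    n_total = grid_shape[0] * grid_shape[1]
    n_masked = int(round(mask_ratio * n_total))
    flat = np.zeros(n_total, dtype=bool)
    flat[rng.choice(n_total, size=n_masked, replace=False)] = True
    return flat.reshape(grid_shape)


def masked_mse(target, prediction, mask):
    """Mean squared error computed over masked patches only."""
    squared_error = (prediction - target) ** 2
    return float(squared_error[mask].mean())


rng = np.random.default_rng(0)

# Synthetic stand-in for one window: 424 parcels x 200 timepoints (assumed sizes).
signals = rng.standard_normal((424, 200))
patches = patchify(signals, patch_len=20)

# Hide 75% of the spatiotemporal patches (assumed ratio).
mask = random_patch_mask(patches.shape[:2], mask_ratio=0.75, rng=rng)

# Placeholder for the decoder's reconstruction; a real model would predict this.
prediction = np.zeros_like(patches)

loss = masked_mse(patches, prediction, mask)
print(patches.shape, int(mask.sum()), loss)
```

Restricting the loss to masked patches follows the usual masked-autoencoder convention: the model is scored only on content it could not see, so it cannot lower the loss by copying visible input.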
BrainLM is aimed at neuroscientists and clinical researchers working with fMRI who want a pretrained backbone rather than training models from scratch on small studies. Practical uses include decoding cognitive and mental-health variables, forecasting future brain states, simulating the effects of interventions on brain dynamics through prompting, and discovering functional networks in an unsupervised way. Because pretrained weights are publicly available on HuggingFace, groups with limited labeled data can fine-tune for their own biomarkers or diagnostic targets.
BrainLM helped establish the foundation-model paradigm for brain activity recordings, showing that large-scale self-supervised pretraining on fMRI produces representations that transfer across cohorts and tasks. Its public 111M- and 650M-parameter checkpoints and ICLR 2024 publication have made it a reference point for subsequent neuroimaging foundation models. Important limitations remain: pretraining used only healthy adults, so generalization to clinical populations is uncertain; the approach is currently specific to fMRI and untested on other modalities; and BOLD fMRI is itself an indirect proxy for neural activity. The pretrained weights are released under a CC BY-NC-ND 4.0 license, and the UK Biobank training data requires a separate access application.
Caro, J. O., et al. (2024) BrainLM: A foundation model for brain activity recordings. bioRxiv.
DOI: 10.1101/2023.09.12.557460