University of North Carolina at Chapel Hill
A 1.2B-parameter fMRI foundation model of brain connectomes that uses brain-environment interaction tokens and multitask learning for behavior and disease prediction.
The Large Connectome Model (LCM) is a foundation model for functional magnetic resonance imaging (fMRI) that learns general-purpose representations of the human brain connectome, the network of statistical dependencies between the activity of different brain regions. Clinical neuroimaging studies are typically constrained by small cohorts, which limits the performance of task-specific deep learning models. LCM addresses this bottleneck by pretraining on a large, demographically diverse pool of scans from healthy participants and patients, then transferring to downstream clinical tasks that have limited labeled data.
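To make the notion of a connectome concrete, the sketch below builds a functional connectivity matrix from regional time series using Pearson correlation. This is a common convention in the field, not a confirmed description of LCM's preprocessing; the function names, the random stand-in data, and the choice of 10 regions are assumptions made for illustration.

```python
import numpy as np


def functional_connectome(timeseries: np.ndarray) -> np.ndarray:
    """Return the region-by-region Pearson correlation matrix.

    timeseries has shape (n_timepoints, n_regions), one column per
    brain region. Pearson correlation is an assumed choice here; the
    source does not state which dependency measure LCM uses.
    """
    if timeseries.ndim != 2:
        raise ValueError("expected a 2-D array of shape (time, regions)")
    # rowvar=False tells NumPy that columns, not rows, are the variables.
    return np.corrcoef(timeseries, rowvar=False)


def upper_triangle_features(conn: np.ndarray) -> np.ndarray:
    """Flatten the strictly upper triangle into a feature vector.

    The matrix is symmetric with a unit diagonal, so the upper
    triangle holds every unique region pair exactly once.
    """
    rows, cols = np.triu_indices(conn.shape[0], k=1)
    return conn[rows, cols]


# Random numbers standing in for a parcellated scan (hypothetical data).
rng = np.random.default_rng(0)
bold = rng.standard_normal((200, 10))  # 200 timepoints, 10 regions

conn = functional_connectome(bold)
features = upper_triangle_features(conn)
print(conn.shape, features.shape)
```

With 10 regions there are 10 * 9 / 2 = 45 unique region pairs, which is the length of the feature vector. The number of regions in practice depends on the atlas used upstream.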
The model's central idea is to treat the brain not in isolation but in the context of its environment. LCM tokenizes "brain-environment interactions" (BEI), which pair connectome features with demographic and environmental variables such as age, sex, and behavioral measures, and learns across many of these interactions simultaneously in a multitask learning landscape. This framing lets the model exploit the weak supervisory signal in the metadata that usually accompanies fMRI scans, turning otherwise unlabeled data into useful pretraining targets.
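One way to picture how scan metadata becomes supervision is to expand each scan into one training example per available variable. The sketch below is illustrative only and is not the authors' BEI tokenizer; `InteractionExample`, `build_interaction_examples`, and the metadata keys are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

import numpy as np


@dataclass
class InteractionExample:
    """One (connectome, variable) pairing used as a prediction task."""
    features: np.ndarray  # connectome feature vector for one scan
    variable: str         # name of the demographic/environmental variable
    target: float         # recorded value the model would learn to predict


def build_interaction_examples(
    features: np.ndarray,
    metadata: Dict[str, Optional[float]],
) -> List[InteractionExample]:
    """Create one example per metadata variable that is present.

    Variables recorded as None are skipped, so a scan with partial
    metadata still contributes whatever supervision it has.
    """
    examples = []
    for name, value in metadata.items():
        if value is None:
            continue
        examples.append(InteractionExample(features, name, float(value)))
    return examples


# Hypothetical scan: 45 connectome features and partially missing metadata.
scan_features = np.zeros(45)
scan_metadata = {"age": 63.0, "sex": 1.0, "memory_score": None}

examples = build_interaction_examples(scan_features, scan_metadata)
print([ex.variable for ex in examples])
```

In this toy setup a scan with no diagnostic label still yields two training targets (age and sex), which is the sense in which metadata turns unlabeled scans into pretraining data.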
LCM was developed by Ziquan Wei, Tingting Dan, and Guorong Wu at the University of North Carolina at Chapel Hill (ACMLab) and published at AAAI-26. It is among the largest publicly released brain connectome foundation models, with pretrained weights and code made available for reproducibility.
LCM uses a decoder-only Transformer architecture. The largest variant (LCM-Big) has 32 layers and approximately 1.2 billion parameters, and combines multi-head self-attention over connectome features with cross-attention to the tokenized brain-environment interactions. It was pretrained on 10,036 subjects drawn primarily from the Human Connectome Project Aging (HCPA) and Young Adult (HCPYA) cohorts, then evaluated across eight fMRI datasets including ADNI, PPMI, ABIDE, Taowu, Neurocon, and a schizophrenia cohort. Reported results include 86.30% accuracy (85.33% F1) for Alzheimer's disease diagnosis, 81.30% accuracy (84.18% F1) for Parkinson's disease, and 71.46% accuracy (72.50% F1) for autism, with sex-prediction accuracy of up to 100% on smaller datasets. Pretrained weights are distributed through a Google Drive link in the GitHub repository.
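The results above report accuracy and F1 side by side. As a reminder of how the two differ, the sketch below computes both for a binary task from toy labels. It assumes the standard binary F1 (harmonic mean of precision and recall on the positive class); the source does not say which F1 averaging the authors used, and the labels here are invented for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int = 1,
) -> Tuple[float, float]:
    """Return (accuracy, F1) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives
    # in either the labels or the predictions.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels: 4 patients (1) and 6 controls (0), one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")  # accuracy=0.80, F1=0.75
```

Because F1 ignores true negatives, it can sit above or below accuracy depending on class balance, which is why the two figures diverge in either direction across the disease tasks.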
LCM is aimed at neuroimaging and clinical research groups that want strong baselines for connectome analysis without training large models from scratch. Because the pretrained backbone transfers across cohorts, researchers can fine-tune it on small disease-specific datasets for early diagnosis of neurodegenerative and psychiatric conditions, or use it for phenotype and behavior prediction in cognitive neuroscience studies. The brain-environment interaction framing is particularly useful for studies that collect rich demographic and behavioral metadata alongside resting-state or task fMRI.
LCM brings to functional neuroimaging the foundation-model paradigm that reshaped protein and genomics research; in this field, small sample sizes have long limited deep learning. By demonstrating that large-scale multitask pretraining on connectomes plus environmental context improves transfer to diverse clinical tasks, it offers a reusable backbone for the neuroimaging community and an open reference point for future brain foundation models. Its main limitations are typical of the area: pretraining draws heavily on healthy-adult HCP cohorts, downstream clinical datasets remain modest in size, and connectome construction depends on upstream preprocessing and atlas choices that can affect generalization.
Wei, Z., et al. (2026) Large Connectome Model: An fMRI Foundation Model of Brain Connectomes Empowered by Brain-Environment Interaction in Multitask Learning Landscape. Proceedings of the AAAI Conference on Artificial Intelligence.
DOI: 10.1609/aaai.v40i3.37198