CBraMod

EEG foundation model for brain-computer interface decoding, factorizing self-attention into parallel spatial and temporal branches.

Released: December 2024

CBraMod (Criss-Cross Brain Foundation Model) is a self-supervised foundation model for decoding electroencephalography (EEG) signals across a wide range of brain-computer interface (BCI) tasks. Developed by Jiquan Wang, Sha Zhao, and colleagues at Zhejiang University and presented at ICLR 2025, it targets a long-standing obstacle in EEG modeling: the heterogeneity of recording setups. EEG datasets differ in the number and placement of electrodes, sampling rates, and recording durations, which makes it hard to train a single model that transfers across studies.

The model's central insight is that the spatial dependencies between electrodes and the temporal dependencies along the signal are fundamentally different in nature and should not be modeled with a single, undifferentiated attention mechanism. Prior EEG transformers tend to flatten all patches into one sequence and apply standard self-attention, conflating these two axes. CBraMod instead factorizes attention into parallel spatial and temporal branches, letting the network learn electrode-to-electrode and time-to-time relationships separately before combining them.
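The factorization described above can be illustrated with a small NumPy sketch. It is not the released implementation. It assumes that the token features are split in half, that one half goes to a spatial branch (attention across electrodes at each time step) and the other to a temporal branch (attention across time patches within each electrode), and that the two outputs are concatenated. The learned query, key, and value projections are omitted, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(x):
    """Single-head scaled dot-product self-attention over the second-to-last axis.
    Learned Q/K/V projections are left out (identity) to keep the sketch short."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def criss_cross_attention(tokens):
    """tokens: (channels, time_patches, d_model) with an even d_model.
    Assumption: half the features feed each branch, outputs are concatenated."""
    half = tokens.shape[-1] // 2
    spatial_in, temporal_in = tokens[..., :half], tokens[..., half:]
    # Spatial branch: for each time patch, attend across channels.
    spatial_out = np.swapaxes(attend(np.swapaxes(spatial_in, 0, 1)), 0, 1)
    # Temporal branch: for each channel, attend across time patches.
    temporal_out = attend(temporal_in)
    return np.concatenate([spatial_out, temporal_out], axis=-1)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((22, 4, 8))  # 22 channels, 4 patches, 8 features
out = criss_cross_attention(tokens)
print(out.shape)  # (22, 4, 8)
```

In this sketch the spatial branch never mixes information across time patches and the temporal branch never mixes across electrodes, which is the separation the paragraph above describes.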

Pretrained by masked reconstruction on a large EEG corpus and then fine-tuned on downstream datasets, CBraMod is released as open code under an MIT license with pretrained weights on Hugging Face, making it a practical backbone for both research and applied BCI development.

Key Features

Criss-cross attention: A transformer backbone with two parallel attention mechanisms that separately model spatial (across-electrode) and temporal (across-time) dependencies, reflecting the heterogeneous structure of EEG.
Asymmetric conditional positional encoding: A flexible positional scheme that adapts to diverse EEG formats, so the same model can ingest recordings with different channel counts and durations without redesign.
Masked-reconstruction pretraining: EEG is split into patches and the model is trained to reconstruct masked patches, a self-supervised objective that learns transferable representations without task labels.
Broad task coverage: A single pretrained backbone fine-tunes to state-of-the-art results on up to 10 distinct downstream BCI tasks spanning 12 public datasets, demonstrating strong generalization.
Open release: Code, pretraining and fine-tuning scripts, and pretrained weights are publicly available, lowering the barrier to reproducing and extending the work.
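The masked-reconstruction objective listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' training code: the mean-squared-error loss, the zero fill used in place of a mask token, the 0.5 mask ratio, and the names `masked_reconstruction_loss` and `predict_fn` are all hypothetical choices made for the example.

```python
import numpy as np

def masked_reconstruction_loss(patches, predict_fn, mask_ratio, rng):
    """patches: (channels, n_patches, patch_len).
    Returns the error on masked patches only, plus the boolean mask."""
    mask = rng.random(patches.shape[:2]) < mask_ratio    # True = hidden patch
    corrupted = np.where(mask[..., None], 0.0, patches)  # zero stands in for a mask token
    reconstruction = predict_fn(corrupted)
    if not mask.any():
        return 0.0, mask
    sq_err = (reconstruction - patches) ** 2
    return float(sq_err[mask].mean()), mask

rng = np.random.default_rng(0)
patches = rng.standard_normal((22, 4, 200))

# An identity function stands in for an untrained model (assumption).
loss, mask = masked_reconstruction_loss(patches, lambda x: x, 0.5, rng)
print(mask.shape, loss > 0.0)
```

Because the loss is computed only where `mask` is true, the model gets no credit for copying visible patches and has to infer the hidden ones from their context.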

Technical Details

CBraMod tokenizes EEG by dividing each channel's signal into fixed-length patches; the released configuration processes signals at 200 Hz with 200 samples (one second) per patch, arranged as a channel-by-time grid of tokens (for example, a 22-channel, 4-second input forms a 22 x 4 patch grid). Each transformer layer applies its spatial and temporal attention branches in parallel over this grid, and the asymmetric conditional positional encoding supplies position information that generalizes across channel layouts and sequence lengths. During pretraining, a subset of patches is masked and the model reconstructs them, learning representations directly from raw EEG without manual annotation. The pretrained encoder is then attached to lightweight task-specific heads and fine-tuned on each downstream dataset. Across 10 tasks and 12 datasets, CBraMod reports state-of-the-art performance relative to prior EEG foundation models such as LaBraM and BIOT.
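The patch-grid arithmetic in the paragraph above can be checked with a short NumPy sketch. The sampling rate and patch length come from the text; the helper name `to_patch_grid` and the random input are assumptions made for illustration, and the released preprocessing code may differ.

```python
import numpy as np

SAMPLING_RATE_HZ = 200  # released configuration, as described above
PATCH_LEN = 200         # samples per patch (one second at 200 Hz)

def to_patch_grid(eeg, patch_len=PATCH_LEN):
    """Split (channels, samples) EEG into a (channels, n_patches, patch_len) grid."""
    n_channels, n_samples = eeg.shape
    if n_samples % patch_len != 0:
        raise ValueError("recording length must be a multiple of patch_len")
    return eeg.reshape(n_channels, n_samples // patch_len, patch_len)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((22, 4 * SAMPLING_RATE_HZ))  # 22 channels, 4 seconds
patches = to_patch_grid(eeg)
print(patches.shape)  # (22, 4, 200)
```

Each entry `patches[c, t]` holds one second of signal from channel `c`, so the first two axes form the 22 x 4 token grid mentioned above.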

Applications

CBraMod serves as a general-purpose feature extractor for EEG-based brain-computer interfaces and clinical neurophysiology. Reported downstream tasks include emotion recognition, motor imagery classification, sleep-stage scoring, seizure and abnormal-EEG detection, event-type and mental-state classification, and related decoding problems. Because the same pretrained backbone transfers across these heterogeneous datasets, it is especially useful for groups working with limited labeled EEG, who can fine-tune a strong foundation model rather than training from scratch. This benefits BCI researchers, sleep and epilepsy investigators, and developers of neurotechnology applications.
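As a rough illustration of reusing a pretrained backbone as a feature extractor, the sketch below pools a grid of encoder outputs and applies a linear classification head. Everything here is a hypothetical stand-in: the random `features` array takes the place of real encoder outputs, the mean pooling and four-class head are assumptions, and nothing in it reflects the heads used in the released fine-tuning scripts.

```python
import numpy as np

def classify(features, weights, bias):
    """features: (channels, n_patches, d_model) encoder outputs.
    Mean-pools over the patch grid, then applies a linear head with softmax."""
    pooled = features.mean(axis=(0, 1))    # (d_model,)
    logits = pooled @ weights + bias       # (n_classes,)
    exp = np.exp(logits - logits.max())    # subtract max for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
d_model, n_classes = 8, 4                          # hypothetical sizes
features = rng.standard_normal((22, 4, d_model))   # stand-in for encoder outputs
weights = rng.standard_normal((d_model, n_classes)) * 0.1
bias = np.zeros(n_classes)

probs = classify(features, weights, bias)
print(probs.shape)  # (4,)
```

With limited labeled data, only a small head like this (and optionally the backbone) needs to be trained, which is the practical appeal described above.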

Impact

CBraMod advances the case that EEG, like protein sequences and natural images, benefits from large-scale self-supervised foundation models, and its factorized criss-cross attention offers a concrete architectural lesson: spatial and temporal structure in biosignals is better modeled separately than jointly. Its acceptance at ICLR 2025, permissive MIT licensing, and publicly released weights have made it a visible reference point and reusable backbone in the fast-growing EEG foundation-model literature. As with any model pretrained on a particular corpus of recording montages, transfer to substantially different electrode configurations or atypical clinical populations should be validated empirically before deployment.

Citation

CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding

Preprint

Wang, J., et al. (2024) CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding. International Conference on Learning Representations.

DOI: 10.48550/arXiv.2412.07236

Recent citations

Papers that recently cited this model.

Research on motor imagery task transfer learning driven by the fusion paradigm of active and passive motor intentions
Xiang Hu, Fei Wang, Jinying Bi
International Conference on Robotics and Sensor Networks · Jul 2026 · 0 citations

CoCoT-EEG: Contrastive-Pretrained Multiscale Convolutional Transformer for EEG Decoding
Gabriel Mahuas, Victoria Shevchenko, Ugo Tanielian, et al.
Jul 2026 · 0 citations · Influential

STST-JEPA: Shallow-Target Spatio-Temporal Joint Embedding Prediction Architecture For EEG Self-Supervised Learning
Roy Segal, Yoni Svechinsky, T. Fekete
Jul 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
Mouxiang Chen, Lefei Shen, Zhuo Li, et al.
International Conference on Machine Learning · Aug 2024 · 81 citations

REVE: A Foundation Model for EEG - Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, et al.
arXiv.org · Oct 2025 · 44 citations

CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
Yuchen Zhou, Jiamin Wu, Zichen Ren, et al.
arXiv.org · Jun 2025 · 35 citations · Influential

Brain Foundation Models: A survey on advancements in neural signal processing and brain discovery
Xin-qiu Zhou, Chenyu Liu, Zhisheng Chen, et al.
IEEE Signal Processing Magazine · Mar 2025 · 34 citations

Toward Foundation Model for Multivariate Wearable Sensing of Physiological Signals
Yunfei Luo, Yuliang Chen, Asif Salekin, et al.
ACM Transactions on Computing for Healthcare · Dec 2024 · 27 citations

Citations

Total Citations: 185
Influential: 69
References: 101

GitHub

Stars: 329
Forks: 47
Open Issues: 32
Contributors: 1
Last Push: 1mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 0
Likes: 6
Last Modified: 1y ago

Fields of citing research

Computer Science: 99%
Medicine: 47%
Engineering: 42%
Biology: 15%
Physics: 2%
Psychology: 2%
Linguistics: 1%
Mathematics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 78 · Open
Usability (can I run it?): 94
Reproducibility (can I retrain it?): 66
Model Openness Framework: Class III · Open Model

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model


