EEGFormer

EEG foundation model pretrained with vector-quantized self-supervision, yielding interpretable discrete codes that transfer to seizure detection.

Released: January 2024

EEGFormer is a large-scale foundation model for electroencephalography (EEG) developed by researchers at Microsoft Research and ShanghaiTech University and released as an arXiv preprint in January 2024. It addresses a persistent limitation in clinical and neuroscience EEG analysis: most deep-learning models are trained from scratch on a single dataset for a single task, so they generalize poorly across recording montages, patient populations, and clinical questions. EEGFormer instead learns a single set of universal EEG representations through self-supervised pretraining on a large unlabeled corpus, then transfers to diverse downstream tasks with lightweight fine-tuning.

The model's distinguishing idea is to combine transformer-based representation learning with vector quantization, encoding raw EEG into a discrete vocabulary of learned codes. This produces representations that are not only transferable but also interpretable: because every segment of signal maps to a discrete token from a fixed codebook, recurring neural patterns can be inspected and associated with specific clinical phenomena. EEGFormer was presented in the context of the AAAI 2024 Spring Symposium on Clinical Foundation Models, positioning it within the broader effort to build reusable foundation models for biosignals rather than bespoke task-specific networks.
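The nearest-code lookup behind this discrete vocabulary can be sketched in a few lines. This is a generic vector-quantization step in the style of VQ-VAE, not the authors' code, since none has been located. The function name `quantize` is hypothetical, the codebook is random rather than learned, and the sizes (K=512 codes of dimension D=128) are assumptions borrowed from the smallest configuration and hidden size given in Technical Details.

```python
import numpy as np

def quantize(patch_embeddings: np.ndarray, codebook: np.ndarray):
    """Map each embedding to its nearest codebook entry (squared Euclidean distance).

    patch_embeddings: (N, D) continuous encoder outputs, one row per EEG patch
    codebook:         (K, D) code vectors (learned during pretraining in a real model)
    Returns (indices, quantized): indices has shape (N,), quantized has shape (N, D).
    """
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, evaluated for every (patch, code) pair
    distances = (
        np.sum(patch_embeddings ** 2, axis=1, keepdims=True)  # (N, 1)
        - 2.0 * patch_embeddings @ codebook.T                 # (N, K)
        + np.sum(codebook ** 2, axis=1)                       # (K,)
    )
    indices = np.argmin(distances, axis=1)  # discrete token id per patch
    return indices, codebook[indices]

# Hypothetical sizes: K=512 codes of dimension D=128, random stand-in codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 128))
patches = rng.normal(size=(10, 128))
idx, q = quantize(patches, codebook)
print(idx.shape, q.shape)  # (10,) (10, 128)
```

During training, VQ models commonly pass gradients through the non-differentiable `argmin` with a straight-through estimator; whether EEGFormer uses exactly that scheme is not stated here.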

EEGFormer fits alongside contemporaneous EEG foundation models such as BrainBERT and BIOT, but its emphasis on a discrete, vector-quantized token space sets it apart and connects EEG modeling to the discrete-tokenization strategies that have driven progress in language and vision.

Key Features

Vector-quantized discrete tokens: A learned codebook maps each EEG patch to its nearest discrete code, yielding a compact, interpretable vocabulary of recurring neural patterns rather than opaque continuous embeddings.
Self-supervised pretraining at scale: The model is pretrained on roughly 1.7 TB of unlabeled EEG from the Temple University Hospital (TUH) EEG Corpus, so representation learning does not depend on scarce expert annotations.
Transferable across clinical tasks: A single pretrained encoder fine-tunes to abnormality detection, artifact classification, slowing detection, and seizure detection without redesigning the architecture per task.
Cross-dataset generalization: Beyond the Temple corpus, EEGFormer transfers to an external neonatal seizure EEG dataset, demonstrating robustness to a very different patient population and recording setup.
Interpretable pattern analysis: The discrete code assignments can be examined to link particular tokens to clinical events, supporting explainability that continuous models typically lack.
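The document does not spell out how code assignments are linked to clinical events. One simple, assumed approach is to compare how often each code occurs inside annotated events versus outside them; codes with a high ratio become candidates for closer inspection. The function `code_enrichment`, its smoothing parameter, and the toy data below are hypothetical illustrations, not the paper's analysis.

```python
import numpy as np

def code_enrichment(codes, event_mask, num_codes, alpha=1.0):
    """Relative frequency of each code inside annotated events vs. outside them.

    codes:      (N,) non-negative integer code index per patch
    event_mask: (N,) boolean, True where the patch overlaps an annotated event
    alpha:      additive (Laplace) smoothing so codes unseen on one side do not divide by zero
    Returns a (num_codes,) array; values > 1 mean the code is over-represented during events.
    """
    inside = np.bincount(codes[event_mask], minlength=num_codes) + alpha
    outside = np.bincount(codes[~event_mask], minlength=num_codes) + alpha
    return (inside / inside.sum()) / (outside / outside.sum())

# Toy example: code 3 dominates the three patches labeled as a seizure.
codes = np.array([3, 3, 3, 1, 0, 1, 0, 2, 3, 1])
event_mask = np.array([True, True, True, False, False, False, False, False, False, False])
ratios = code_enrichment(codes, event_mask, num_codes=4)
print(int(np.argmax(ratios)))  # 3
```

A ratio-based ranking is easy to audit, but with a large codebook (K=2048) many codes are rare, so the smoothing term and a minimum-count threshold matter in practice.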

Technical Details

EEGFormer comprises a Transformer encoder (roughly 6 to 12 layers, hidden dimension 128) paired with a shallow 3-layer Transformer decoder, with a vector-quantization bottleneck between them. Recordings are resampled to 250 Hz, segmented into fixed 12-second windows, and split into patches; each patch is quantized to the nearest entry in a learned codebook.

The authors describe three model sizes that differ primarily in codebook capacity: EEGFormer_s (K=512), EEGFormer_b (K=1024), and EEGFormer_l (K=2048). Pretraining reconstructs the discrete codes in a self-supervised objective over the ~1.7 TB Temple University Hospital (TUH) EEG Corpus.

Downstream evaluation spans TUAB (normal vs. abnormal), TUAR (artifact classification), TUSL (slowing events), TUSZ (seizure detection), and an external neonatal seizure dataset. The largest model reports strong discrimination, including roughly 0.876 AUROC on TUAB abnormality detection, 0.883 AUROC on TUSZ seizure detection, and 0.833 AUROC on the neonatal seizure benchmark.
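The windowing arithmetic is easy to make concrete: a 12-second window at 250 Hz holds 12 × 250 = 3000 samples per channel. The sketch below shows one way to produce windows and patches. Several details are assumptions not given in this document: the 1-second patch length (`PATCH_SEC`), the 19-channel 256 Hz example recording, and the use of plain linear interpolation for resampling, which skips anti-aliasing and is only a stand-in for a proper polyphase resampler.

```python
import numpy as np

TARGET_FS = 250   # Hz, from the description above
WINDOW_SEC = 12   # seconds per window, from the description above
PATCH_SEC = 1.0   # ASSUMPTION: patch length is not stated in this document

def resample_linear(x: np.ndarray, fs_in: float, fs_out: float = TARGET_FS) -> np.ndarray:
    """Resample (channels, samples) by linear interpolation.

    Simplified stand-in: no anti-aliasing filter is applied, so real pipelines
    should low-pass filter (or use a polyphase resampler) before downsampling.
    """
    n_in = x.shape[-1]
    n_out = int(round(n_in / fs_in * fs_out))
    t_in = np.arange(n_in) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.stack([np.interp(t_out, t_in, ch) for ch in x])

def windows_and_patches(x: np.ndarray, fs: int = TARGET_FS,
                        window_sec: float = WINDOW_SEC,
                        patch_sec: float = PATCH_SEC) -> np.ndarray:
    """Cut (channels, samples) into non-overlapping windows, each split into patches.

    Returns shape (n_windows, channels, n_patches, patch_len).
    Trailing samples that do not fill a complete window are dropped.
    """
    win_len = int(window_sec * fs)    # 12 s * 250 Hz = 3000 samples
    patch_len = int(patch_sec * fs)   # 250 samples under the 1 s assumption
    if win_len % patch_len != 0:
        raise ValueError("window length must be a multiple of patch length")
    n_ch, n_samp = x.shape
    n_win = n_samp // win_len
    trimmed = x[:, : n_win * win_len]
    # (channels, n_win, win_len) -> (n_win, channels, win_len)
    wins = trimmed.reshape(n_ch, n_win, win_len).transpose(1, 0, 2)
    return wins.reshape(n_win, n_ch, win_len // patch_len, patch_len)

# Hypothetical recording: 19 channels, 30 s sampled at 256 Hz.
raw = np.random.default_rng(0).normal(size=(19, 30 * 256))
x250 = resample_linear(raw, fs_in=256)
patches = windows_and_patches(x250)
print(x250.shape, patches.shape)  # (19, 7500) (2, 19, 12, 250)
```

Each of the resulting patches would then be embedded and passed to the quantizer; the 6 trailing seconds of this example recording are dropped because they do not fill a full window.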

Applications

EEGFormer targets clinical and research EEG workflows where labeled data is limited but unlabeled recordings are abundant. Practical use cases include automated screening for abnormal EEGs, detection of epileptic seizures in adults and neonates, rejection of artifacts during signal quality control, and identification of pathological slowing. Because the pretrained encoder transfers with modest fine-tuning, hospitals and labs can adapt it to local datasets and new tasks without training a model from scratch, while the interpretable token representation helps clinicians and researchers reason about what the model is detecting.
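Adapting a frozen encoder to a local task often starts with a lightweight linear head, and the results reported in Technical Details are expressed as AUROC. The sketch below shows both steps under explicit assumptions: synthetic Gaussian features stand in for EEGFormer embeddings (no public weights have been located), the helper names are hypothetical, and this is a generic linear probe rather than the fine-tuning protocol used in the paper.

```python
import numpy as np

def train_linear_head(features, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression head by full-batch gradient descent.

    features: (N, D) frozen embeddings; labels: (N,) in {0, 1}.
    """
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = probs - labels  # per-sample derivative of log-loss w.r.t. the logit
        w -= lr * (features.T @ err / n + l2 * w)  # L2-regularized weight update
        b -= lr * err.mean()
    return w, b

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic stand-ins for frozen encoder embeddings.
rng = np.random.default_rng(0)
D = 16

def make_split(n):
    labels = rng.integers(0, 2, size=n)
    features = rng.normal(size=(n, D)) + labels[:, None] * 1.0  # class 1 shifted by +1 per dim
    return features, labels

X_train, y_train = make_split(400)
X_test, y_test = make_split(200)
w, b = train_linear_head(X_train, y_train)
test_auroc = auroc(X_test @ w + b, y_test)
print(f"held-out AUROC: {test_auroc:.3f}")
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a sketch; for large evaluation sets a rank-based formula gives the same value more cheaply.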

Impact

EEGFormer contributes to the emerging class of EEG foundation models by showing that discrete, vector-quantized representations can be both transferable across heterogeneous clinical tasks and interpretable, including transfer to an out-of-distribution neonatal population. Its framing within the AAAI 2024 Spring Symposium on Clinical Foundation Models reflects growing interest in reusable biosignal models. A notable limitation for adoption is that, as of this writing, no public code or pretrained weights have been located, so independent reproduction and direct use of pretrained checkpoints are not currently possible; the work is primarily influential as a methodological reference for discrete-token EEG modeling.

Citation

EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model

Preprint

Chen, Y., et al. (2024). EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model. arXiv preprint arXiv:2401.10278.

DOI: 10.48550/arXiv.2401.10278

Recent citations

Papers that recently cited this model.

DiffEEG: A Self-Supervised Denoising Diffusion Model for Learning EEG Generic Representations
A. Helwan, Lina Abou-Abbas, Hussein El Amouri, et al.
Jul 2026
0 · Influential
BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language
Haitao Wu, Qirui Zhang, Zhouheng Yao, et al.
Jun 2026
0
NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction
Wenhao Gao, Yifan Wang, Yijia Ma, et al.
Jun 2026
0

Top citations

The most-cited papers that cite this model.

Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Elizabeth Fons, Rachneet Kaur, Soham Palande, et al.
Conference on Empirical Methods in Natural Language Processing · Apr 2024
37
Brain Foundation Models: A survey on advancements in neural signal processing and brain discovery
Xin-qiu Zhou, Chenyu Liu, Zhisheng Chen, et al.
IEEE Signal Processing Magazine · Mar 2025
34
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers
Moritz Scherer, Luka Macan, V. J. Jung, et al.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems · Aug 2024
31
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
Zhan Guo, Jiamin Wu, Yonghao Song, et al.
Computer Vision and Pattern Recognition · Nov 2024
22
The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
D. Jayalath, Gilad Landau, Brendan Shillingford, et al.
International Conference on Machine Learning · Jun 2024
22

Citations

Total citations: 64

Influential citations: 9

References: 30

Fields of citing research

Share of papers citing this model.

Computer Science: 100%
Medicine: 56%
Engineering: 43%
Biology: 13%
Physics: 3%
Psychology: 2%
Linguistics: 2%
Environmental Science: 2%

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Openness score: 22 (Closed)

Usability (can I run it?): 15

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper