University of Technology Sydney
EEG-to-language multi-task foundation model that pairs a Q-Conformer encoder with frozen LLMs to decode coherent sentences from non-invasive brain signals.
BELT-2 (Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding) is a multi-task foundation model that translates non-invasive electroencephalography (EEG) recordings into natural language. Reading out language directly from scalp EEG is far harder than from invasive electrocorticography because surface signals are noisy, low in spatial resolution, and only weakly coupled to the words a person reads or imagines. BELT-2 tackles this by reframing brain decoding as a representation-alignment problem: it learns to map EEG features into the same embedding space as the subword tokens of a large language model, then lets a frozen LLM generate fluent text from those aligned representations.
The model was introduced in August 2024 by Jinzhao Zhou, Yiqun Duan, Thomas Do, Yu-Kai Wang, Chin-Teng Lin, and colleagues at the University of Technology Sydney. The authors present it as the first system to decode coherent, readable sentences from non-invasive brain signals. It reaches a BLEU-1 score of 52.2% on ZuCo, a word-reading EEG benchmark, which is a substantial improvement over prior EEG-to-text systems.
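For readers unfamiliar with the metric, BLEU-1 measures unigram overlap between a decoded sentence and its reference. The sketch below is a simplified, sentence-level illustration and not the evaluation code used for BELT-2: it assumes lowercase whitespace tokenization and a single reference, and the function name bleu1 is hypothetical. Published scores are typically computed at corpus level with a standard toolkit, so numbers from this sketch are not directly comparable to the reported 52.2%.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level unigram BLEU: clipped precision times brevity penalty.

    Assumption: lowercase whitespace tokenization, one reference sentence.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clip each candidate word's count by how often it occurs in the reference,
    # so repeating a correct word does not inflate the score.
    matches = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = matches / len(cand)

    # The brevity penalty discounts candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(ref) / len(cand))

    return bp * precision
```

In this sketch, bleu1("the the the", "the cat sat") evaluates to 1/3, because only one of the three repeated words is credited after clipping.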
BELT-2 sits at the intersection of biosignal modeling and language modeling. Rather than training a bespoke sequence-to-sequence network end to end, it bootstraps from the linguistic priors already captured by pretrained LLMs and concentrates its learning budget on bridging the gap between brain activity and language.
BELT-2 is built around the Q-Conformer encoder, which stacks Conformer blocks (interleaving multi-head self-attention with convolutional modules) and uses learnable query embeddings to distill EEG sequences into a compact set of vectors. Training proceeds in stages. First, a contrastive objective aligns EEG features with the BPE token embeddings of the target text. Then prefix-tuning maps the encoder output into the input space of a pretrained language model whose weights stay frozen throughout. Multi-task supervision spans translation, sentiment classification, and conditioned generation. Evaluated on the ZuCo eye-tracking-and-EEG reading corpus, BELT-2 attains a BLEU-1 of 52.2% and reports translation gains of roughly 31% to 162% over previous EEG-to-text methods, depending on the metric.
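The contrastive alignment stage can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a symmetric InfoNCE loss over cosine similarities, a temperature of 0.07, and one pooled vector per EEG sample and per target text, none of which is specified in this description. The function names l2_normalize and info_nce are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(eeg_vecs, text_vecs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (EEG, text) embeddings.

    Assumption: eeg_vecs[i] and text_vecs[i] describe the same sentence,
    and every other pairing in the batch is treated as a negative.
    """
    eeg = [l2_normalize(v) for v in eeg_vecs]
    txt = [l2_normalize(v) for v in text_vecs]
    n = len(eeg)

    # Temperature-scaled cosine similarities; row i belongs to EEG sample i.
    logits = [
        [sum(a * b for a, b in zip(e, t)) / temperature for t in txt]
        for e in eeg
    ]

    def diagonal_cross_entropy(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    # Transpose to score the text-to-EEG direction as well.
    columns = [list(c) for c in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(columns))
```

With matched pairs the loss approaches zero, while mismatched pairs are penalized heavily. That pressure is what pulls each EEG representation toward the embedding of the text it corresponds to.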
BELT-2 targets brain-computer interface (BCI) research and assistive communication, where decoding intended language from non-invasive recordings could one day help people who cannot speak or type. Because it relies on scalp EEG rather than implanted electrodes, it is relevant to lower-risk, more scalable BCI settings than invasive speech neuroprostheses. Its multi-task design also makes it a useful research platform for studying how linguistic structure is represented in EEG and for benchmarking EEG-language alignment methods.
BELT-2 advanced the EEG-to-language field by showing that aligning brain signals to subword tokens and offloading generation to frozen LLMs can yield far more fluent output than end-to-end decoders, establishing a new performance bar on ZuCo. Its bootstrapping recipe has influenced subsequent work on EEG-language representation alignment and multi-task neural decoding. A practical limitation is reproducibility and openness: at the time of writing, the code was released only through an anonymous review link, and no de-anonymized public repository or pretrained weights could be located. As with most EEG reading-decoding results, performance is also tied to a specific benchmark and reading paradigm, so generalization to imagined speech or new subjects remains an open question.
Zhou, J., et al. (2024) BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding. arXiv.org.
DOI: 10.48550/arXiv.2409.00121