Charité – Universitätsmedizin Berlin
Multimodal contrastive model aligning clinical EEG recordings with free-text reports, enabling label-efficient and zero-shot EEG phenotyping via text prompts.
ELM (EEG-Language Model) is a multimodal framework that learns joint representations of electroencephalography (EEG) recordings and their accompanying free-text clinical reports. Developed by Sam Gijsen and Kerstin Ritter at Charité – Universitätsmedizin Berlin and presented at ICML 2025, ELM addresses a persistent bottleneck in clinical neurophysiology: labeled EEG data is scarce and expensive to annotate, yet hospitals accumulate vast archives of recordings paired with the reports neurologists write during routine reading.
Rather than treating those reports as disposable, ELM uses them as a rich, naturally occurring supervisory signal. Borrowing the contrastive vision-language paradigm popularized by CLIP, ELM aligns EEG signals and clinical text in a shared embedding space, so that a recording and its matching report are pulled together while mismatched pairs are pushed apart. This is, to the authors' knowledge, the first work to enable zero-shot EEG classification through natural-language prompts and bidirectional retrieval between neural signals and reports.
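The alignment objective described above can be illustrated with a minimal NumPy sketch of a symmetric, CLIP-style contrastive loss. This is not the authors' training code: the function names, the temperature of 0.07, and the random placeholder embeddings are assumptions made for illustration only.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(eeg_emb, text_emb, temperature=0.07):
    """Row i of eeg_emb and row i of text_emb form the matching pair.

    temperature=0.07 is an assumed value, not one taken from the ELM paper.
    """
    logits = l2_normalize(eeg_emb) @ l2_normalize(text_emb).T / temperature
    idx = np.arange(logits.shape[0])
    # EEG-to-text: each recording should pick out its own report (softmax over columns).
    loss_e2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-EEG: each report should pick out its own recording (softmax over rows).
    loss_t2e = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_e2t + loss_t2e)

# Placeholder embeddings standing in for encoder outputs (batch of 8, dimension 32).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 32))
eeg_aligned = text_emb + 0.01 * rng.normal(size=(8, 32))  # nearly matching pairs
eeg_random = rng.normal(size=(8, 32))                     # unrelated pairs

aligned_loss = symmetric_contrastive_loss(eeg_aligned, text_emb)
random_loss = symmetric_contrastive_loss(eeg_random, text_emb)
print(f"aligned: {aligned_loss:.4f}  random: {random_loss:.4f}")
```

The loss is low when each recording is closest to its own report and high when the pairing carries no information, which is the pressure that pulls matching pairs together and pushes mismatched pairs apart.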
The result is a model that is highly label-efficient: it transfers to downstream clinical phenotyping tasks with far fewer labeled examples than EEG-only baselines, and it can classify recordings for conditions it was never explicitly trained to label, simply by comparing them against textual descriptions.
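Zero-shot classification in a shared embedding space reduces to a nearest-prompt lookup. The sketch below assumes the recording and the candidate prompts have already been embedded; the three-dimensional vectors, the prompt wording, and the function name are hypothetical placeholders, not outputs of the released ELM encoders.

```python
import numpy as np

def zero_shot_classify(eeg_emb, prompt_embs, labels, temperature=0.07):
    """Pick the label whose prompt embedding is most similar to the EEG embedding.

    temperature only sharpens the reported probabilities; 0.07 is an assumed value.
    """
    eeg = eeg_emb / np.linalg.norm(eeg_emb)
    prompts = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = prompts @ eeg                      # cosine similarity per prompt
    logits = sims / temperature
    probs = np.exp(logits - logits.max())     # stable softmax over prompts
    probs = probs / probs.sum()
    return labels[int(np.argmax(sims))], dict(zip(labels, probs))

# Hypothetical prompts and made-up embeddings, for illustration only.
labels = ["normal", "abnormal"]
prompt_embs = np.array([
    [1.0, 0.0, 0.0],   # stands in for the embedding of "The EEG is normal."
    [0.0, 1.0, 0.0],   # stands in for the embedding of "The EEG is abnormal."
])
eeg_emb = np.array([0.2, 0.9, 0.1])           # stands in for an encoded recording

predicted, probabilities = zero_shot_classify(eeg_emb, prompt_embs, labels)
print(predicted, probabilities)
```

No labeled training examples enter this step: changing the set of prompts changes the set of classes.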
ELM pairs a convolutional EEG encoder (an EEG_ResNet) with a clinical BERT text encoder, trained jointly with a contrastive loss. Pretraining uses roughly 15,000 EEG recordings paired with clinical reports from the Temple University Hospital (TUH) EEG Corpus. Signals are processed as 20-channel longitudinal bipolar (TCP) montages, bandpass-filtered to 0.1–49 Hz and resampled to 100 Hz. To cope with the misalignment between long recordings and multi-sentence reports, ELM combines time-series cropping, text segmentation, and attention-based multiple instance learning, so that clinically informative segments are emphasized without segment-level annotation. The authors release two pretrained encoder checkpoints, one operating on 5-second epochs and one on 60-second epochs, as PyTorch .pt files. Evaluated across four clinical phenotyping tasks, ELM substantially outperforms EEG-only baselines, with the largest gains in the low-label regime that characterizes real clinical practice.
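To make the cropping and pooling steps concrete, the sketch below splits a 20-channel, 100 Hz recording into 5-second epochs (500 samples each) and combines per-epoch embeddings with one common form of attention-based multiple instance learning, in which a tanh scoring layer is followed by a softmax over epochs. The source does not specify ELM's exact attention formulation, so this form, the per-channel standard deviation used as a stand-in encoder, and the random parameters `V` and `w` are all assumptions.

```python
import numpy as np

def crop_epochs(recording, sfreq=100, epoch_seconds=5):
    """Split a (channels, samples) array into non-overlapping epochs.

    Returns an array of shape (n_epochs, channels, samples_per_epoch);
    trailing samples that do not fill a whole epoch are dropped.
    """
    n_channels, n_samples = recording.shape
    epoch_len = int(sfreq * epoch_seconds)
    n_epochs = n_samples // epoch_len
    trimmed = recording[:, : n_epochs * epoch_len]
    return trimmed.reshape(n_channels, n_epochs, epoch_len).transpose(1, 0, 2)

def attention_mil_pool(instance_embs, V, w):
    """Attention pooling over a bag of instance embeddings.

    Scores are w . tanh(V h_k); a softmax turns them into weights that sum to 1,
    and the bag embedding is the weighted sum of the instances.
    """
    scores = np.tanh(instance_embs @ V.T) @ w
    scores = scores - scores.max()            # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instance_embs, weights

rng = np.random.default_rng(0)
recording = rng.normal(size=(20, 100 * 62))   # 62 s of synthetic 20-channel data
epochs = crop_epochs(recording)               # twelve 5 s epochs; the last 2 s are dropped

# Stand-in for the EEG encoder: per-channel standard deviation of each epoch.
instance_embs = epochs.std(axis=2)            # shape (n_epochs, 20)

# Randomly initialized attention parameters (learned during training in practice).
V = rng.normal(size=(16, 20))
w = rng.normal(size=16)
bag_emb, weights = attention_mil_pool(instance_embs, V, w)
print(epochs.shape, bag_emb.shape, weights.round(3))
```

Because the weights are learned from recording-level supervision, epochs that matter for the report can receive more weight without anyone annotating individual segments.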
ELM is aimed at clinical neurophysiology workflows where annotated EEG is limited but report-paired recordings are abundant. Its label-efficient transfer suits building classifiers for abnormality detection, pathology screening, and related phenotyping tasks from small labeled sets, while zero-shot prompting lets clinicians and researchers probe recordings for conditions described in plain language. Cross-modal retrieval can power EEG archive search, surface similar prior cases, and assist report drafting or quality control. Researchers can also use the released encoders as feature extractors for downstream EEG modeling without retraining from scratch.
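Cross-modal retrieval over an archive can be sketched as a cosine-similarity ranking in the shared space. The example below assumes report and recording embeddings are already available; the random vectors and the `retrieve_top_k` helper are hypothetical and only illustrate the ranking step.

```python
import numpy as np

def retrieve_top_k(query_emb, candidate_embs, k=3):
    """Return indices and cosine similarities of the k closest candidates."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(-sims)[:k]             # highest similarity first
    return order, sims[order]

rng = np.random.default_rng(0)
report_embs = rng.normal(size=(5, 16))        # placeholder archive of 5 report embeddings
# Placeholder EEG query that lies close to report 2 in the shared space.
eeg_query = report_embs[2] + 0.05 * rng.normal(size=16)

top_idx, top_sims = retrieve_top_k(eeg_query, report_embs, k=3)
print(top_idx, top_sims.round(3))
```

Swapping the roles of the two modalities (a report embedding as the query, recording embeddings as candidates) gives retrieval in the other direction with the same function.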
ELM extends the contrastive multimodal pretraining recipe that reshaped medical imaging into the EEG domain, demonstrating that the clinical reports already produced during routine reading are a powerful and underused source of supervision. By enabling what the authors describe as the first zero-shot EEG classification and EEG–report retrieval, it points toward foundation-model approaches that reduce dependence on costly expert labeling in neurophysiology. As a recent, single-institution ICML 2025 contribution, it has yet to demonstrate clinical generalization beyond the TUH corpus and across diverse acquisition setups, but the released code and pretrained encoders give the community a concrete starting point.
Gijsen, S. & Ritter, K. (2025). EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping. International Conference on Machine Learning (ICML).
DOI: 10.48550/arXiv.2409.07480