bio.rodeo

MMM

Microsoft / South China University of Technology

A masked-autoencoder EEG pretraining framework that maps any electrode layout to a unified topology for topology-agnostic, cross-dataset representations.

Released: December 2023

Scalp electroencephalography (EEG) is a rich, abundant, and largely unlabeled signal, which makes it a natural candidate for the kind of large-scale self-supervised pretraining that has transformed vision and language. A persistent obstacle, however, is heterogeneity of acquisition: different EEG datasets use different numbers of electrodes, placed at different positions according to different montages. Models trained on one channel configuration typically cannot ingest data recorded with another, which fragments the available data and prevents the assembly of a single large pretraining corpus.

MMM (named for its Multi-dimensional position encoding, Multi-level channel hierarchy, and Multi-stage pretraining) addresses this problem by mapping every channel selection onto a single unified electrode topology, so that recordings from incompatible montages can be pretrained together. The result is a topology-agnostic representation that transfers across datasets regardless of the original electrode layout. MMM was introduced by Ke Yi, Yansen Wang, Kan Ren, and Dongsheng Li at Microsoft Research Asia (the first author worked on it as an intern affiliated with South China University of Technology) and presented at NeurIPS 2023.

The framework is built as a masked autoencoder, learning to reconstruct deliberately hidden portions of the EEG signal and thereby acquiring representations that capture the spatial and structural regularities of brain activity. By unifying topology rather than restricting itself to a fixed sensor set, MMM offers a route toward genuinely reusable EEG foundation models.
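The masked-reconstruction objective described above can be illustrated with a minimal, dependency-free sketch. It assumes the loss is a mean squared error taken over hidden tokens only, as is common for masked autoencoders; whether MMM uses exactly this form is an assumption, and the function name `masked_reconstruction_loss` is hypothetical rather than part of the released code.

```python
def masked_reconstruction_loss(target, prediction, mask):
    """Mean squared error computed over masked tokens only.

    target, prediction: lists of tokens, each token a list of floats
    mask: list of bools, True where the token was hidden from the encoder
    """
    squared_errors = [
        (p - t) ** 2
        for tok_t, tok_p, hidden in zip(target, prediction, mask)
        if hidden
        for t, p in zip(tok_t, tok_p)
    ]
    if not squared_errors:
        raise ValueError("mask must hide at least one token")
    return sum(squared_errors) / len(squared_errors)


# Toy example: four tokens with two features each; tokens 1 and 3 are hidden.
target = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
prediction = [[11.0, 12.0], [4.0, 5.0], [5.0, 6.0], [7.0, 8.0]]
mask = [False, True, False, True]

loss = masked_reconstruction_loss(target, prediction, mask)
print(loss)  # 0.5: the large error on visible token 0 is ignored
```

Restricting the loss to hidden tokens means the model is rewarded only for inferring what it could not see, which is what pushes it to learn structure shared across channels.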

#Key Features

  • Unified topology: All channel selections, regardless of montage or electrode count, are projected onto one common electrode topology, allowing datasets with different configurations to be pretrained jointly.
  • Multi-dimensional position encoding: Geometric and spatial information about electrode locations is injected directly into channel tokens, giving the model an explicit sense of where each signal originates on the scalp.
  • Multi-level channel hierarchy: Aggregated regional tokens are modeled alongside individual channel tokens, enabling the network to reason about local channels and broader brain regions simultaneously.
  • Multi-stage pretraining: Training combines global random masking with regional masking across stages, which together encourage robust reconstruction even at high mask ratios where standard masked autoencoders degrade.
  • Cross-dataset transfer: Because representations are topology-agnostic, a model pretrained on one corpus can be fine-tuned on downstream datasets with entirely different electrode setups.
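The unified-topology and channel-hierarchy ideas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the ten-electrode `UNIFIED_CHANNELS` list, the `REGIONS` grouping, and the helper names `to_unified` and `region_tokens` are all hypothetical, and simple averaging stands in for whatever learned aggregation the model uses for regional tokens.

```python
# Hypothetical unified topology: a small subset of 10-20 system electrode names.
UNIFIED_CHANNELS = ["FP1", "FP2", "F3", "F4", "C3", "C4", "P3", "P4", "O1", "O2"]

# Hypothetical region grouping, for illustration only.
REGIONS = {
    "frontal": ["FP1", "FP2", "F3", "F4"],
    "central": ["C3", "C4"],
    "posterior": ["P3", "P4", "O1", "O2"],
}


def to_unified(features, channel_names, fill=0.0):
    """Place one feature value per recorded channel into the unified order.

    Returns (values, present). values follows UNIFIED_CHANNELS order, and
    present[i] is False where the recording lacks that electrode. Recorded
    channels that are not in UNIFIED_CHANNELS are dropped in this sketch.
    """
    recorded = {name.upper(): value for name, value in zip(channel_names, features)}
    values = [recorded.get(name, fill) for name in UNIFIED_CHANNELS]
    present = [name in recorded for name in UNIFIED_CHANNELS]
    return values, present


def region_tokens(values, present):
    """Average the available channels in each region into one regional value."""
    index = {name: i for i, name in enumerate(UNIFIED_CHANNELS)}
    tokens = {}
    for region, members in REGIONS.items():
        available = [values[index[m]] for m in members if present[index[m]]]
        tokens[region] = sum(available) / len(available) if available else None
    return tokens


# Two recordings with different montages land in the same fixed-length layout.
vals_a, present_a = to_unified([0.1, 0.2, 0.3], ["Fp1", "C3", "O1"])
vals_b, present_b = to_unified([0.5, 0.7, 0.9, 1.1], ["F3", "F4", "P4", "O2"])
regions_b = region_tokens(vals_b, present_b)
```

Both recordings end up as length-10 vectors with a presence flag per electrode, so a single model can consume either one. The presence flags play the same role as a mask: absent electrodes are simply positions the model never observes.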

#Technical Details

MMM is a masked autoencoder whose transformer encoder and decoder are joined by a bottleneck representation. Input EEG is represented as differential entropy (DE) features per channel. A subset of channel-time tokens is masked, the encoder produces a unified representation from the visible tokens, and a lightweight decoder reconstructs the masked entries. The multi-stage schedule applies global random masking and regional masking in sequence so that the encoder learns both fine-grained and region-level structure, which sustains reconstruction quality at aggressive masking ratios.

The released base encoder (tuh_pretrained_encoder_base.pt) is pretrained on the Temple University Hospital (TUH) EEG corpus and distributed through the project page. On the SEED and SEED-IV emotion-recognition benchmarks, the authors report improvements over prior state-of-the-art EEG representation methods. Reference code is provided in Microsoft's PhysioPro framework under an MIT license. The authors note that they are still investigating the use of DE features for SEED and working toward training directly on raw EEG signals.
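To make the input features and the two masking strategies concrete, here is a small standard-library sketch. It assumes DE is computed under a Gaussian model of an already band-filtered signal, in which case it reduces to 0.5 · ln(2πe·σ²). The function names, the 0.75 mask ratio, and the assignment of eight channels to four regions are hypothetical choices for illustration and are not taken from the PhysioPro code.

```python
import math
import random
import statistics


def differential_entropy(samples):
    """DE of a signal under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    variance = statistics.pvariance(samples)
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def global_random_mask(n_channels, ratio, rng):
    """Hide a random fraction of channels, ignoring where they sit on the scalp."""
    n_hidden = round(n_channels * ratio)
    hidden = set(rng.sample(range(n_channels), n_hidden))
    return [i in hidden for i in range(n_channels)]


def regional_mask(region_of_channel, n_regions_hidden, rng):
    """Hide every channel that belongs to a randomly chosen set of regions."""
    regions = sorted(set(region_of_channel))
    hidden = set(rng.sample(regions, n_regions_hidden))
    return [region in hidden for region in region_of_channel]


rng = random.Random(0)

# One DE value per channel, computed from toy unit-variance Gaussian signals.
signals = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(8)]
de_features = [differential_entropy(s) for s in signals]

# Hypothetical assignment of the 8 channels to 4 scalp regions.
region_of_channel = [0, 0, 1, 1, 2, 2, 3, 3]

random_mask = global_random_mask(len(signals), ratio=0.75, rng=rng)
region_mask = regional_mask(region_of_channel, n_regions_hidden=2, rng=rng)
```

The contrast between the two masks is the point of the sketch: global random masking scatters hidden channels across the scalp, so neighbors usually remain visible, while regional masking removes whole neighborhoods and forces reconstruction from more distant channels.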

#Applications

MMM targets researchers and engineers building EEG decoding systems who must combine or transfer across datasets with mismatched electrode configurations. Its most directly demonstrated application is affective computing, specifically emotion recognition on the SEED and SEED-IV datasets. The topology-agnostic design is not specific to emotion recognition and could in principle carry over to other downstream EEG tasks such as brain-computer interfaces, clinical monitoring, and neuroscience analysis, although the reported evaluation centers on emotion recognition. By providing a pretrained base encoder, MMM lowers the labeled-data burden for groups that cannot collect large annotated EEG corpora of their own.

#Impact

MMM was one of the early demonstrations that EEG pretraining can be made montage-independent, directly tackling the channel-heterogeneity problem that had kept EEG datasets from being pooled. By framing diverse electrode layouts as projections onto a shared topology, it set out an approach that later topology-agnostic EEG foundation models also pursue in the interest of cross-dataset generality. Its open availability through the PhysioPro framework, together with a downloadable TUH-pretrained checkpoint, makes it a practical starting point for transfer learning. Limitations include reliance on differential-entropy features in the reported experiments and evaluation centered on emotion-recognition benchmarks, leaving broader clinical validation and raw-signal pretraining as acknowledged future work.

Citation

Yi, K., Wang, Y., Ren, K., & Li, D. (2023). Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling. Neural Information Processing Systems (NeurIPS 2023).

DOI: 10.52202/075280-2344

Citations

Total Citations: 82
Influential: 2
References: 29

GitHub

Stars: 109
Forks: 15
Open Issues: 5
Contributors: 5
Last Push: 1d ago
Language: Python
License: MIT

Openness

bio.rodeo openness (Fully open · usable and reproducible)
Overall: 60 (Partial)
Usability (can I run it?): 71
Reproducibility (can I retrain it?): 57
Model Openness Framework: Unclassified (Restrictive license on core components)

Tags

brain_computer_interface, eeg, emotion_recognition, foundation_model, masked_autoencoder, representation_learning, self_supervised, transformer

Resources

GitHub Repository · Research Paper · Official Website · Documentation