bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Biosignals

ECG-FM

University of Toronto / Vector Institute

An open transformer foundation model for 12-lead electrocardiograms, pretrained on 1.5M ECGs with hybrid contrastive and generative self-supervision.

Released: August 2024
Parameters: 90.9 million

ECG-FM is an open foundation model for the electrocardiogram (ECG), the most widely recorded cardiac biosignal in clinical medicine. Most ECG machine learning systems are trained from scratch on a single labeled dataset, which limits their performance when labels are scarce and hurts their ability to generalize across institutions and recording devices. ECG-FM addresses this by pretraining a large transformer on 1.5 million ECGs without labels, producing reusable representations that can be fine-tuned for many downstream clinical tasks with comparatively little task-specific data.

The model was developed by Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, and Bo Wang at the University of Toronto and the Vector Institute, with the preprint released in August 2024. It is built on the team's open fairseq_signals framework and adapts the wav2vec 2.0 self-supervised architecture, originally designed for speech, to multi-lead ECG waveforms.

ECG-FM is deliberately positioned as a fully open release: the code, the model weights, and the preprocessing pipeline are all public, in contrast to many proprietary ECG deep-learning systems. This makes it a practical starting point for cardiology researchers who want a strong pretrained backbone rather than building a waveform model from the ground up.

#Key Features

  • Open foundation model: Code, pretrained and fine-tuned checkpoints, and an end-to-end preprocessing pipeline are released under the MIT license, lowering the barrier to ECG deep learning.
  • Hybrid self-supervised objective: Pretraining combines wav2vec 2.0 masked prediction with contrastive multi-segment coding (CMSC) and random lead masking (RLM), a recipe the authors term W2V+CMSC+RLM (WCR).
  • Strong clinical performance: Reported results include AUROC 0.996 for atrial fibrillation detection and 0.929 for predicting reduced left ventricular ejection fraction (LVEF ≤ 40%).
  • Data-efficient and transferable: The pretrained representations help most in small-to-medium labeled data regimes and generalize robustly across datasets from different sources.
  • Standard 12-lead input: Operates on conventional clinical 12-lead ECG recordings, so it slots directly into existing cardiology data workflows.
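
Random lead masking, listed among the features above, can be illustrated with a minimal sketch. This is not the fairseq_signals implementation: the function name `random_lead_mask`, the masking probability `p_mask=0.5`, and the choice to zero out a masked lead are all assumptions made for illustration.

```python
import random

N_LEADS = 12  # standard clinical 12-lead ECG


def random_lead_mask(ecg, p_mask=0.5, rng=None):
    """Zero out each lead independently with probability p_mask.

    ecg    : list of 12 leads, each a list of float samples
    p_mask : hypothetical masking probability (an assumption, not a source value)
    rng    : optional random.Random instance for reproducibility

    Returns (masked_ecg, kept), where kept[i] is True if lead i was left intact.
    """
    if len(ecg) != N_LEADS:
        raise ValueError(f"expected {N_LEADS} leads, got {len(ecg)}")
    rng = rng or random.Random()

    masked, kept = [], []
    for lead in ecg:
        drop = rng.random() < p_mask
        # A dropped lead is replaced by zeros of the same length;
        # a kept lead is copied so the input is never mutated.
        masked.append([0.0] * len(lead) if drop else list(lead))
        kept.append(not drop)
    return masked, kept
```

Training on inputs masked this way pushes the encoder to produce useful representations even when some leads are absent, which is the robustness property the feature list describes.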

#Technical Details

ECG-FM uses a wav2vec 2.0 transformer encoder with roughly 90.9 million parameters and takes standard 12-lead ECG waveforms as input. Pretraining draws on about 1.5 million ECGs aggregated from MIMIC-IV-ECG v1.0 and the PhysioNet/Computing in Cardiology 2021 collection.

The self-supervised objective layers contrastive multi-segment coding and random lead masking on top of the base wav2vec 2.0 masked-feature prediction task, encouraging representations that are consistent across temporal segments of a recording and robust to missing leads.

The model was evaluated by fine-tuning and linear probing on downstream tasks including ECG interpretation labeling, atrial fibrillation detection, and reduced LVEF prediction, where it outperformed comparable supervised baselines, particularly when labeled data were limited. Released checkpoints include a pretrained backbone and a MIMIC-IV-ECG fine-tuned variant; weights are loaded by following the instructions in the project's GitHub repository rather than through the standard transformers library.
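
The idea behind contrastive multi-segment coding can be sketched as follows: consecutive non-overlapping segments of one recording are treated as a positive pair, and a contrastive loss pulls their embeddings together while pushing apart segments from other recordings. The code below is a generic InfoNCE sketch in plain Python, not the project's training code. The helper names, the cosine similarity, and the `temperature=0.1` value are assumptions for illustration.

```python
import math


def split_segments(signal, seg_len):
    """Split a 1-D signal into consecutive non-overlapping segments."""
    n = len(signal) // seg_len  # any trailing remainder is discarded
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are embeddings of two segments from the
    same recording; positives[j] with j != i act as negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

In this sketch the loss approaches zero when each anchor is most similar to its own paired segment, and grows when it is more similar to a segment from a different recording.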

#Applications

ECG-FM is intended as a reusable backbone for clinical and research ECG analysis. By fine-tuning the pretrained model, researchers can build classifiers for arrhythmia detection, diagnostic interpretation, and prediction of conditions that are not obvious from the waveform to a human reader, such as reduced ejection fraction. Because it performs well with modest labeled datasets and transfers across recording sources, it is especially useful for institutions that lack the large annotated cohorts typically needed to train ECG models from scratch, and for studying rarer cardiac conditions where labels are inherently limited.
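
As a rough illustration of how a fine-tuned classifier for reduced ejection fraction might be scored, the sketch below binarizes LVEF values at the 40% threshold quoted earlier and computes AUROC with the rank-based (Mann-Whitney) definition. The function names and the example inputs used to exercise it are hypothetical; in practice the scores would come from a fine-tuned model rather than being typed in by hand.

```python
def reduced_lvef_label(lvef_percent, threshold=40.0):
    """Return 1 if LVEF is at or below the threshold (reduced), else 0."""
    return 1 if lvef_percent <= threshold else 0


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case; ties count half.
    This O(P*N) form is meant for clarity, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every reduced-LVEF recording is ranked above every normal one, while 0.5 corresponds to chance-level ranking.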

#Impact

ECG-FM is one of the first openly released ECG foundation models with public weights, code, and preprocessing, making strong pretrained cardiac biosignal representations broadly accessible. Its demonstration that speech-style self-supervised pretraining transfers effectively to multi-lead ECG supports the broader move toward foundation models for physiological signals and provides a reproducible baseline for the cardiology machine-learning community. Practical limitations include the need to load weights outside the standard transformers ecosystem and the usual caveat that clinical deployment requires prospective validation beyond the retrospective benchmarks reported.

Citation

ECG-FM: An Open Electrocardiogram Foundation Model

Preprint

McKeen, K., et al. (2024). ECG-FM: An Open Electrocardiogram Foundation Model. arXiv preprint arXiv:2408.05178.

DOI: 10.48550/arXiv.2408.05178

Citations

Total Citations: 86
Influential: 23
References: 34

GitHub

Stars: 288
Forks: 32
Open Issues: 2
Contributors: 1
Last Push: 4 months ago
Language: Python
License: MIT

HuggingFace

Downloads: 0
Likes: 16
Last Modified: 1 year ago

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 67 (Partial)
Usability (can I run it?): 87
Reproducibility (can I retrain it?): 57
Model Openness Framework: Class III (Open Model)

Tags

cardiology, contrastive_learning, disease_classification, ecg_interpretation, foundation_model, representation_learning, self_supervised, transformer, wav2vec_2.0

Resources

GitHub Repository, Research Paper, HuggingFace Model