bio.rodeo


HeartBEiT

Icahn School of Medicine at Mount Sinai

A BEiT vision transformer pretrained on 8.5M 12-lead ECG images via masked image modeling, excelling at low-data-regime cardiac diagnosis.

Released: June 2023
Parameters: 86 Million

HeartBEiT is a domain-specific vision transformer for electrocardiogram (ECG) analysis, developed by researchers at the Icahn School of Medicine at Mount Sinai and published in npj Digital Medicine in June 2023. Rather than treating the ECG as a multichannel time series, HeartBEiT treats the standard printed 12-lead ECG as an image and applies a transformer originally designed for computer vision. This reframing lets the model exploit the same visual layout that clinicians read, while inheriting the scalable self-supervised pretraining recipe of modern vision foundation models.

The central problem HeartBEiT addresses is data efficiency. Convolutional neural networks for ECG diagnosis typically require very large labeled datasets to reach clinical-grade accuracy, and features learned by natural-image models (e.g., ImageNet-pretrained CNNs) carry over poorly to biomedical signals. By pretraining directly on millions of unlabeled ECG images from a single health system, HeartBEiT learns ECG-specific visual representations that fine-tune effectively even when labeled examples are scarce.

HeartBEiT was among the early demonstrations that BEiT-style masked image modeling could be ported from general computer vision to a clinical biosignal. It remains a notable reference point for image-based ECG interpretation, in contrast to the more common waveform (time-series) foundation models in cardiology.

#Key Features

  • ECG-as-image formulation: HeartBEiT renders each 12-lead ECG as an image and processes it with a vision transformer, aligning the model's input with the visual representation cardiologists actually inspect.
  • Masked image modeling pretraining: The model is pretrained in a self-supervised fashion with BEiT-style masked image modeling, predicting the content of masked image patches and thereby learning ECG morphology without requiring diagnostic labels.
  • Strong low-data-regime performance: HeartBEiT's defining result is markedly higher diagnostic accuracy than standard CNNs when labeled training samples are scarce, making it attractive for rare conditions and small cohorts.
  • Improved explainability: Attention-based saliency highlights biologically relevant regions of the ECG, offering more granular and clinically interpretable explanations than typical CNN attribution maps.
  • Validated across multiple cardiac tasks: It was evaluated on diagnosis of hypertrophic cardiomyopathy (HCM), low left ventricular ejection fraction (LVEF), and ST-elevation myocardial infarction (STEMI) with independent validation sets.
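To make the ECG-as-image and masked-image-modeling ideas concrete, here is a minimal sketch of how an image is split into patches and how a subset of them is hidden during pretraining. The 224-pixel input, 16-pixel patches, and 40% mask ratio are the usual BEiT-base defaults and are assumptions here, not settings confirmed for HeartBEiT. The helper names `patch_grid` and `sample_mask` are hypothetical, and the uniform random masking is a simplification of BEiT's blockwise masking.

```python
import random

IMAGE_SIZE = 224   # assumed input resolution (BEiT-base default)
PATCH_SIZE = 16    # assumed patch size (BEiT-base default)
MASK_RATIO = 0.4   # assumed fraction of patches hidden during pretraining


def patch_grid(image_size: int, patch_size: int) -> list[tuple[int, int]]:
    """Return the (row, col) pixel offset of every non-overlapping patch."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    steps = range(0, image_size, patch_size)
    return [(r, c) for r in steps for c in steps]


def sample_mask(num_patches: int, mask_ratio: float, seed: int) -> list[bool]:
    """Pick which patches are hidden; True marks a masked patch."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


patches = patch_grid(IMAGE_SIZE, PATCH_SIZE)
mask = sample_mask(len(patches), MASK_RATIO, seed=0)
print(len(patches), sum(mask))  # number of patches, number of masked patches
```

Under these assumptions the image yields a 14 x 14 grid of patches. During pretraining, the model would see only the unmasked patches and be asked to predict what the masked ones contain.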

#Technical Details

HeartBEiT is built on the BEiT-base architecture, a vision transformer with roughly 86 million parameters. It was pretrained via masked image modeling on approximately 8.5 million 12-lead ECG images drawn from about 2.1 million patients in the Mount Sinai Health System. Pretraining is fully self-supervised: following the BEiT recipe, the model predicts the discrete visual tokens of masked image patches, learning ECG-specific features before any diagnostic labels are introduced. The pretrained backbone is then fine-tuned on each downstream classification task. For diagnosis of HCM, low LVEF, and STEMI, the authors compared HeartBEiT against standard CNN architectures (such as EfficientNet and ResNet variants) at progressively smaller training sample sizes and on independent validation datasets. They report that HeartBEiT's advantage grows as labeled data becomes scarcer.
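As a rough check on the "roughly 86 million parameters" figure, the arithmetic below counts the weights of a plain ViT-Base encoder (12 layers, width 768, MLP width 3072, 16-pixel patches on a 224-pixel RGB input). These are the standard ViT-Base hyperparameters and are assumed here. BEiT adds a few small components of its own, so the exact count for HeartBEiT may differ slightly. The function name `vit_param_count` is hypothetical.

```python
def vit_param_count(
    image_size: int = 224,
    patch_size: int = 16,
    channels: int = 3,
    dim: int = 768,
    depth: int = 12,
    mlp_dim: int = 3072,
) -> int:
    """Count parameters of a plain ViT encoder without a classification head."""
    num_patches = (image_size // patch_size) ** 2
    # Linear projection of each flattened patch, plus bias.
    patch_embed = patch_size * patch_size * channels * dim + dim
    cls_token = dim
    # One learned position vector per patch, plus one for the class token.
    pos_embed = (num_patches + 1) * dim
    # Fused query/key/value projection and output projection, with biases.
    attention = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    # Two-layer feed-forward network, with biases.
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * dim)
    block = attention + mlp + layer_norms
    final_norm = 2 * dim
    return patch_embed + cls_token + pos_embed + depth * block + final_norm


total = vit_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Almost all of the parameters sit in the 12 transformer blocks (about 7.1 million each). The patch and position embeddings contribute well under one million.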

#Applications

HeartBEiT is aimed at clinical and translational cardiology settings where labeled ECG data is limited. Because it fine-tunes effectively from few examples, it is well suited to detecting conditions that are difficult or impossible to read directly from the ECG (such as low ejection fraction or hypertrophic cardiomyopathy), to building diagnostic models for rare presentations, and to institutions without millions of labeled tracings. Its image-based explanations also support clinician review and auditing of model predictions, which is valuable for deployment in decision-support workflows.
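One common way to probe behaviour when labels are scarce is to fine-tune on progressively smaller, class-balanced subsets of the labeled data and compare models at each size. The sketch below shows only the subsampling step. The fractions, the toy label counts, and the helper name `stratified_subsample` are illustrative assumptions, not the protocol or data used by the HeartBEiT authors.

```python
import random
from collections import defaultdict


def stratified_subsample(labels: list[int], fraction: float, seed: int = 0) -> list[int]:
    """Return indices of a subset that keeps roughly the original class balance."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen: list[int] = []
    for indices in by_class.values():
        # Keep at least one example per class so no class disappears entirely.
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Toy labels: 900 negatives and 100 positives (purely illustrative, not study data).
labels = [0] * 900 + [1] * 100
subset_sizes = {f: len(stratified_subsample(labels, f)) for f in (1.0, 0.5, 0.1, 0.01)}
print(subset_sizes)
```

Each subset would then be used to fine-tune both the pretrained transformer and a CNN baseline, with all models evaluated on the same held-out set so that the curves are comparable.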

#Impact

HeartBEiT helped establish image-based, self-supervised transformers as a viable direction for ECG analysis. It demonstrated that domain-specific pretraining can outperform both ImageNet transfer and conventional CNNs, especially in very low-data regimes, while improving interpretability. A practical limitation for external adoption is access to the weights. The fine-tuning and checkpoint-loading code is openly available on GitHub, but the pretrained model weights are distributed only through a Mount Sinai data-sharing agreement rather than as a freely downloadable artifact. This constrains fully open reuse and reproducibility despite the public codebase.

Citation

A foundational vision transformer improves diagnostic performance for electrocardiograms

Vaid, A., et al. (2023) A foundational vision transformer improves diagnostic performance for electrocardiograms. npj Digital Medicine.

DOI: 10.1038/s41746-023-00840-9

Citations

Total Citations: 111
Influential: 8
References: 24

GitHub

Stars: 25
Forks: 4
Open Issues: 3
Contributors: 1
Last Push: 3y ago
Language: Python

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 30 (Closed)
Usability (can I run it?): 25
Reproducibility (can I retrain it?): 22
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

cardiology, disease_diagnosis, ecg_classification, electrocardiogram, foundation_model, self_supervised, vision_transformer

Resources

GitHub Repository
Research Paper