HeartBEiT

Vision transformer for electrocardiograms that reads the printed 12-lead ECG as an image, enabling data-efficient diagnosis from few labeled examples.

Released: June 2023

Parameters: 86 Million

HeartBEiT is a domain-specific vision transformer for electrocardiogram (ECG) analysis, developed by researchers at the Icahn School of Medicine at Mount Sinai and published in npj Digital Medicine in June 2023. Rather than treating the ECG as a multichannel time series, HeartBEiT treats the standard printed 12-lead ECG as an image and applies a transformer originally designed for computer vision. This reframing lets the model exploit the same visual layout that clinicians read, while inheriting the scalable self-supervised pretraining recipe of modern vision foundation models.

The central problem HeartBEiT addresses is data efficiency. Convolutional neural networks for ECG diagnosis typically require very large labeled datasets to reach clinical-grade accuracy, and models pretrained on natural images (e.g., ImageNet-pretrained CNNs) transfer poorly to biomedical signals. By pretraining directly on millions of unlabeled ECG images from a single health system, HeartBEiT learns ECG-specific visual representations that fine-tune effectively even when only a small number of labeled examples is available.

HeartBEiT was among the early demonstrations that the BEiT-style masked image modeling paradigm could be ported from general computer vision to a clinical biosignal, and it remains a notable reference point for image-based approaches to ECG interpretation that contrast with the more common waveform/time-series foundation models in cardiology.

Key Features

ECG-as-image formulation: HeartBEiT renders each 12-lead ECG as an image and processes it with a vision transformer, aligning the model's input with the visual representation cardiologists actually inspect.
Masked image modeling pretraining: The model is pretrained self-supervised using BEiT-style masked image modeling, predicting masked image patches and thereby learning ECG morphology without requiring diagnostic labels.
Strong low-data-regime performance: HeartBEiT's defining result is markedly higher diagnostic accuracy than standard CNNs when labeled training samples are scarce, making it attractive for rare conditions and small cohorts.
Improved explainability: Attention-based saliency highlights biologically relevant regions of the ECG, offering more granular and clinically interpretable explanations than typical CNN attribution maps.
Validated across multiple cardiac tasks: It was evaluated on diagnosis of hypertrophic cardiomyopathy (HCM), low left ventricular ejection fraction (LVEF), and ST-elevation myocardial infarction (STEMI) with independent validation sets.
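
The masked image modeling step listed above can be illustrated with a minimal sketch. It assumes the common BEiT-style setup of a 224×224 input split into 16×16 patches with roughly 40% of patches hidden; these values, the function name, and the use of uniform random masking (BEiT itself masks contiguous blocks of patches) are illustrative assumptions, not details confirmed for HeartBEiT.

```python
import random


def random_patch_mask(image_size=224, patch_size=16, mask_ratio=0.4, seed=0):
    """Pick which image patches are hidden from the encoder.

    All defaults are assumed, BEiT-like values. Uniform random masking is a
    simplification; BEiT uses blockwise masking.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid_side = image_size // patch_size      # patches per row and per column
    num_patches = grid_side * grid_side       # total patches in the image
    num_masked = int(num_patches * mask_ratio)
    rng = random.Random(seed)
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    # mask[i] is True when patch i is hidden and must be predicted.
    mask = [i in masked_ids for i in range(num_patches)]
    return grid_side, mask


grid_side, mask = random_patch_mask()
print(f"{grid_side}x{grid_side} patch grid, {sum(mask)} of {len(mask)} patches masked")
```

During pretraining, the model would see only the unmasked patches and be trained to predict the content of the masked ones, which is why no diagnostic labels are needed at this stage.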

Technical Details

HeartBEiT is built on the BEiT-base architecture, a vision transformer with roughly 86 million parameters. It was pretrained via masked image modeling on approximately 8.5 million 12-lead ECG images drawn from about 2.1 million patients in the Mount Sinai Health System. Pretraining is fully self-supervised: the model predicts the visual tokens that correspond to masked image patches, learning ECG-specific features before any diagnostic labels are introduced. The pretrained backbone is then fine-tuned on each downstream classification task. Across diagnosis of HCM, low LVEF, and STEMI, the authors compared HeartBEiT against standard CNN architectures (such as EfficientNet and ResNet variants) at progressively smaller training sample sizes and on independent validation datasets, reporting that HeartBEiT's advantage grows as labeled data becomes scarcer.
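
As a rough check on the "roughly 86 million parameters" figure, the count can be approximated from the standard ViT-Base dimensions. The sketch below assumes 12 layers, hidden size 768, MLP size 3072, 16×16 patches, and a 224×224 RGB input; these are the usual ViT-Base values rather than numbers taken from the HeartBEiT paper, and BEiT-specific additions (such as relative position bias) and the classification head are left out.

```python
def vit_param_count(layers=12, hidden=768, mlp=3072, patch=16, channels=3, image=224):
    """Approximate parameter count of a plain ViT backbone (assumed dimensions)."""
    tokens = (image // patch) ** 2 + 1                      # patches plus one class token
    patch_embed = patch * patch * channels * hidden + hidden  # linear patch projection
    pos_and_cls = tokens * hidden + hidden                  # position embeddings + class token
    attention = 4 * (hidden * hidden + hidden)              # Q, K, V and output projections
    feed_forward = hidden * mlp + mlp + mlp * hidden + hidden  # two-layer MLP with biases
    layer_norms = 2 * 2 * hidden                            # two LayerNorms per block
    per_layer = attention + feed_forward + layer_norms
    final_norm = 2 * hidden                                 # LayerNorm after the last block
    return patch_embed + pos_and_cls + layers * per_layer + final_norm


total = vit_param_count()
print(f"{total:,} parameters (about {total / 1e6:.1f} million)")
```

Under these assumptions the backbone comes out at about 85.8 million parameters, consistent with the rounded figure quoted for BEiT-base.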

Applications

HeartBEiT is aimed at clinical and translational cardiology settings where labeled ECG data is limited. Because it fine-tunes effectively from few examples, it is well suited to detecting conditions that are difficult or impossible to read directly from the ECG (such as low ejection fraction or hypertrophic cardiomyopathy), to building diagnostic models for rare presentations, and to institutions without millions of labeled tracings. Its image-based explanations also support clinician review and auditing of model predictions, which is valuable for deployment in decision-support workflows.
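
A low-data comparison of the kind described here is usually set up by fine-tuning on progressively smaller subsets of the labeled training set. The following is a minimal sketch of drawing such subsets while preserving class proportions; the function name, the fractions, and the toy label distribution are hypothetical and are not the splits used by the HeartBEiT authors.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a subset that keeps each class's share.

    At least one example per class is kept so rare classes never vanish.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Toy cohort (hypothetical): 5% positive cases, as for an uncommon diagnosis.
labels = [1] * 50 + [0] * 950
for fraction in (0.1, 0.5, 1.0):
    subset = stratified_subsample(labels, fraction)
    positives = sum(labels[i] for i in subset)
    print(f"fraction={fraction}: {len(subset)} samples, {positives} positive")
```

Each subset would then be used to fine-tune both the pretrained transformer and the CNN baselines, with all models scored on the same held-out set so that the effect of training-set size is isolated.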

Impact

HeartBEiT helped establish image-based, self-supervised transformers as a viable direction for ECG analysis, demonstrating that domain-specific pretraining can outperform both ImageNet transfer and conventional CNNs, especially in very low-data regimes, while improving interpretability. A practical limitation for external adoption is access to the weights: the fine-tuning and checkpoint-loading code is openly available on GitHub, but the pretrained model weights are distributed only through a Mount Sinai data-sharing agreement rather than as a freely downloadable artifact, which constrains fully open reuse and reproducibility despite the public codebase.

Citation

A foundational vision transformer improves diagnostic performance for electrocardiograms

Vaid, A., et al. (2023) A foundational vision transformer improves diagnostic performance for electrocardiograms. npj Digit. Medicine.

DOI: 10.1038/s41746-023-00840-9

Recent citations

Papers that recently cited this model.

MorphologyFM: A Foundation Model for Morphology-Aware Representation Learning from ECG and Pulse Oximetry Waveforms
Saiyang Feng, Yuanyu Zhang, Shi Li
Jul 2026
0
EFIB-Net: Information Bottleneck-Guided Multi-Resolution Attention Network for Robust ECG Denoising
Minghao Ma, Chen Liu, Yulin Mu, et al.
Applied Sciences · Jun 2026
0
WAVE: Wall-Aligned Vector Embedding for Self-Supervised Learning of Electrocardiograms
Shurong Pan, Wenhan Liu, Qingyuan Wu, et al.
Bioengineering · Jun 2026
0

Top citations

The most-cited papers that cite this model.

Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu, Lin Li, Jiankai Sun, et al.
IEEE journal of biomedical and health informatics · Mar 2023
213 · Influential
Cardiovascular care with digital twin technology in the era of generative artificial intelligence.
P. Thangaraj, S. Benson, E. Oikonomou, et al.
European Heart Journal · Sep 2024
104
ECG-FM: An Open Electrocardiogram Foundation Model
Kaden McKeen, Laura Oliva, Sameer Masood, et al.
arXiv.org · Aug 2024
88
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models
Dimitris Spathis, F. Kawsar
J. Am. Medical Informatics Assoc. · Sep 2023
51
Foundation model of ECG diagnosis: Diagnostics and explanations of any form and rhythm on ECG
Yuanyuan Tian, Zhiyuan Li, Yanrui Jin, et al.
Cell Reports Medicine · Dec 2024
41

Citations

Total Citations: 117

Influential: 10

References: 24

GitHub

Stars: 25

Forks: 4

Open Issues: 3

Contributors: 1

Last Push: 3y ago

Language: Python

Fields of citing research

Computer Science: 96%
Medicine: 93%
Engineering: 34%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall: 30 (Closed)

Usability (can I run it?): 25

Reproducibility (can I retrain it?): 22

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository Research Paper
