D-BETA

Singapore Management University / Eindhoven University of Technology

ECG foundation model pretrained on 12-lead waveforms paired with clinical reports, enabling label-efficient and zero-shot cardiac diagnosis.

Released: October 2024

D-BETA (Discriminative masked ECG-Text Auto-Encoder) is a self-supervised foundation model for the 12-lead electrocardiogram (ECG) that learns from raw waveforms paired with their free-text clinical reports. ECG interpretation is a cornerstone of cardiac care, but supervised deep-learning models typically require large, expensively labeled datasets and generalize poorly across the heterogeneous acquisition protocols and patient populations found in different hospitals. D-BETA addresses this by pretraining on signal-report pairs without manual diagnostic labels, producing transferable representations that perform well even when only a tiny fraction of labeled data is available downstream.

The model's central idea is to combine the complementary strengths of generative and discriminative self-supervision. Most prior ECG-text models rely on either masked reconstruction (generative) or cross-modal contrastive alignment (discriminative) alone. D-BETA unifies both in a contrastive masked auto-encoder, reconstructing masked ECG segments while simultaneously aligning ECG and text embeddings, and "boosts" the discriminative signal with a tailored negative-sampling strategy and dedicated loss functions for cross-modal matching.
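The pairing of a generative and a discriminative objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain mean-squared-error reconstruction term over masked positions and a generic symmetric InfoNCE term, and it omits D-BETA's tailored negative sampling and matching losses. The function names, the temperature, the loss weight, and the array shapes are all illustrative assumptions.

```python
import numpy as np


def masked_reconstruction_loss(signal, reconstruction, mask):
    """Mean squared error computed only over the masked positions."""
    mask = mask.astype(bool)
    return float(np.mean((signal[mask] - reconstruction[mask]) ** 2))


def log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched ECG/report pairs sit on the diagonal."""
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = ecg @ txt.T / temperature
    idx = np.arange(len(logits))
    ecg_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_ecg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((ecg_to_text + text_to_ecg) / 2)


def combined_loss(signal, reconstruction, mask, ecg_emb, text_emb, weight=1.0):
    """Generative term plus a weighted discriminative term (weight is assumed)."""
    recon = masked_reconstruction_loss(signal, reconstruction, mask)
    return recon + weight * contrastive_loss(ecg_emb, text_emb)


# Synthetic stand-ins: 4 recordings, 12 leads, 500 samples each.
rng = np.random.default_rng(0)
signal = rng.normal(size=(4, 12, 500))
mask = rng.random(signal.shape) < 0.5
reconstruction = signal + 0.1 * rng.normal(size=signal.shape)
ecg_emb = rng.normal(size=(4, 768))
text_emb = ecg_emb + 0.1 * rng.normal(size=(4, 768))

total = combined_loss(signal, reconstruction, mask, ecg_emb, text_emb)
```

In this toy setup the contrastive term is small because each text embedding is a lightly perturbed copy of its ECG embedding; reversing the order of the text embeddings mismatches every pair and raises the loss sharply.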

D-BETA was developed by Hung Manh Pham and Dong Ma at Singapore Management University with Aaqib Saeed at Eindhoven University of Technology, and was accepted at ICML 2025 (first released as a preprint in October 2024).

Key Features

Contrastive masked auto-encoding: Jointly reconstructs masked ECG segments and contrastively aligns ECG and text, fusing generative and discriminative objectives in a single pretraining stage.
Label-free multimodal pretraining: Learns directly from ECG-report pairs without diagnostic labels, sidestepping the cost of large annotated cardiology datasets.
Boosted negative sampling: An improved cross-modal negative-sampling scheme and specialized matching losses sharpen the discriminative quality of the learned representations.
Strong label efficiency: Delivers roughly a 15% average AUC gain over prior state-of-the-art in linear probing when only 1% of labeled training data is used.
Zero-shot capability: Supports diagnosis of cardiac conditions with no task-specific fine-tuning, improving zero-shot AUC by about 2% over competing methods.
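The label-efficiency claim refers to linear probing: the pretrained encoder is frozen and only a linear classifier is trained on a small labeled subset. The sketch below shows that protocol on synthetic features, assuming a 1% labeled split, a single binary label, and a hand-written logistic-regression probe. The data, dimensions, and hyperparameters are illustrative and do not reproduce the paper's evaluation setup.

```python
import numpy as np


def subsample_labeled(n_samples, fraction, rng):
    """Pick a random labeled subset, e.g. fraction=0.01 for the 1% setting."""
    n_keep = max(1, int(round(n_samples * fraction)))
    return rng.choice(n_samples, size=n_keep, replace=False)


def fit_linear_probe(features, labels, lr=0.1, steps=500, l2=1e-3):
    """Logistic regression on frozen features, trained by gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - labels
        w -= lr * (features.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Synthetic "frozen encoder" features in which one dimension carries the label.
rng = np.random.default_rng(0)
n_samples, dim = 5000, 32
labels = rng.integers(0, 2, size=n_samples)
features = rng.normal(size=(n_samples, dim))
features[:, 0] += 3.0 * labels

train_idx = subsample_labeled(n_samples, 0.01, rng)
test_mask = np.ones(n_samples, dtype=bool)
test_mask[train_idx] = False

w, b = fit_linear_probe(features[train_idx], labels[train_idx])
auc = roc_auc(labels[test_mask], features[test_mask] @ w + b)
```

With real data, the `features` array would hold embeddings extracted once from the frozen model, and a multi-label benchmark would fit one such probe per condition and average the resulting AUCs.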

Technical Details

D-BETA pairs a transformer-based ECG encoder with a text encoder. The ECG branch uses eight transformer encoder layers with multi-head self-attention operating over 12-lead waveforms, while the text branch is built on a Flan-T5-base encoder that produces 768-dimensional embeddings; the model outputs 768-dimensional cross-modal features.

Pretraining uses the MIMIC-IV-ECG v1.0 dataset, which comprises 800,035 ECG-report pairs from 161,352 unique subjects (779,891 samples after processing). The model is evaluated on five public benchmarks spanning diverse downstream tasks and populations: PhysioNet 2021, PTB-XL, CSN (Chapman-Shaoxing-Ningbo), CPSC2018, and CODE-test. Across these it reports an average AUC improvement of about 15% in linear probing with 1% of the labeled data and about 2% in zero-shot evaluation, relative to prior ECG-text models.

Pretrained weights are released on Hugging Face and load via the transformers AutoModel API with trust_remote_code=True. The license is listed as "Other".
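A minimal loading sketch, assuming the transformers library is installed. The Hugging Face repository id is not given in this entry, so it is left as a required argument rather than guessed, and the helper name load_dbeta is hypothetical. The expected input format and forward signature are defined by the repository's own code and are not shown here.

```python
def load_dbeta(repo_id, revision=None):
    """Load the released checkpoint through the transformers AutoModel API.

    repo_id is the Hugging Face repository id, to be supplied by the caller.
    trust_remote_code=True executes Python code shipped in that repository,
    so review the code and consider pinning a revision before loading.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel

    kwargs = {"trust_remote_code": True}
    if revision is not None:
        kwargs["revision"] = revision  # pin a specific commit, tag, or branch
    return AutoModel.from_pretrained(repo_id, **kwargs)
```

Because the weights carry an "Other" license, it is worth checking the usage terms on the model page before calling this helper in a project.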

Applications

D-BETA targets ECG-based cardiac screening and diagnosis in settings where labeled data are scarce. Its label-efficient linear probing makes it well suited to adapting a single pretrained backbone to new arrhythmia or cardiac-condition classification tasks with minimal annotation, while its zero-shot mode allows practitioners to query for conditions described in natural language without any fine-tuning. Researchers can use it as a general-purpose ECG feature extractor for downstream clinical machine-learning pipelines, and its cross-modal design supports report-aware retrieval and analysis of paired signal-text records.
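Zero-shot querying of this kind is commonly implemented by embedding each candidate condition as a text prompt and ranking the prompts by cosine similarity to the ECG embedding. The sketch below illustrates that general recipe with random 768-dimensional vectors standing in for real encoder outputs. The prompt wording, the temperature, and the softmax over prompts are assumptions, not details taken from the D-BETA release.

```python
import numpy as np


def zero_shot_scores(ecg_embedding, prompt_embeddings, temperature=0.07):
    """Softmax over cosine similarities between one ECG and several prompts."""
    ecg = ecg_embedding / np.linalg.norm(ecg_embedding)
    prompts = prompt_embeddings / np.linalg.norm(
        prompt_embeddings, axis=1, keepdims=True
    )
    logits = prompts @ ecg / temperature
    logits -= logits.max()  # stabilise the exponentials
    probs = np.exp(logits)
    return probs / probs.sum()


# Stand-ins for encoder outputs: in practice these would come from the text
# encoder (one row per prompt) and the ECG encoder (one recording).
rng = np.random.default_rng(0)
conditions = ["atrial fibrillation", "normal sinus rhythm", "left bundle branch block"]
prompt_embeddings = rng.normal(size=(len(conditions), 768))
ecg_embedding = prompt_embeddings[0] + 0.5 * rng.normal(size=768)

probs = zero_shot_scores(ecg_embedding, prompt_embeddings)
predicted = conditions[int(np.argmax(probs))]
```

A softmax forces the conditions to compete, which suits single-label queries. For multi-label benchmarks, the raw similarity of each condition can instead be scored independently when computing per-class AUC.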

Impact

D-BETA contributes to a growing body of multimodal ECG foundation models that pair physiological signals with clinical text, demonstrating that unifying generative reconstruction with boosted contrastive learning yields markedly more label-efficient representations than either objective alone. Its ICML 2025 acceptance, public code, and openly released pretrained checkpoints lower the barrier for reproducible ECG representation-learning research. Key limitations to note are that pretraining draws on a single source (MIMIC-IV-ECG), which may constrain generalization to acquisition setups and populations underrepresented in that corpus, and that the released weights carry a non-standard "Other" license requiring users to verify usage terms.

Citation

Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners

Preprint

Pham, H. M., et al. (2024) Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners. arXiv preprint; accepted at the International Conference on Machine Learning (ICML) 2025.

DOI: 10.48550/arXiv.2410.02131

Recent citations

Papers that recently cited this model.

Learning Cardiac Latent Representations in Vectorcardiogram Space
Bosong Huang, Panzhen Zhao, Zengxiang Li, et al.
May 2026
0
Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals
Phu X. Nguyen, Konstantinos Kontras, Wei Dai, et al.
May 2026
0
From Reports to Ontologies: Ontology-Guided Representation Learning for 12-Lead ECG
Lei Xu, F. Sohrab, Mehmet Yamaç, et al.
May 2026
0

Top citations

The most-cited papers that cite this model.

Cardiac health assessment across scenarios and devices using a multimodal foundation model pretrained on data from 1.7 million individuals
Xiao Gu, Wei Tang, Jinpei Han, et al.
Nature Machine Intelligence · Feb 2026
8
Interpretable multimodal zero shot ECG diagnosis via structured clinical knowledge alignment
Jialu Tang, Hung Manh Pham, Ignace De Lathauwer, et al.
npj Cardiovascular Health · Oct 2025
2 · Influential
Learning Cardiac Latent Representations in Vectorcardiogram Space
Bosong Huang, Panzhen Zhao, Zengxiang Li, et al.
May 2026
0
Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals
Phu X. Nguyen, Konstantinos Kontras, Wei Dai, et al.
May 2026
0
From Reports to Ontologies: Ontology-Guided Representation Learning for 12-Lead ECG
Lei Xu, F. Sohrab, Mehmet Yamaç, et al.
May 2026
0

Citations

Total Citations: 13

Influential: 1

References: 51

GitHub

Stars: 36

Forks: 4

Open Issues: 0

Contributors: 2

Last Push: 4mo ago

Language: Jupyter Notebook

HuggingFace

Downloads: 120

Likes: 5

Last Modified: 2mo ago

Pipeline: feature-extraction

Fields of citing research

Medicine: 100%
Computer Science: 92%
Engineering: 15%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

27 · Closed

Usability (can I run it?): 20

Reproducibility (can I retrain it?): 16

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository · Research Paper · HuggingFace Model

