bio.rodeo
Biosignals foundation models
Biosignals, Language model

D-BETA

Singapore Management University / Eindhoven University of Technology

Contrastive masked ECG-text auto-encoder pretrained on paired electrocardiograms and clinical reports, enabling label-efficient and zero-shot cardiac diagnosis.

Released: October 2024

D-BETA (Discriminative masked ECG-Text Auto-Encoder) is a self-supervised foundation model for the 12-lead electrocardiogram (ECG) that learns from raw waveforms paired with their free-text clinical reports. ECG interpretation is a cornerstone of cardiac care, but supervised deep-learning models typically require large, expensively labeled datasets and generalize poorly across the heterogeneous acquisition protocols and patient populations found in different hospitals. D-BETA addresses this by pretraining on signal-report pairs without manual diagnostic labels, producing transferable representations that perform well downstream even when as little as 1% of the labeled data is available.

The model's central idea is to combine the complementary strengths of generative and discriminative self-supervision. Most prior ECG-text models rely on either masked reconstruction (generative) or cross-modal contrastive alignment (discriminative) alone. D-BETA unifies both in a contrastive masked auto-encoder, reconstructing masked ECG segments while simultaneously aligning ECG and text embeddings, and "boosts" the discriminative signal with a tailored negative-sampling strategy and dedicated loss functions for cross-modal matching.
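The idea of combining the two objectives can be illustrated with a minimal sketch. It assumes a mean-squared-error reconstruction term over masked samples and a generic symmetric InfoNCE contrastive term; D-BETA's own negative-sampling strategy and matching losses are not reproduced here, and every function name, the temperature, and the loss weights are illustrative assumptions rather than the authors' implementation.

```python
import math


def masked_mse(original, reconstructed, mask):
    """Mean squared error over masked positions only (mask[i] is True)."""
    errors = [(o - r) ** 2 for o, r, m in zip(original, reconstructed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def _normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def _diagonal_cross_entropy(logit_rows):
    """Average cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logit_rows):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_sum_exp - row[i]
    return total / len(logit_rows)


def symmetric_info_nce(ecg_embs, text_embs, temperature=0.07):
    """Generic symmetric InfoNCE: pair i of ECG and text is the positive,
    all other pairs in the batch act as negatives."""
    ecg = [_normalize(e) for e in ecg_embs]
    txt = [_normalize(t) for t in text_embs]
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt] for e in ecg]
    logits_t = [list(column) for column in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


def combined_loss(original, reconstructed, mask, ecg_embs, text_embs,
                  recon_weight=1.0, contrast_weight=1.0):
    """Weighted sum of the generative and discriminative terms (weights are assumptions)."""
    return (recon_weight * masked_mse(original, reconstructed, mask)
            + contrast_weight * symmetric_info_nce(ecg_embs, text_embs))
```

In this sketch the reconstruction term only penalizes errors on masked samples, while the contrastive term is low when each ECG embedding is closest to its own report embedding.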

D-BETA was developed by Hung Manh Pham and Dong Ma at Singapore Management University with Aaqib Saeed at Eindhoven University of Technology, and was accepted at ICML 2025 (first released as a preprint in October 2024).

#Key Features

  • Contrastive masked auto-encoding: Jointly reconstructs masked ECG segments and contrastively aligns ECG and text, fusing generative and discriminative objectives in a single pretraining stage.
  • Label-free multimodal pretraining: Learns directly from ECG-report pairs without diagnostic labels, sidestepping the cost of large annotated cardiology datasets.
  • Boosted negative sampling: An improved cross-modal negative-sampling scheme and specialized matching losses sharpen the discriminative quality of the learned representations.
  • Strong label efficiency: Delivers roughly a 15% average AUC gain over the prior state of the art in linear probing when only 1% of the labeled training data is used.
  • Zero-shot capability: Supports diagnosis of cardiac conditions with no task-specific fine-tuning, improving zero-shot AUC by about 2% over competing methods.
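To make the label-efficiency setting concrete, the sketch below shows what linear probing with 1% of the labels generally involves: the encoder stays frozen, a small random subset of labeled examples is drawn, and only a linear classifier is trained on the precomputed embeddings. The function names, the plain gradient-descent logistic regression, and the hyperparameters are assumptions chosen for illustration; they are not the evaluation code used for D-BETA.

```python
import math
import random


def sample_label_subset(n_samples, fraction, seed=0):
    """Pick a reproducible random subset of indices, e.g. fraction=0.01 for 1%."""
    rng = random.Random(seed)
    k = max(1, round(n_samples * fraction))
    return sorted(rng.sample(range(n_samples), k))


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_linear_probe(features, labels, lr=0.5, epochs=200):
    """Train a binary logistic-regression head on frozen embeddings
    with full-batch gradient descent. Returns (weights, bias)."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(features, weights, bias):
    """Probability of the positive class for each embedding."""
    return [_sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) for x in features]
```

In practice the `features` would be embeddings extracted once from the pretrained encoder, and a multi-label task would train one such head per condition.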

#Technical Details

D-BETA pairs a transformer-based ECG encoder with a text encoder. The ECG branch uses eight transformer encoder layers with multi-head self-attention over 12-lead waveforms. The text branch is built on a Flan-T5-base encoder that produces 768-dimensional embeddings, and the model outputs 768-dimensional cross-modal features.

Pretraining uses the MIMIC-IV-ECG v1.0 dataset, which comprises 800,035 ECG-report pairs from 161,352 unique subjects (779,891 samples after processing).

The model is evaluated on five public benchmarks spanning diverse downstream tasks and populations: PhysioNet 2021, PTB-XL, CSN (Chapman-Shaoxing-Ningbo), CPSC2018, and CODE-test. Relative to prior ECG-text models, it reports an average AUC improvement of about 15% in linear probing with 1% of the labeled data and about 2% in zero-shot evaluation.

Pretrained weights are released on Hugging Face and load via the transformers AutoModel API with trust_remote_code=True. The license is listed as "Other".
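Loading a checkpoint through the transformers AutoModel API might look like the sketch below. The repository id is a placeholder, since this page does not state it, and the `load_dbeta` helper is a hypothetical name. Because `trust_remote_code=True` executes Python code shipped in the model repository, it is worth reviewing that code and pinning a specific revision before loading.

```python
# Placeholder only: replace with the repository id shown on the model's Hugging Face page.
HYPOTHETICAL_REPO_ID = "<org>/<d-beta-checkpoint>"


def load_dbeta(repo_id, revision=None):
    """Load a checkpoint that ships custom modelling code on the Hugging Face Hub.

    repo_id  -- Hub repository id (not given on this page, so supplied by the caller)
    revision -- optional branch, tag, or commit hash to pin the downloaded code and weights
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        repo_id,
        revision=revision,
        trust_remote_code=True,  # runs code from the repository: review it first
    )
```

The inputs the returned model expects (sampling rate, lead order, tensor shapes) are defined by the repository's own code and documentation, so they are deliberately left out of this sketch.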

#Applications

D-BETA targets ECG-based cardiac screening and diagnosis in settings where labeled data are scarce. Its label-efficient linear probing makes it well suited to adapting a single pretrained backbone to new arrhythmia or cardiac-condition classification tasks with minimal annotation, while its zero-shot mode allows practitioners to query for conditions described in natural language without any fine-tuning. Researchers can use it as a general-purpose ECG feature extractor for downstream clinical machine-learning pipelines, and its cross-modal design supports report-aware retrieval and analysis of paired signal-text records.
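Zero-shot querying of this kind is commonly implemented by embedding one text prompt per condition and ranking the prompts by cosine similarity to the ECG embedding. The sketch below assumes the embeddings have already been computed by the ECG and text encoders; the function names and the toy three-dimensional vectors in the usage note are illustrative assumptions, not the model's actual interface or outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_conditions(ecg_embedding, prompt_embeddings):
    """Rank condition prompts by similarity to one ECG embedding.

    prompt_embeddings maps a condition name to the text embedding of its prompt.
    Returns (name, similarity) pairs, most similar first.
    """
    scored = [
        (name, cosine_similarity(ecg_embedding, embedding))
        for name, embedding in prompt_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with toy vectors, `rank_conditions([1.0, 0.1, 0.0], {"atrial fibrillation": [0.9, 0.2, 0.0], "normal sinus rhythm": [0.0, 1.0, 0.0]})` places "atrial fibrillation" first. Raw similarities are returned rather than a softmax because ECG diagnoses are often multi-label, so conditions need not compete for probability mass.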

#Impact

D-BETA contributes to a growing body of multimodal ECG foundation models that pair physiological signals with clinical text, demonstrating that unifying generative reconstruction with boosted contrastive learning yields markedly more label-efficient representations than either objective alone. Its ICML 2025 acceptance, public code, and openly released pretrained checkpoints lower the barrier for reproducible ECG representation-learning research. Key limitations to note are that pretraining draws on a single source (MIMIC-IV-ECG), which may constrain generalization to acquisition setups and populations underrepresented in that corpus, and that the released weights carry a non-standard "Other" license requiring users to verify usage terms.

Citation

Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners

Preprint

Pham, H. M., et al. (2024) Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners. International Conference on Machine Learning.

DOI: 10.48550/arXiv.2410.02131

Citations

Total Citations: 13
Influential: 1
References: 51

GitHub

Stars: 33
Forks: 2
Open Issues: 0
Contributors: 2
Last Push: 3mo ago
Language: Jupyter Notebook

HuggingFace

Downloads: 28
Likes: 5
Last Modified: 19d ago
Pipeline: feature-extraction

Openness

bio.rodeo openness: 27 (Closed) · low usability and reproducibility
Usability (can I run it?): 20
Reproducibility (can I retrain it?): 16
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

autoencoder, contrastive_learning, ecg_classification, electrocardiogram, multimodal, representation_learning, self_supervised, transformer, zero_shot_diagnosis

Resources

GitHub Repository, Research Paper, HuggingFace Model