HeartLang

ECG foundation model that treats heartbeats as words and rhythm strips as sentences, using heartbeat-level tokenization for diagnostic classification.

Released: February 2025

HeartLang is a self-supervised foundation model for the electrocardiogram (ECG) that reframes signal modeling as a language-modeling problem. Rather than carving the waveform into fixed-length time windows—the dominant practice in deep ECG models—it treats individual heartbeats as "words" and the sequence of beats that forms a rhythm strip as a "sentence." This semantic segmentation is designed to respect the natural structure of cardiac signals, where the clinically meaningful unit is the heartbeat and its morphology, not an arbitrary slice of time.

The model was introduced in "Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model" by Jiarui Jin, Haoyu Wang, Hongyan Li, Jun Li, Jiahui Pan, and Shenda Hong from the PKUDigitalHealth group at Peking University, and accepted to ICLR 2025. It sits within the broader wave of biosignal foundation models that adapt masked-prediction pretraining—popularized by language and vision models—to physiological time series, and is distinguished by its explicitly linguistic, heartbeat-centric tokenization.

By pretraining on a large corpus of unlabeled ECGs and transferring to downstream diagnostic tasks, HeartLang aims to reduce the heavy annotation burden that has limited supervised ECG deep learning, while producing representations that capture both single-beat form and longer-range rhythm context.

Key Features

Heartbeat-as-word tokenization: A QRS-Tokenizer detects QRS complexes and segments the raw signal into individual heartbeats, converting a continuous waveform into a sequence of semantically meaningful, beat-aligned tokens ("ECG sentences").
Dual-level representation learning: Pretraining operates at two levels—the form level captures the morphology of individual heartbeats, while the rhythm level captures how beats are arranged over time—mirroring the word/sentence distinction.
Vector-quantized heartbeat vocabulary: A Vector-Quantized Heartbeat Reconstruction (VQ-HBR) stage learns a discrete codebook of 8192 entries, building what the authors describe as the largest heartbeat-based ECG vocabulary to date.
Masked ECG sentence pretraining: Rhythm-level representations are learned through a masked-prediction objective over sequences of heartbeat tokens, analogous to masked language modeling.
Self-supervised and transferable: The model is pretrained without diagnostic labels and fine-tuned for downstream classification, lowering reliance on scarce expert annotations.
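
The heartbeat-as-word idea above can be illustrated with a minimal sketch. This is not the released QRS-Tokenizer: the threshold-based peak detector, the function names (`detect_r_peaks`, `segment_beats`, `synthetic_ecg`), the window lengths, and the synthetic single-lead signal are all assumptions chosen to keep the example self-contained.

```python
import math


def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.2):
    """Toy R-peak detector (an assumption, not the paper's method):
    local maxima above a fraction of the global maximum, kept only if
    they are at least one refractory period after the previous peak."""
    threshold = threshold_ratio * max(signal)
    refractory = round(refractory_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= threshold:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return peaks


def segment_beats(signal, peaks, fs, pre_s=0.25, post_s=0.45):
    """Cut a fixed window around each peak; each window is one 'word'.
    Windows that would run past the signal edges are dropped."""
    pre, post = round(pre_s * fs), round(post_s * fs)
    beats = []
    for p in peaks:
        start, end = p - pre, p + post
        if start >= 0 and end <= len(signal):
            beats.append(signal[start:end])
    return beats


def synthetic_ecg(fs=100, seconds=5, bpm=60):
    """Synthetic stand-in for one ECG lead: a narrow Gaussian spike per beat."""
    period = int(fs * 60 / bpm)
    offset = period // 2
    samples = []
    for i in range(fs * seconds):
        d = (i - offset) % period
        d = min(d, period - d)  # distance to the nearest spike centre
        samples.append(math.exp(-(d ** 2) / (2 * 1.5 ** 2)))
    return samples


fs = 100
ecg = synthetic_ecg(fs=fs)
peaks = detect_r_peaks(ecg, fs)
sentence = segment_beats(ecg, peaks, fs)  # list of beat-aligned windows
print(len(sentence), "beats of", len(sentence[0]), "samples each")
```

On real recordings, noise and abnormal morphology make peak detection considerably harder than this toy case, which is why segmentation quality matters for any beat-level tokenizer.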

Technical Details

HeartLang uses a transformer backbone trained in two stages. First, the VQ-HBR module encodes each tokenized heartbeat into a discrete code drawn from an 8192-entry codebook, establishing the ECG "vocabulary"; this stage is trained with a vector-quantized reconstruction objective. Second, a masked ECG sentence pretraining stage learns rhythm-level representations by masking heartbeat tokens within a sequence and predicting them from the surrounding context. Pretraining uses the MIMIC-IV-ECG dataset from PhysioNet, a large collection of 12-lead clinical recordings, and runs for roughly 200 epochs with learning-rate scheduling; the reference implementation trains VQ-HBR on 8 NVIDIA RTX 4090 GPUs. The model is evaluated across six public ECG datasets, including diagnostic subsets of PTB-XL, CPSC2018, and the Chapman-Shaoxing-Ningbo (CSN) arrhythmia dataset, where the authors report improved downstream classification over prior self-supervised ECG baselines. Code and pretrained checkpoints (the pretraining and VQ-HBR weights) are released under an MIT license.
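
The two stages can be sketched in a few lines. This is a simplified illustration, not the reference implementation: the 4-entry codebook stands in for the 8192-entry one, the 2-dimensional embeddings are made up, and the names `nearest_code`, `quantize_sentence`, `mask_tokens`, the 50% mask ratio, and the choice of mask id are all assumptions.

```python
import random


def nearest_code(vector, codebook):
    """Vector quantization lookup: index of the codebook entry with the
    smallest squared Euclidean distance to the input vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def quantize_sentence(beat_embeddings, codebook):
    """Map each heartbeat embedding to a discrete token id."""
    return [nearest_code(v, codebook) for v in beat_embeddings]


def mask_tokens(tokens, mask_ratio, mask_id, rng):
    """Replace a random subset of tokens with mask_id and return the
    masked sequence plus the hidden targets a model would predict."""
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = mask_id
    targets = {p: tokens[p] for p in positions}
    return masked, targets


# Toy codebook (assumption): 4 entries instead of 8192.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Made-up embeddings for a six-beat "sentence".
embeddings = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8],
              [0.95, 1.1], [0.0, 0.1], [1.1, 0.1]]

tokens = quantize_sentence(embeddings, codebook)
print(tokens)  # → [0, 1, 2, 3, 0, 1]

mask_id = len(codebook)  # one id past the vocabulary (assumption)
masked, targets = mask_tokens(tokens, 0.5, mask_id, random.Random(0))
print(masked, targets)
```

In a real pipeline the embeddings would come from a learned encoder, and a transformer would be trained to recover `targets` from `masked`; here both are left out so the data flow stays visible.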

Applications

HeartLang targets automated ECG interpretation tasks such as multi-label diagnostic classification and arrhythmia detection. Because it is pretrained with self-supervision on unlabeled recordings, it is particularly useful in settings where labeled ECGs are limited: a hospital or research group can fine-tune the released checkpoints on a modest annotated dataset rather than training from scratch. Beyond classification, its heartbeat-level discrete vocabulary and learned embeddings can serve as reusable features for downstream cardiac analysis, and the framework offers a template for applying language-model-style pretraining to other quasi-periodic physiological signals.

Impact

By recasting ECG modeling as learning "words" and "sentences," HeartLang contributes a distinctive, biologically motivated tokenization strategy to the rapidly growing space of biosignal foundation models, and its acceptance at ICLR 2025 reflects interest in structure-aware self-supervised approaches. The public release of code, an 8192-entry heartbeat codebook, and pretrained weights lowers the barrier for downstream ECG research. Key limitations include dependence on accurate QRS detection for tokenization—noisy or abnormal beats may be mis-segmented—and evaluation centered on standard public benchmarks, so prospective clinical validation and robustness across diverse populations and devices remain open questions.

Citation

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Preprint

Jin, J., et al. (2025) Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model. International Conference on Learning Representations.

DOI: 10.48550/arXiv.2502.10707

Recent citations

Papers that recently cited this model.

Contactless Arrhythmia Detection via Diversity-Invariant Contrastive mmWave Sensing
Xinmeng Cai, Jinbo Chen, Haoyu Wang, et al.
IEEE Transactions on Mobile Computing · Aug 2026
0
SPOTR: Spatio-temporal Pooling One-Token Reconstruction for Universal Physiological Signal Self-supervised Learning
Yiyu Gui, Mingzhi Chen, Yuesheng Zhu, et al.
Jun 2026
0
RF-HeartSSL: Self-Supervised Learning for RF-Based Cardiac Sensing
Xinmeng Cai, Jinbo Chen, Guixin Xu, et al.
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies · Jun 2026
0

Top citations

The most-cited papers that cite this model.

An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains
Jun Li, Aaron Aguirre, Junior Moura, et al.
arXiv.org · Oct 2024
32
From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
Fuying Wang, Jiacheng Xu, Lequan Yu
International Conference on Machine Learning · Jun 2025
16 · Influential
CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model
Jingying Ma, Feng Wu, Qika Lin, et al.
arXiv.org · Jun 2025
13
Enhancing automatic multilabel diagnosis of electrocardiogram signals: A masked transformer approach
Ya Zhou, Xiaolin Diao, Yanni Huo, et al.
Comput. Biol. Medicine · Jul 2025
10
Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities
Xi Fu, Weibang Jiang, Yi Ding, et al.
arXiv.org · Apr 2025
8

Citations

Total Citations: 46

Influential: 9

References: 37

GitHub

Stars: 56

Forks: 5

Open Issues: 0

Contributors: 1

Last Push: 1y ago

Language: Python

License: MIT

HuggingFace

Downloads: 0

Likes: 4

Last Modified: 1y ago

Fields of citing research

Computer Science: 96%
Medicine: 82%
Engineering: 38%
Mathematics: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Overall: 78 · Open

Usability (can I run it?): 94

Reproducibility (can I retrain it?): 66

Model Openness Framework

Class III

Open Model

Resources

GitHub Repository · Research Paper · HuggingFace Model

