AnyECG

Zhejiang University / University of Illinois Urbana-Champaign / Shanghai Jiao Tong University / Hong Kong University of Science and Technology

ECG foundation model that learns discrete rhythm tokens from noisy real-world recordings for arrhythmia classification and anomaly detection.

Released: November 2024

Parameters: 1.7 billion (largest variant, AnyECG-XL)

Electrocardiograms (ECGs) are the most widely collected cardiac biosignal, but real-world recordings are noisy, vary in length and lead configuration, and span a wide range of clinical tasks that have historically required bespoke models. AnyECG is a self-supervised ECG foundation model designed to learn transferable representations directly from heterogeneous, real-world ECG data, so that a single pretrained backbone can be adapted to many downstream cardiac problems rather than training a new network for each.

Introduced in November 2024 by researchers at Zhejiang University, the University of Illinois Urbana-Champaign, Shanghai Jiao Tong University, and the Hong Kong University of Science and Technology (Guangzhou), AnyECG reframes ECG modeling around discrete, clinically meaningful "rhythm codes." Its central idea is a tokenizer that converts continuous, noisy waveforms into a compact vocabulary of local rhythm tokens capturing morphological, frequency, and demographic structure, which a masked-modeling encoder then learns to predict from context.
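
The tokenization idea can be illustrated with a minimal sketch of nearest-codebook lookup. It assumes a 12-lead recording split into fixed-length, non-overlapping patches, a hypothetical linear patch encoder (`proj`), and a random codebook. The names `patchify` and `quantize`, the patch length, and the vocabulary size are illustrative assumptions, not AnyECG's actual configuration. The real Rhythm Quantizer uses a learned encoder trained with morphology, frequency, and demographic objectives, which this sketch does not attempt to reproduce.

```python
import numpy as np


def patchify(ecg: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a (leads, samples) ECG array into non-overlapping patches.

    Returns shape (leads, num_patches, patch_len); trailing samples that do
    not fill a whole patch are dropped.
    """
    leads, samples = ecg.shape
    num_patches = samples // patch_len
    trimmed = ecg[:, : num_patches * patch_len]
    return trimmed.reshape(leads, num_patches, patch_len)


def quantize(patches: np.ndarray, proj: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each patch embedding with the index of its nearest code vector.

    patches:  (leads, num_patches, patch_len) raw waveform patches
    proj:     (patch_len, dim) hypothetical stand-in for a learned patch encoder
    codebook: (vocab_size, dim) code vectors
    Returns integer token ids of shape (leads, num_patches).
    """
    z = patches @ proj  # (leads, num_patches, dim)
    # Squared Euclidean distance via ||z||^2 - 2 z.e + ||e||^2, which avoids
    # materializing a (leads, num_patches, vocab_size, dim) difference tensor.
    dist = (
        (z ** 2).sum(axis=-1, keepdims=True)
        - 2.0 * z @ codebook.T
        + (codebook ** 2).sum(axis=-1)
    )
    return dist.argmin(axis=-1)


rng = np.random.default_rng(0)
ecg = rng.standard_normal((12, 5000))      # e.g. 12 leads x 10 s at 500 Hz
patches = patchify(ecg, patch_len=100)     # hypothetical patch length
proj = rng.standard_normal((100, 64))      # hypothetical encoder weights
codebook = rng.standard_normal((512, 64))  # hypothetical vocabulary size
tokens = quantize(patches, proj, codebook)
print(tokens.shape)  # (12, 50)
```

Each lead becomes a short sequence of integer "rhythm tokens", which is what the masked-modeling stage later learns to predict.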

The result is a backbone that generalizes across four otherwise distinct tasks evaluated in the paper: anomaly detection, arrhythmia classification, corrupted-lead generation (signal restoration), and ultra-long ECG recognition. The authors report an average improvement of roughly 6% over task-specific baselines across these settings.

Key Features

Two-stage self-supervised design: A Rhythm Quantizer is trained first to tokenize raw ECG into discrete codes, after which a Transformer encoder is pretrained by masked signal modeling to predict those code indices from surrounding context.
Clinically grounded tokenization: The quantizer is supervised by three reconstruction objectives—time-domain morphology, wavelet-based frequency content, and patient demographics—so each rhythm token encodes physiologically relevant structure rather than arbitrary signal fragments.
Cardio-Sparse Attention: A sparse attention pattern restricts interactions to patches within the same lead or the same temporal position, cutting the computational cost of long, multi-lead recordings and enabling ultra-long ECG analysis.
Multitask generalization: One pretrained backbone is fine-tuned across anomaly detection, arrhythmia classification, corrupted-lead generation, and ultra-long recognition without architecture changes.
Robustness to real-world noise: By learning from diverse public ECG sources, the model targets the variable quality and acquisition conditions typical of clinical practice rather than curated benchmark signals.
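
The Cardio-Sparse Attention pattern listed above can be sketched as a boolean mask over patch tokens. This is a minimal NumPy illustration, assuming tokens are flattened lead-major (all patches of lead 0, then lead 1, and so on); `cardio_sparse_mask` and `masked_attention` are hypothetical names, and the sketch is not the authors' implementation. Materializing a dense mask like this does not by itself save compute: the cost reduction requires a kernel that evaluates only the allowed pairs.

```python
import numpy as np


def cardio_sparse_mask(num_leads: int, num_patches: int) -> np.ndarray:
    """Boolean mask over lead-major patch tokens.

    Token k stands for (lead = k // num_patches, time = k % num_patches).
    mask[q, k] is True when query q may attend to key k, i.e. when the two
    patches share a lead or share a temporal position.
    """
    idx = np.arange(num_leads * num_patches)
    lead = idx // num_patches
    time = idx % num_patches
    same_lead = lead[:, None] == lead[None, :]
    same_time = time[:, None] == time[None, :]
    return same_lead | same_time


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention that ignores keys where mask is False."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps at least its own diagonal entry, so the max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


mask = cardio_sparse_mask(num_leads=12, num_patches=50)
print(mask.shape, int(mask.sum()))  # (600, 600) 36600
```

With 12 leads and 50 patches per lead, each query attends to 12 + 50 - 1 = 61 keys instead of 600, so the pattern keeps 36,600 of 360,000 query-key pairs (about 10%). Because the allowed set grows with leads plus patches rather than their product, the savings increase as recordings get longer.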

Technical Details

AnyECG is a Transformer-based model released in base, large, and extra-large configurations; the largest variant (AnyECG-XL) reaches roughly 1.7B parameters.

Training proceeds in two stages. Stage one learns a vector-quantization codebook that maps ECG patches to discrete tokens, optimized with a combination of morphology, frequency, demography, codebook, and commitment losses. Stage two masks random patches and trains the encoder to recover the missing code indices, with Cardio-Sparse Attention reducing the quadratic attention cost over long sequences.

Pretraining draws on seven public ECG corpora (CPSC, CPSC-Extra, INCART, PTB, PTB-XL, the Georgia 12-lead challenge set, and an additional collection), totaling on the order of 50,000 recordings.

On the four evaluation tasks, AnyECG-XL reports strong results, including roughly 0.86 AUROC and 0.89 weighted F1 for anomaly detection and about 0.91 AUROC for ultra-long ECG analysis, while AnyECG-L achieves 32.74 dB PSNR and 0.874 SSIM on corrupted-lead reconstruction.
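
The stage-two objective can be sketched as cross-entropy computed on the hidden positions only. This NumPy sketch assumes the encoder emits one logit vector over the codebook for every patch and that the targets are the code indices produced by the frozen stage-one quantizer. The 40% mask ratio, the codebook size, and the function names `random_patch_mask` and `masked_code_loss` are hypothetical choices for illustration, not values or APIs taken from the paper.

```python
import numpy as np


def random_patch_mask(shape, mask_ratio, rng):
    """Choose a random subset of patch positions to hide (True = masked)."""
    n = int(np.prod(shape))
    num_masked = int(round(mask_ratio * n))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=num_masked, replace=False)] = True
    return flat.reshape(shape)


def masked_code_loss(logits, target_codes, mask):
    """Mean cross-entropy over masked positions only.

    logits:       (..., vocab_size) encoder predictions for every patch
    target_codes: (...) integer code indices from the stage-one quantizer
    mask:         (...) boolean, True where the input patch was hidden
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability the model assigns to the correct code at each position.
    picked = np.take_along_axis(log_probs, target_codes[..., None], axis=-1)[..., 0]
    return -picked[mask].mean()


rng = np.random.default_rng(0)
vocab_size = 512                                     # hypothetical codebook size
codes = rng.integers(0, vocab_size, size=(12, 50))   # quantizer output per patch
mask = random_patch_mask(codes.shape, mask_ratio=0.4, rng=rng)
logits = rng.standard_normal((12, 50, vocab_size))   # stand-in for encoder output
loss = masked_code_loss(logits, codes, mask)
```

Because the targets are discrete indices, the objective is classification over the codebook rather than regression onto raw waveform values, and restricting it to masked positions means visible patches contribute no loss.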

Applications

AnyECG is intended as a reusable backbone for cardiac signal analysis in research and, prospectively, clinical decision support. Its discrete tokenization and sparse attention make it well suited to settings where recordings are long, noisy, or vary in lead configuration, such as ambulatory and Holter monitoring, automated triage of 12-lead ECGs, and quality-control pipelines that must reconstruct corrupted leads. Because a single pretrained model transfers across anomaly detection, arrhythmia classification, restoration, and ultra-long recognition, it can reduce the per-task labeling and engineering burden for groups building ECG analysis tools.

Impact

AnyECG contributes to the growing effort to bring foundation-model methodology to physiological biosignals, where ECG is a natural early target given its scale and clinical importance. Its rhythm-tokenizer-plus-masked-modeling recipe and Cardio-Sparse Attention offer a concrete template for handling the length and noise of real-world cardiac data. Adoption is currently limited by openness: the companion repository (PKUDigitalHealth/AnyECG-Lab) is sparse, carries no declared license, and does not provide a confirmed pretrained-weight download, so independent reproduction and reuse of the released checkpoints are not yet straightforward.

Citation

AnyECG: Foundational Models for Multitask Cardiac Analysis in Real-World Settings

Preprint

Wang, Y., et al. (2024). AnyECG: Foundational Models for Multitask Cardiac Analysis in Real-World Settings.

DOI: 10.48550/arXiv.2411.17711

Recent citations

Papers that recently cited this model.

SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model
Zongheng Guo, Tao Chen, Yang Jiao, et al.
arXiv.org · Jan 2026 · Cited by 1
FoundationalECGNet: A Lightweight Foundational Model for ECG-based Multitask Cardiac Analysis
M. Sk., Md Jobayer, M. M. H. Shawon, et al.
arXiv.org · Sep 2025 · Cited by 1
End-to-End Platform for Electrocardiogram Analysis and Model Fine-Tuning: Development and Validation Study
Lucas Bickmann, L. Plagwitz, Antonius Büscher, et al.
Journal of Medical Internet Research · Mar 2025 · Cited by 1

Top citations

The most-cited papers that cite this model.

OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records
Zhijiang Wan, Qianhao Yu, J. Mao, et al.
arXiv.org · Mar 2025 · Cited by 9
A Multi-Scale Deep Learning Framework Combining MobileViT-ECA and LSTM for Accurate ECG Analysis
Abduljabbar S. Ba Mahel, Mehdhar S. A. M. Al-Gaashani, R. Alkanhel, et al.
IEEE Access · 2025 · Cited by 9
SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model
Zongheng Guo, Tao Chen, Yang Jiao, et al.
arXiv.org · Jan 2026 · Cited by 1
FoundationalECGNet: A Lightweight Foundational Model for ECG-based Multitask Cardiac Analysis
M. Sk., Md Jobayer, M. M. H. Shawon, et al.
arXiv.org · Sep 2025 · Cited by 1
End-to-End Platform for Electrocardiogram Analysis and Model Fine-Tuning: Development and Validation Study
Lucas Bickmann, L. Plagwitz, Antonius Büscher, et al.
Journal of Medical Internet Research · Mar 2025 · Cited by 1

Citations

Total citations: 5

Influential citations: 0

References: 50

GitHub

Stars: 4

Forks: 1

Open issues: 0

Contributors: 2

Last push: 9 months ago

Fields of citing research

Computer Science: 100%
Medicine: 100%
Engineering: 40%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Overall openness score: 10 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 13

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper · Official Website
