bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

AnyECG

Zhejiang University / University of Illinois Urbana-Champaign / Shanghai Jiao Tong University / Hong Kong University of Science and Technology

A two-stage pretrained ECG foundation model that learns discrete rhythm tokens for robust multitask cardiac analysis on noisy, real-world electrocardiograms.

Released: November 2024
Parameters: 1.7 Billion

Electrocardiograms (ECGs) are the most widely collected cardiac biosignal, but real-world recordings are noisy, vary in length and lead configuration, and span a wide range of clinical tasks that have historically required bespoke models. AnyECG is a self-supervised ECG foundation model designed to learn transferable representations directly from heterogeneous, real-world ECG data, so that a single pretrained backbone can be adapted to many downstream cardiac problems rather than training a new network for each.

Introduced in November 2024 by researchers at Zhejiang University, the University of Illinois Urbana-Champaign, Shanghai Jiao Tong University, and the Hong Kong University of Science and Technology (Guangzhou), AnyECG reframes ECG modeling around discrete, clinically meaningful "rhythm codes." Its central idea is a tokenizer that converts continuous, noisy waveforms into a compact vocabulary of local rhythm tokens capturing morphological, frequency, and demographic structure, which a masked-modeling encoder then learns to predict from context.

The result is a backbone that generalizes across four otherwise distinct tasks evaluated in the paper: anomaly detection, arrhythmia classification, corrupted-lead generation (signal restoration), and ultra-long ECG recognition. The authors report an average improvement of roughly 6% over task-specific baselines across these settings.

#Key Features

  • Two-stage self-supervised design: A Rhythm Quantizer is trained first to tokenize raw ECG into discrete codes, after which a Transformer encoder is pretrained by masked signal modeling to predict those code indices from surrounding context.
  • Clinically grounded tokenization: The quantizer is supervised by three reconstruction objectives—time-domain morphology, wavelet-based frequency content, and patient demographics—so each rhythm token encodes physiologically relevant structure rather than arbitrary signal fragments.
  • Cardio-Sparse Attention: A sparse attention pattern restricts interactions to patches within the same lead or the same temporal position, cutting the computational cost of long, multi-lead recordings and enabling ultra-long ECG analysis.
  • Multitask generalization: One pretrained backbone is fine-tuned across anomaly detection, arrhythmia classification, corrupted-lead generation, and ultra-long recognition without architecture changes.
  • Robustness to real-world noise: By learning from diverse public ECG sources, the model targets the variable quality and acquisition conditions typical of clinical practice rather than curated benchmark signals.
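The Cardio-Sparse Attention pattern described above can be illustrated with a small mask-construction sketch. This is not the authors' implementation: the function names, the lead-major patch ordering, and the 12-lead by 50-step grid are assumptions chosen for illustration. It only shows which patch pairs are allowed to interact under the "same lead or same temporal position" rule.

```python
import numpy as np

def cardio_sparse_mask(num_leads: int, num_steps: int) -> np.ndarray:
    """Boolean attention mask over patches flattened in lead-major order.

    Entry [i, j] is True when patch i may attend to patch j, i.e. when the
    two patches share a lead or share a temporal position (hypothetical
    layout; the paper's exact patch ordering is not given in this page).
    """
    lead_idx = np.repeat(np.arange(num_leads), num_steps)  # lead of each patch
    time_idx = np.tile(np.arange(num_steps), num_leads)    # time step of each patch
    same_lead = lead_idx[:, None] == lead_idx[None, :]
    same_time = time_idx[:, None] == time_idx[None, :]
    return same_lead | same_time

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax that assigns zero weight to disallowed pairs."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Illustrative sizes only: 12 leads, 50 patches per lead.
mask = cardio_sparse_mask(num_leads=12, num_steps=50)
allowed_per_patch = mask.sum(axis=1)

rng = np.random.default_rng(0)
scores = rng.normal(size=mask.shape)  # stand-in for query-key dot products
weights = masked_softmax(scores, mask)

print(mask.shape[0], allowed_per_patch[0])  # 600 patches, 61 allowed keys each
```

With these sizes each patch attends to 50 + 12 - 1 = 61 of the 600 patches. A dense mask like this one only demonstrates the pattern; an actual compute saving would require a kernel that skips the masked pairs rather than zeroing them afterwards.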

#Technical Details

AnyECG is a Transformer-based model released in base, large, and extra-large configurations, with the largest variant (AnyECG-XL) reaching roughly 1.7B parameters.

Training proceeds in two stages. Stage one learns a vector-quantization codebook that maps ECG patches to discrete tokens using a combination of morphology, frequency, demography, codebook, and commitment losses. Stage two masks random patches and trains the encoder to recover the missing code indices, with Cardio-Sparse Attention reducing the quadratic attention cost over long sequences.

Pretraining draws on seven public ECG corpora (CPSC, CPSC-Extra, INCART, PTB, PTB-XL, the Georgia 12-lead challenge set, and an additional collection), totaling on the order of 50,000 recordings.

On the four evaluation tasks, AnyECG-XL reports strong results, including roughly 0.86 AUROC and 0.89 weighted F1 for anomaly detection and about 0.91 AUROC for ultra-long ECG analysis, while AnyECG-L achieves 32.74 dB PSNR and 0.874 SSIM on corrupted-lead reconstruction.
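To make the two training stages more concrete, the sketch below shows a generic vector-quantization lookup followed by the construction of masked-prediction targets. It follows the standard VQ-VAE formulation and is not taken from the AnyECG code: the function names, array sizes, `beta=0.25`, and the 40% mask ratio are all assumptions, and the morphology, frequency, and demography reconstruction losses are omitted.

```python
import numpy as np

def quantize(patch_embeddings: np.ndarray, codebook: np.ndarray):
    """Map each patch embedding to its nearest codebook vector (stage one)."""
    # Squared Euclidean distance between every patch and every code vector.
    diffs = patch_embeddings[:, None, :] - codebook[None, :, :]
    distances = (diffs ** 2).sum(axis=-1)
    indices = distances.argmin(axis=1)  # discrete token per patch
    return indices, codebook[indices]

def vq_losses(patch_embeddings: np.ndarray, quantized: np.ndarray, beta: float = 0.25):
    """Codebook and commitment terms in the usual VQ-VAE form.

    In an autodiff framework the two terms differ in where the stop-gradient
    is placed; NumPy has no gradients, so here they differ only by beta.
    """
    sq_err = ((patch_embeddings - quantized) ** 2).mean()
    codebook_loss = sq_err            # pulls code vectors toward encoder outputs
    commitment_loss = beta * sq_err   # keeps encoder outputs near their codes
    return codebook_loss, commitment_loss

def mask_patches(indices: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Pick random patches to hide; their code indices become targets (stage two)."""
    num_masked = max(1, int(round(mask_ratio * indices.size)))
    positions = rng.choice(indices.size, size=num_masked, replace=False)
    is_masked = np.zeros(indices.size, dtype=bool)
    is_masked[positions] = True
    return is_masked, indices[is_masked]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 8))  # 20 patches, 8-dim embeddings (toy sizes)
codebook = rng.normal(size=(16, 8))    # 16 rhythm codes (toy size)

indices, quantized = quantize(embeddings, codebook)
codebook_loss, commitment_loss = vq_losses(embeddings, quantized)
is_masked, targets = mask_patches(indices, mask_ratio=0.4, rng=rng)
```

In a full pipeline the encoder would see the sequence with the `is_masked` patches hidden and be trained with a classification loss to predict `targets`; that step needs a trainable model and is left out here.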

#Applications

AnyECG is intended as a reusable backbone for cardiac signal analysis in research and, prospectively, clinical decision support. Its discrete tokenization and sparse attention make it well suited to settings where recordings are long, noisy, or inconsistently leaded—such as ambulatory and Holter monitoring, automated triage of 12-lead ECGs, and quality-control pipelines that must reconstruct corrupted leads. Because a single pretrained model transfers across anomaly detection, arrhythmia classification, restoration, and ultra-long recognition, it can reduce the per-task labeling and engineering burden for groups building ECG analysis tools.

#Impact

AnyECG contributes to the growing effort to bring foundation-model methodology to physiological biosignals, where ECG is a natural early target given its scale and clinical importance. Its rhythm-tokenizer-plus-masked-modeling recipe and Cardio-Sparse Attention offer a concrete template for handling the length and noise of real-world cardiac data. Adoption is currently limited by openness: the companion repository (PKUDigitalHealth/AnyECG-Lab) is sparse, carries no declared license, and does not provide a confirmed pretrained-weight download, so independent reproduction and reuse of the released checkpoints are not yet straightforward.

Citation

AnyECG: Foundational Models for Multitask Cardiac Analysis in Real-World Settings

Preprint

Wang, Y., et al. (2024) AnyECG: Foundational Models for Multitask Cardiac Analysis in Real-World Settings.

DOI: 10.48550/arXiv.2411.17711


Citations

Total Citations: 5
Influential: 0
References: 50

GitHub

Stars: 4
Forks: 1
Open Issues: 0
Contributors: 2
Last Push: 7 months ago


Openness

bio.rodeo openness: Closed · low usability and reproducibility
Overall: 10 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 13
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

anomaly_detection, arrhythmia_classification, electrocardiogram, foundation_model, self_supervised, signal_restoration, sparse_attention, transformer, vector_quantization

Resources

GitHub Repository · Research Paper · Official Website