Zuse Institute Berlin / Freie Universität Berlin
A Vision Transformer pretrained with a Joint-Embedding Predictive Architecture (JEPA) objective on more than one million unlabeled ECG records, improving 12-lead ECG classification on PTB-XL.
ECG-JEPA is a self-supervised learning framework for the electrocardiogram (ECG) that adapts the Joint-Embedding Predictive Architecture (JEPA) to cardiac time-series. Introduced by Kuba Weimann and Tim O. F. Conrad at the Zuse Institute Berlin (with Freie Universität Berlin) in an October 2024 preprint, it addresses a persistent bottleneck in computational cardiology: high-quality ECG labels are expensive and scarce, while raw recordings are abundant. By pre-training on more than one million unlabeled records, ECG-JEPA learns transferable representations that boost downstream diagnostic classification.
The central idea behind JEPA is to predict the latent representation of a masked target region from a visible context region, rather than reconstructing the raw signal or relying on hand-crafted augmentations. This places ECG-JEPA between two dominant self-supervised paradigms. Unlike generative masked-autoencoder methods, it predicts abstract features instead of reconstructing every sample, which avoids wasting capacity on noise and signal detail that is irrelevant for diagnosis. Unlike invariance-based contrastive methods, it does not require domain-specific augmentations whose physiological validity for ECG is uncertain.
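The latent-prediction idea described above can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the names `patchify` and `jepa_loss`, the single-layer `tanh` encoders, the mean-pooled predictor, the patch length of 50 samples, and the mean-squared-error distance are all assumptions chosen for brevity. What it shows is only the shape of the objective: the loss compares predicted and target embeddings, never raw samples.

```python
import numpy as np

def patchify(signal, patch_len):
    """Split a (leads, samples) array into non-overlapping time patches.

    Returns an array of shape (num_patches, leads * patch_len).
    """
    leads, samples = signal.shape
    num_patches = samples // patch_len
    trimmed = signal[:, : num_patches * patch_len]
    patches = trimmed.reshape(leads, num_patches, patch_len)
    return patches.transpose(1, 0, 2).reshape(num_patches, leads * patch_len)

def jepa_loss(signal, target_idx, w_context, w_target, w_pred, patch_len=50):
    """Toy JEPA-style loss: predict target-patch embeddings from context patches.

    All weights are hypothetical single-layer stand-ins for the real
    transformer encoders and predictor.
    """
    patches = patchify(signal, patch_len)
    all_idx = np.arange(patches.shape[0])
    context_idx = np.setdiff1d(all_idx, target_idx)

    # The context encoder only sees the visible (unmasked) patches.
    z_context = np.tanh(patches[context_idx] @ w_context)

    # The target encoder embeds the masked patches; in training these
    # embeddings would be treated as fixed targets (no gradient).
    z_target = np.tanh(patches[target_idx] @ w_target)

    # Toy predictor: pool the context and emit one prediction per target patch.
    pooled = z_context.mean(axis=0)
    z_pred = np.tile(pooled @ w_pred, (len(target_idx), 1))

    # Distance is measured in latent space, not on the raw signal.
    return float(np.mean((z_pred - z_target) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ecg = rng.normal(size=(12, 1000))      # synthetic 12-lead record
    dim = 8                                # hypothetical embedding size
    w_c = rng.normal(size=(600, dim)) * 0.01
    w_t = rng.normal(size=(600, dim)) * 0.01
    w_p = np.eye(dim)
    print(jepa_loss(ecg, np.array([3, 4, 5]), w_c, w_t, w_p))
```

A masked-autoencoder variant would instead compare a decoder's output with `patches[target_idx]` directly, which is the raw-signal reconstruction that the latent objective avoids.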
The work demonstrates that a representation-prediction objective, originally developed for images, transfers effectively to multi-lead physiological signals and outperforms both invariance-based and generative alternatives on standard ECG benchmarks.
ECG-JEPA uses a Vision Transformer (ViT) backbone applied to multi-lead ECG, released in three sizes: ViT-XS, ViT-S, and ViT-B. The pretraining corpus combines ten public databases totaling over one million records, dominated by MIMIC-IV-ECG (~800,000 records) and CODE-15 (~128,000), and including Chapman-Shaoxing, CPSC and CPSC-Extra, Georgia, Ningbo, PTB, St-Petersburg, and the PTB-XL training partition.
The JEPA objective trains context and target encoders so that a predictor maps the context embedding to the latent representation of masked target blocks. On the PTB-XL "all statements" multi-label benchmark, the ViT-S JEPA model reaches an AUC of 0.945 with fine-tuning and 0.938 under linear evaluation; on the superdiagnostic single-label task it reaches 0.935 (fine-tuned) and 0.928 (linear). Across settings the JEPA pretraining consistently surpasses invariance-based and generative self-supervised baselines.
The public repository provides pretraining and evaluation code but does not currently distribute pretrained checkpoints, so users reproduce the encoders by running the released pretraining scripts.
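For readers unfamiliar with how a single AUC number is obtained on a multi-label task such as PTB-XL "all statements", the sketch below computes a per-label AUC and averages it across labels. The text above does not state the averaging scheme, so macro-averaging is an assumption here, and the helper names `binary_auc` and `macro_auc` and the toy data are purely illustrative.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half a win. Returns None when a label
    has no positives or no negatives, since AUC is undefined there."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_auc(label_matrix, score_matrix):
    """Average the per-label AUC over all labels where it is defined.

    Both arguments are lists of rows, one row per record and one column
    per diagnostic label."""
    per_label = []
    for j in range(len(label_matrix[0])):
        auc = binary_auc(
            [row[j] for row in label_matrix],
            [row[j] for row in score_matrix],
        )
        if auc is not None:
            per_label.append(auc)
    return sum(per_label) / len(per_label)

# Toy example: 4 records, 2 labels (values are made up).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.5], [0.3, 0.8], [0.6, 0.4], [0.4, 0.1]]
print(macro_auc(y_true, y_score))  # label AUCs are 1.0 and 0.75, mean 0.875
```

The pairwise loop is quadratic in the number of records and is meant for clarity only; a rank-based implementation would be preferable on a full test set.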
ECG-JEPA targets automated interpretation of 12-lead ECGs, a core task in cardiology screening, triage, and large-scale clinical research. Because the pretrained encoder transfers well even under frozen linear evaluation, it is well suited to settings where labeled cardiac data are limited — smaller hospital cohorts, rare-condition detection, or new diagnostic label sets — letting teams adapt a strong representation with modest supervision. Researchers building ECG diagnostic models, biosignal foundation-model developers, and groups studying self-supervised learning for physiological time-series are the primary beneficiaries.
ECG-JEPA contributes evidence that joint-embedding predictive pretraining, rather than masked reconstruction or contrastive invariance, is a strong recipe for physiological signals, extending the JEPA family beyond vision into biosignals. Its demonstration that latent-feature prediction outperforms generative and invariance-based self-supervision on PTB-XL provides a useful design signal for the growing field of ECG and biosignal foundation models. The main practical limitation is that no pretrained weights are released, so adoption currently requires the compute to repeat large-scale pretraining; the open MIT-licensed code nonetheless makes the approach fully reproducible.
Weimann, K. & Conrad, T. O. F. (2024) Self-Supervised Pre-Training with Joint-Embedding Predictive Architecture Boosts ECG Classification Performance. Comput. Biol. Medicine.
DOI: 10.48550/arXiv.2410.13867