bio.rodeo

ECGFM-KED

Shanghai Jiao Tong University

A knowledge-enhanced ECG foundation model that pairs a ResNet signal encoder with LLM-derived diagnostic knowledge for zero- and few-shot electrocardiogram interpretation.

Released: December 2024

ECGFM-KED is a knowledge-enhanced electrocardiogram (ECG) diagnosis foundation model that combines deep signal representation learning with domain knowledge distilled from large language models. Developed by Yuanyuan Tian and colleagues at Shanghai Jiao Tong University and published in Cell Reports Medicine in December 2024, the model addresses a core limitation of conventional ECG classifiers: they are trained against fixed label sets and generalize poorly to new diagnostic categories, recording conditions, and patient populations.

The model's central idea, abbreviated KED (Knowledge-Enhanced Diagnosis), is to align the latent space of an ECG signal encoder with text embeddings of clinically grounded disease descriptions. Rather than treating diagnoses as opaque class indices, ECGFM-KED uses a large language model to generate structured, ECG-specific knowledge for each condition (its characteristic morphology and rhythm features), and learns a shared representation in which a recording is matched to the textual description it most resembles. This contrastive signal-language framing enables open-set, zero-shot inference: the model can be queried with diagnoses it never saw during pretraining.
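The zero-shot matching step described above can be illustrated with a minimal sketch. It assumes the signal and text encoders have already produced embeddings in the shared space; the function name `zero_shot_rank`, the three-dimensional toy vectors, and the label set are all hypothetical and are not taken from the ECGFM-KED codebase.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_rank(ecg_embedding, label_embeddings):
    """Rank candidate diagnoses by similarity to one ECG embedding.

    label_embeddings maps a diagnosis name to the text embedding of its
    knowledge description. New diagnoses can be added to this dict at
    query time without retraining, which is what makes the setup open-set.
    """
    scores = {
        label: cosine(ecg_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy, made-up embeddings (assumption: real ones are high-dimensional
# outputs of the trained encoders).
label_embeddings = {
    "atrial fibrillation": [0.9, 0.1, 0.0],
    "left bundle branch block": [0.0, 1.0, 0.2],
    "sinus rhythm": [0.1, 0.0, 1.0],
}
ecg_embedding = [0.8, 0.2, 0.1]

ranking = zero_shot_rank(ecg_embedding, label_embeddings)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

In this toy example the recording lies closest to the "atrial fibrillation" description, so that label ranks first. How the real model turns similarities into per-diagnosis probabilities is not specified here and may differ.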

Among the emerging medical signal foundation models, ECGFM-KED stands out for reported performance comparable to that of cardiologists and for cross-regional generalization: it was validated on cohorts from China, the United States, and other regions, spanning all age groups.

#Key Features

  • Knowledge-enhanced diagnosis (KED): An LLM generates ECG-specific descriptions of each diagnosis, and the model aligns signal embeddings with these text embeddings, injecting clinical priors instead of relying on label indices alone.
  • Zero-shot diagnosis: Because diagnoses are represented as text, the model can classify conditions absent from its training set, including previously unseen diseases and rhythms.
  • Strong cross-region generalization: Validated on external multi-center datasets spanning different countries and age groups without retraining.
  • Few-shot fine-tuning: Supports configurable supervised adaptation at 1%, 10%, or 100% of labeled samples for institutions with limited annotations.
  • Clinician-comparable accuracy: Achieves performance comparable to three experienced cardiologists on real clinical data for seven common ECG types.
  • Open weights and code: Pretrained weights, annotated labels, and training code are publicly released.
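The few-shot settings in the list above (1%, 10%, or 100% of labeled samples) amount to fine-tuning on a subset of the available annotations. The sketch below shows one simple way to draw such a subset reproducibly. The helper `sample_labeled_fraction` and the dataset size are hypothetical, and the released training code may select its subsets differently (for example with stratification by diagnosis).

```python
import random

def sample_labeled_fraction(n_samples, fraction, seed=0):
    """Return sorted indices for a random subset of the labeled pool.

    fraction is the share of labels to keep (0.01, 0.10, or 1.0 in the
    regimes described above). A fixed seed keeps the subset reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(n_samples * fraction))  # keep at least one sample
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), k))

# Hypothetical local dataset of 2,000 labeled recordings.
for fraction in (0.01, 0.10, 1.0):
    indices = sample_labeled_fraction(2000, fraction)
    print(f"{fraction:.0%} of labels -> {len(indices)} recordings")
```

Uniform sampling like this can miss rare diagnoses entirely at the 1% level, so a site with imbalanced labels would likely want to sample per class instead.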

#Technical Details

ECGFM-KED uses a ResNet-based convolutional encoder for 12-lead ECG signals, paired with a text encoder that embeds LLM-generated diagnostic knowledge; the two modalities are aligned through contrastive learning, following the signal-language paradigm popularized by vision-language models such as CLIP.

The model was pretrained on roughly 800,000 ECGs from nearly 160,000 unique patients drawn from the MIMIC-IV-ECG database, with diagnosis annotations enriched by language-model-generated descriptions.

Evaluation spans four widely used public benchmarks: PTB-XL, the Georgia 12-lead dataset, CPSC2018 (ICBEB), and the Shaoxing/Chapman dataset, covering morphological abnormalities, rhythm abnormalities, conduction blocks, hypertrophy, myocardial ischemia, and infarction. Across these external cohorts the model reports strong zero-shot AUROC and consistently improves over supervised baselines in low-label few-shot regimes.

Pretrained weights (best_valid_all_increase_with_augment_epoch_3.pt, ~1.3 GB) and the annotated MIMIC-IV-ECG label file are distributed via Zenodo (record 14881564).
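To make the contrastive alignment concrete, here is a minimal sketch of the generic CLIP-style symmetric objective over a batch of matched ECG and text pairs. This illustrates the paradigm the description refers to; it is not confirmed to be the exact loss ECGFM-KED uses. The function name `clip_style_loss` and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_style_loss(sim, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    sim[i][j] is the similarity between ECG i and text description j;
    matched pairs sit on the diagonal. The temperature value is an
    assumption, not a documented ECGFM-KED hyperparameter.
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # ECG -> text direction: each row should put its mass on the diagonal.
    ecg_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> ECG direction: same computation over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_ecg = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (ecg_to_text + text_to_ecg)

# Toy similarity matrices for a batch of two pairs.
aligned = [[1.0, 0.0], [0.0, 1.0]]    # each ECG matches its own text
shuffled = [[0.0, 1.0], [1.0, 0.0]]   # each ECG matches the wrong text

print(clip_style_loss(aligned) < clip_style_loss(shuffled))
```

The loss is small when each recording is most similar to its own description and large when the pairing is wrong, which is the pressure that pulls the two embedding spaces into alignment during pretraining.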

#Applications

ECGFM-KED targets clinical ECG interpretation, particularly settings where annotated data are scarce or where new diagnostic categories must be supported without collecting fresh labeled cohorts. Its zero-shot capability suits screening and triage across hospitals and regions with differing patient demographics and recording equipment, while few-shot fine-tuning lets a site adapt the model with as little as 1% of local labels. Because it produces text-aligned predictions, it also lends itself to explainable decision support, surfacing the knowledge descriptions that drove a given diagnosis. Researchers building ECG analysis pipelines can use the released weights as a general-purpose encoder for downstream cardiac tasks.

#Impact

ECGFM-KED demonstrates that coupling signal encoders with LLM-derived clinical knowledge produces ECG models that generalize across institutions and to unseen diagnoses, a meaningful step toward deployable foundation models for physiological signals. Its publication in Cell Reports Medicine, paired with open weights and code, makes it a practical reference point for the growing biosignals foundation-model community. One caveat for adopters concerns licensing: the GitHub repository declares no license, whereas the Zenodo deposit of weights and data is released under CC-BY-4.0. This discrepancy leaves the legal status of the source code ambiguous, so teams intending production use should seek clarification from the authors before relying on the codebase.

Citation

Tian, Y., et al. (2024) Foundation model of ECG diagnosis: Diagnostics and explanations of any form and rhythm on ECG. Cell Reports Medicine.

DOI: 10.1016/j.xcrm.2024.101875

Citations

Total Citations: 41
Influential: 3
References: 47

GitHub

Stars: 42
Forks: 6
Open Issues: 11
Contributors: 1
Last Push: 1y ago
Language: Python

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 30 (Closed)
Usability (can I run it?): 41
Reproducibility (can I retrain it?): 20
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

cardiology, cnn, contrastive_learning, ecg_diagnosis, electrocardiogram, few_shot_learning, foundation_model, multimodal, resnet, zero_shot_classification

Resources

GitHub Repository, Research Paper, Dataset