ESI (ECG Semantic Integrator)

ECG foundation model that learns 12-lead waveform representations by contrastively aligning each recording with machine-generated cardiological text.

Released: May 2024

ECG Semantic Integrator (ESI) is a foundation model for the electrocardiogram (ECG) that learns signal representations by aligning 12-lead waveforms with rich, machine-generated cardiological text. Most ECG self-supervised methods rely on signal-only objectives (masking or signal augmentations) that capture morphology but miss the clinical semantics a cardiologist would read off a trace. ESI addresses this gap by pairing each recording with detailed natural language descriptions and training the encoder to bring the two modalities into a shared embedding space.

The method has two parts. The Cardio Query Assistant (CQA) is a retrieval-augmented generation (RAG) pipeline that prompts a large language model to write per-recording descriptions, grounding the text in retrieved cardiological knowledge and conditioning on demographic and waveform-derived information so the captions reflect the specific signal rather than generic boilerplate. The ESI stage then pretrains a 1D ECG encoder against these captions using a combination of contrastive and captioning objectives.
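The prompt-building half of a CQA-style pipeline can be sketched as below. This is a minimal illustration, not the authors' implementation: the `EcgRecord` fields, the token-overlap retriever, the prompt wording, and the toy knowledge snippets are all assumptions made for the example, and the call to the language model itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EcgRecord:
    # Hypothetical container; field names are illustrative, not taken from the ESI code.
    age: int
    sex: str
    heart_rate_bpm: float
    qrs_duration_ms: float
    machine_report: str


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of punctuation-stripped tokens."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def retrieve(query: str, knowledge: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank snippets by token overlap with the query, keep the top k."""
    query_tokens = tokenize(query)
    ranked = sorted(knowledge, key=lambda s: len(query_tokens & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(record: EcgRecord, knowledge: list[str], k: int = 2) -> str:
    """Combine demographics, waveform-derived features, and retrieved knowledge."""
    lines = [
        "You are a cardiologist. Describe this 12-lead ECG.",
        f"Patient: {record.age}-year-old, sex {record.sex}.",
        f"Measurements: heart rate {record.heart_rate_bpm:.0f} bpm, "
        f"QRS duration {record.qrs_duration_ms:.0f} ms.",
        f"Machine report: {record.machine_report}",
        "Reference knowledge:",
    ]
    lines += [f"- {snippet}" for snippet in retrieve(record.machine_report, knowledge, k)]
    return "\n".join(lines)


# Made-up knowledge snippets and record, for illustration only.
knowledge_base = [
    "Atrial fibrillation shows an irregularly irregular rhythm with absent P waves.",
    "Left bundle branch block widens the QRS complex beyond 120 ms.",
    "Sinus bradycardia is a sinus rhythm with a rate below 60 bpm.",
]
record = EcgRecord(67, "F", 112, 96, "atrial fibrillation with rapid ventricular response")
prompt = build_prompt(record, knowledge_base, k=1)
print(prompt)
```

A real pipeline would likely replace the token-overlap scorer with a dense retriever over cardiology references; the point of the sketch is only that the prompt carries recording-specific fields alongside the retrieved context.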

ESI was developed by Han Yu, Peikun Guo, and Akane Sano in the Computational Wellbeing lab at Rice University. It was released as an arXiv preprint in May 2024 and published in Transactions on Machine Learning Research (TMLR) in 2024.

Key Features

LLM-enhanced text supervision: The CQA pipeline uses retrieval-augmented generation to produce detailed, recording-specific cardiological descriptions, supplying semantic supervision that signal-only pretraining cannot.
Dual contrastive + captioning objective: Pretraining combines a contrastive loss that aligns ECG and text embeddings with a captioning loss that reconstructs the description, encouraging the encoder to retain clinically meaningful detail.
1D ConvNeXt-V2 signal encoder: ECG waveforms are encoded with a 1D adaptation of ConvNeXt-V2 (atto through large variants), paired with a BioLinkBERT text encoder pretrained on biomedical literature.
Demographic and waveform grounding: Captions are conditioned on patient demographics and waveform-derived features, making the generated text specific to each recording instead of generic templates.
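The dual objective listed above can be illustrated with a small, dependency-free sketch. It assumes a symmetric InfoNCE form for the contrastive term and token-level cross-entropy for the captioning term; the temperature, the loss weights `w_con` and `w_cap`, and all function names are assumptions for illustration, not values or APIs from the ESI code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each batch is the positive pair, the rest are negatives."""
    ecg = [l2_normalize(v) for v in ecg_emb]
    txt = [l2_normalize(v) for v in text_emb]
    n = len(ecg)
    sim = [[dot(e, t) / temperature for t in txt] for e in ecg]
    ecg_to_text = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    text_to_ecg = -sum(log_softmax([sim[j][i] for j in range(n)])[i] for i in range(n)) / n
    return 0.5 * (ecg_to_text + text_to_ecg)


def captioning_loss(token_logits, target_ids):
    """Mean cross-entropy between predicted token logits and the caption's token ids."""
    nll = [-log_softmax(logits)[t] for logits, t in zip(token_logits, target_ids)]
    return sum(nll) / len(nll)


def combined_loss(ecg_emb, text_emb, token_logits, target_ids, w_con=1.0, w_cap=1.0):
    """Weighted sum of the two terms; the weights are illustrative assumptions."""
    return (w_con * contrastive_loss(ecg_emb, text_emb)
            + w_cap * captioning_loss(token_logits, target_ids))


# Toy batch: two ECG embeddings and two caption embeddings.
ecg_batch = [[1.0, 0.0], [0.0, 1.0]]
text_aligned = [[0.9, 0.1], [0.1, 0.9]]
text_swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = contrastive_loss(ecg_batch, text_aligned)
loss_swapped = contrastive_loss(ecg_batch, text_swapped)
print(loss_aligned < loss_swapped)
```

Matched ECG and text pairs give a lower contrastive term than mismatched ones, while the captioning term separately penalizes an encoder whose features cannot reconstruct the description.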

Technical Details

ESI couples a modified 1D ConvNeXt-V2 ECG encoder with a BioLinkBERT text encoder (michiyasunaga/BioLinkBERT-base) and trains them jointly with contrastive and captioning losses. The CQA component builds the text corpus through retrieval-augmented generation over cardiological references, using demographic and waveform information to tailor each description. Pretraining was run on the MIMIC-IV-ECG database on 4 NVIDIA A100 GPUs, using AdamW with a 5-epoch warm-up followed by a step-decay schedule. The authors report substantial improvements over strong baselines (supervised training, signal-only self-supervised methods, and prior multimodal ECG approaches) on the two downstream evaluations: arrhythmia detection and ECG-based subject identification.
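The warm-up plus step-decay schedule mentioned above can be sketched as a plain function of the epoch index. Only the 5-epoch warm-up comes from the description; the linear warm-up shape, `base_lr`, `step_size`, and `gamma` are placeholder assumptions, not the values used for ESI.

```python
def learning_rate(epoch, base_lr=1e-3, warmup_epochs=5, step_size=10, gamma=0.5):
    """Linear warm-up for `warmup_epochs`, then multiply by `gamma` every `step_size` epochs.

    `epoch` is zero-based. All defaults except `warmup_epochs` are assumed values.
    """
    if epoch < warmup_epochs:
        # Ramp linearly so the last warm-up epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    num_decays = (epoch - warmup_epochs) // step_size
    return base_lr * gamma ** num_decays


schedule = [learning_rate(e) for e in range(30)]
for epoch in (0, 4, 5, 15, 25):
    print(epoch, schedule[epoch])
```

In a training loop, a function like this would typically be evaluated once per epoch and written into the AdamW optimizer's learning-rate setting.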

Applications

ESI targets researchers building ECG analysis systems who want representations that transfer across tasks with limited labeled data. The pretrained encoder can be fine-tuned or linearly probed for arrhythmia classification, and its embeddings support biometric subject identification from ECG. More broadly, the CQA-plus-contrastive recipe is a template for injecting clinical text knowledge into other biosignal encoders, benefiting groups that have abundant raw physiological recordings but sparse expert annotations.
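One simple way embeddings could support subject identification is nearest-neighbor matching against enrolled recordings. The sketch below assumes cosine similarity and uses made-up 3-dimensional vectors in place of real encoder outputs; the `identify` function and the gallery layout are hypothetical and are not the evaluation protocol from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query_emb, gallery):
    """Return (subject_id, score) for the enrolled subject most similar to the query.

    `gallery` maps a subject id to a list of enrolled embeddings (hypothetical layout).
    """
    scores = {
        subject_id: max(cosine(query_emb, emb) for emb in embeddings)
        for subject_id, embeddings in gallery.items()
    }
    best_subject = max(scores, key=scores.get)
    return best_subject, scores[best_subject]


# Placeholder embeddings standing in for the pretrained encoder's output.
gallery = {
    "subject_a": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "subject_b": [[0.0, 1.0, 0.2]],
}
subject, score = identify([0.9, 0.2, 0.0], gallery)
print(subject, round(score, 3))
```

Arrhythmia classification would instead fit a small classifier (a linear probe) on frozen embeddings or fine-tune the whole encoder, depending on how much labeled data is available.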

Impact

ESI demonstrates that LLM-generated, retrieval-grounded clinical text can serve as an effective supervisory signal for physiological time series, extending the text-image contrastive paradigm into the biosignal domain. The work is published in TMLR with code released under GPL-3.0. A practical caveat for adopters: pretraining depends on MIMIC-IV-ECG, which requires credentialed access via PhysioNet, so reproducing the full pipeline requires the user's own data access even though the authors share a pretrained checkpoint. This dependence on a single restricted-access corpus, together with evaluation on only two downstream tasks, is the main limitation to weigh when assessing generalization.

Citation

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Preprint

Yu, H., et al. (2024) ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text. Trans. Mach. Learn. Res.

DOI: 10.48550/arXiv.2405.19366

Recent citations

Papers that recently cited this model.

Artificial intelligence electrocardiography for left ventricular systolic dysfunction demonstrates preserved performance across demographic training imbalances
P. Hsieh, Parth Agrawal, Aman Alok, et al.
European Heart Journal - Digital Health · May 2026 · 0 citations

Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals
Phu X. Nguyen, Konstantinos Kontras, Wei Dai, et al.
May 2026 · 0 citations

Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons
Wei Tang, Jinpei Han, Kangning Cui, et al.
May 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis
Yubao Zhao, Jiaju Kang, Tian Zhang, et al.
IEEE International Conference on Multimedia and Expo · Aug 2024 · 44 citations

A survey of transformers and large language models for ECG diagnosis: advances, challenges, and future directions
Mohammed Yusuf Ansari, Mohammed Yaqoob, Mohammed Ishaq, et al.
Artificial Intelligence Review · Jun 2025 · 35 citations

An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains
Jun Li, Aaron Aguirre, Junior Moura, et al.
arXiv.org · Oct 2024 · 32 citations

Large Language Model Benchmarks in Medical Tasks
Lawrence K.Q. Yan, Ming Li, Yichao Zhang, et al.
arXiv.org · Oct 2024 · 28 citations

How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook
Haoxin Liu, Harshavardhan Kamarthi, Zhiyuan Zhao, et al.
arXiv.org · Mar 2025 · 24 citations

Citations

Total Citations: 48

Influential: 4

References: 68

GitHub

Stars: 20

Forks: 5

Open Issues: 0

Contributors: 2

Last Push: 5mo ago

Language: Python

License: GPL-3.0

Fields of citing research

Computer Science: 100%
Medicine: 96%
Engineering: 26%
Physics: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

63 (Partial)

Usability (can I run it?): 82

Reproducibility (can I retrain it?): 39

Model Openness Framework: Class III (Open Model)

Resources

GitHub Repository · Research Paper · Official Website
