bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved. ~

ESI (ECG Semantic Integrator)

Rice University

A foundation model for 12-lead ECG that learns signal representations via multimodal contrastive pretraining against LLM-generated cardiological text.

Released: May 2024

ECG Semantic Integrator (ESI) is a foundation model for the electrocardiogram (ECG) that learns signal representations by aligning 12-lead waveforms with rich, machine-generated cardiological text. Most ECG self-supervised methods rely on signal-only objectives (masking or signal augmentations) that capture morphology but miss the clinical semantics a cardiologist would read off a trace. ESI addresses this gap by pairing each recording with detailed natural language descriptions and training the encoder to bring the two modalities into a shared embedding space.

The method has two parts. The Cardio Query Assistant (CQA) is a retrieval-augmented generation (RAG) pipeline that prompts a large language model to write per-recording descriptions, grounding the text in retrieved cardiological knowledge and conditioning on demographic and waveform-derived information so the captions reflect the specific signal rather than generic boilerplate. The ESI stage then pretrains a 1D ECG encoder against these captions using a combination of contrastive and captioning objectives.
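The retrieval-then-prompt idea behind CQA can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the knowledge snippets, the word-overlap retriever, the function names, and the prompt wording are hypothetical and are not taken from the ESI code or paper. A real pipeline would use a proper retriever and send the prompt to an LLM.

```python
# Toy sketch of a retrieval-augmented captioning prompt.
# Hypothetical throughout: snippets, retriever, and prompt wording are
# illustrative stand-ins, not the ESI authors' implementation.

KNOWLEDGE = [
    "Atrial fibrillation: irregularly irregular rhythm with absent P waves.",
    "Sinus bradycardia: sinus rhythm with a heart rate below 60 bpm.",
    "Left bundle branch block: QRS duration of 120 ms or more.",
]


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def retrieve(query, corpus, k=1):
    """Rank snippets by word overlap with the query (stand-in for a real retriever)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda s: len(q & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(age, sex, heart_rate_bpm, findings, corpus=KNOWLEDGE, k=1):
    """Assemble a prompt from demographics, waveform features, and retrieved text."""
    context = retrieve(" ".join(findings), corpus, k)
    lines = [
        "You are assisting with ECG description.",
        f"Patient: {age}-year-old {sex}.",
        f"Measured heart rate: {heart_rate_bpm} bpm.",
        "Reported findings: " + ", ".join(findings) + ".",
        "Reference knowledge:",
    ]
    lines += [f"- {snippet}" for snippet in context]
    lines.append("Write a concise description of this specific recording.")
    return "\n".join(lines)


prompt = build_prompt(71, "male", 48, ["sinus bradycardia"])
print(prompt)
```

The point of the design is that the demographic and waveform fields change per recording, so the resulting caption is tied to the individual signal, while the retrieved snippet supplies domain knowledge the LLM might otherwise state loosely.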

ESI was developed by Han Yu, Peikun Guo, and Akane Sano in the Computational Wellbeing lab at Rice University. It was released as an arXiv preprint in May 2024 and published in Transactions on Machine Learning Research (TMLR) the same year.

#Key Features

  • LLM-enhanced text supervision: The CQA pipeline uses retrieval-augmented generation to produce detailed, recording-specific cardiological descriptions, supplying semantic supervision that signal-only pretraining cannot.
  • Dual contrastive + captioning objective: Pretraining combines a contrastive loss that aligns ECG and text embeddings with a captioning loss that reconstructs the description, encouraging the encoder to retain clinically meaningful detail.
  • 1D ConvNeXt-V2 signal encoder: ECG waveforms are encoded with a 1D adaptation of ConvNeXt-V2 (atto through large variants), paired with a BioLinkBERT text encoder pretrained on biomedical literature.
  • Demographic and waveform grounding: Captions are conditioned on patient demographics and waveform-derived features, making the generated text specific to each recording instead of generic templates.
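The dual objective listed above can be sketched as a weighted sum of a symmetric contrastive (InfoNCE-style) loss and a token-level captioning cross-entropy. This is a stdlib-only teaching sketch, not the authors' implementation: the temperature of 0.07, the equal loss weights, and all function names are assumptions, and a real training loop would use a tensor library.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_loss(ecg_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: the i-th ECG should match the i-th caption and vice versa."""
    ecg = [l2_normalize(e) for e in ecg_embs]
    txt = [l2_normalize(t) for t in text_embs]
    n = len(ecg)
    sim = [[dot(e, t) / temperature for t in txt] for e in ecg]
    ecg_to_text = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    text_to_ecg = -sum(
        log_softmax([sim[j][i] for j in range(n)])[i] for i in range(n)
    ) / n
    return 0.5 * (ecg_to_text + text_to_ecg)


def captioning_loss(token_logits, target_ids):
    """Mean cross-entropy of the target caption tokens under the decoder logits."""
    losses = [-log_softmax(row)[t] for row, t in zip(token_logits, target_ids)]
    return sum(losses) / len(losses)


def total_loss(ecg_embs, text_embs, token_logits, target_ids,
               w_contrastive=1.0, w_caption=1.0):
    """Weighted sum of both objectives; the weights are placeholder assumptions."""
    return (w_contrastive * contrastive_loss(ecg_embs, text_embs)
            + w_caption * captioning_loss(token_logits, target_ids))


# Matched pairs should score a lower contrastive loss than shuffled pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The contrastive term only asks that each recording land near its own caption, whereas the captioning term penalizes the model for every caption token it cannot predict, which is one plausible reading of why the combination retains finer clinical detail.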

#Technical Details

ESI couples a modified 1D ConvNeXt-V2 ECG encoder with a BioLinkBERT text encoder (michiyasunaga/BioLinkBERT-base) and trains them jointly with contrastive and captioning losses. The CQA component builds the text corpus through retrieval-augmented generation over cardiological references, using demographic and waveform information to tailor each description. Pretraining was run on the MIMIC-IV-ECG database on 4 NVIDIA A100 GPUs, using AdamW with a 5-epoch warm-up followed by a step-decay schedule. The authors report substantial improvements over strong baselines (supervised training, signal-only self-supervised methods, and prior multimodal ECG approaches) on both downstream evaluations: arrhythmia detection and ECG-based subject identification.
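A warm-up followed by step decay can be written as a small function of the epoch index. Only the 5-epoch warm-up comes from the description above; the linear warm-up shape, the base learning rate, the decay interval, and the decay factor are assumed values chosen for illustration.

```python
def learning_rate(epoch, base_lr=1e-3, warmup_epochs=5, step_size=10, gamma=0.1):
    """Linear warm-up, then multiply by `gamma` every `step_size` epochs.

    Assumed values: base_lr, step_size, gamma, and the linear warm-up shape.
    Only warmup_epochs=5 is taken from the model description.
    """
    if epoch < warmup_epochs:
        # Ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    completed_steps = (epoch - warmup_epochs) // step_size
    return base_lr * gamma ** completed_steps


schedule = [learning_rate(e) for e in range(30)]
print(schedule[0], schedule[4], schedule[15])
```

In practice the returned value would be assigned to the optimizer's learning rate at the start of each epoch.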

#Applications

ESI targets researchers building ECG analysis systems who want representations that transfer across tasks with limited labeled data. The pretrained encoder can be fine-tuned or linearly probed for arrhythmia classification, and its embeddings support biometric subject identification from ECG. More broadly, the CQA-plus-contrastive recipe is a template for injecting clinical text knowledge into other biosignal encoders, benefiting groups that have abundant raw physiological recordings but sparse expert annotations.
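One simple way to use fixed embeddings for subject identification is nearest-neighbor matching against a gallery of enrolled subjects. The sketch below assumes that approach; the source does not specify the matching procedure, and the embedding values and subject names are made up rather than produced by the ESI encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def identify(query_emb, gallery):
    """Return the enrolled subject whose embedding is most similar to the query."""
    return max(gallery, key=lambda subject: cosine(query_emb, gallery[subject]))


# Hypothetical 3-dimensional embeddings; real encoder outputs are much larger.
gallery = {"subject_a": [0.9, 0.1, 0.0], "subject_b": [0.0, 0.2, 0.9]}
match = identify([0.8, 0.2, 0.1], gallery)
print(match)
```

A deployed system would typically also apply a similarity threshold so that recordings from unenrolled people are rejected instead of being assigned to the closest gallery entry.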

#Impact

ESI demonstrates that LLM-generated, retrieval-grounded clinical text can serve as an effective supervisory signal for physiological time series, extending the text-image contrastive paradigm into the biosignal domain. The work is published in TMLR with code released under GPL-3.0. A practical caveat for adopters: pretraining depends on MIMIC-IV-ECG, which requires credentialed access through PhysioNet, so reproducing the full pipeline means obtaining your own data access, even though the authors share a pretrained checkpoint. The dependence on a single restricted-access corpus and the evaluation on only two downstream tasks are the main limitations to weigh when assessing generalization.

Citation

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Preprint

Yu, H., et al. (2024) ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text. Trans. Mach. Learn. Res.

DOI: 10.48550/arXiv.2405.19366

Citations

Total Citations: 46
Influential: 4
References: 68

GitHub

Stars: 20
Forks: 5
Open Issues: 0
Contributors: 2
Last Push: 3mo ago
Language: Python
License: GPL-3.0

Openness

bio.rodeo openness: Open weights, closed recipe
Score: 63 (Partial)
Usability (can I run it?): 82
Reproducibility (can I retrain it?): 39
Model Openness Framework: Class III (Open Model)

Tags

arrhythmia_detection, contrastive_learning, convnext, ecg, foundation_model, multimodal, representation_learning, subject_identification, transformer

Resources

GitHub Repository, Research Paper, Official Website