ECG-Chat

China University of Geosciences / Beijing Normal University / Southern University of Science and Technology

Multimodal ECG-language model aligning 12-lead waveforms with clinical report text for conversational cardiac diagnosis and report generation.

Released: August 2024

ECG-Chat is a multimodal large language model built to bring conversational AI to the electrocardiogram (ECG), the most common cardiac diagnostic test. While general-purpose multimodal LLMs handle natural images and text well, they struggle with the time-series structure of physiological waveforms and with the specialized vocabulary of cardiology. ECG-Chat addresses this gap by jointly modeling raw 12-lead ECG signals and clinical report text, enabling multi-turn dialogue about a patient's recording, automated report drafting, and cardiac disease classification within a single system.

The model was introduced by Yubao Zhao and colleagues at China University of Geosciences, with collaborators at Beijing Normal University, Southern University of Science and Technology, ESIGELEC, and the University of Liverpool. The work was first released as a preprint in August 2024 and subsequently accepted to the 2025 IEEE International Conference on Multimedia and Expo (ICME).

ECG-Chat combines two ideas that have driven recent progress in medical AI: contrastive signal-text pretraining (in the spirit of CLIP) to learn aligned ECG and report representations, and a LLaVA-style instruction-tuned LLM that grounds a language model in those learned signal features. The result is a system that can both retrieve relevant reports in a zero-shot setting and generate free-text diagnostic narratives.
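
To make the CLIP-style objective concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE) loss over a batch of paired ECG and report embeddings. It is not the authors' code: the function names, the plain-list embeddings, and the temperature of 0.07 are assumptions chosen for illustration. CoCa-style training also adds a captioning loss, which this sketch leaves out.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def clip_style_loss(ecg_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i of ecg_embs matches pair i of text_embs."""
    ecg = [l2_normalize(e) for e in ecg_embs]
    txt = [l2_normalize(t) for t in text_embs]
    n = len(ecg)
    # logits[i][j]: cosine similarity of ECG i and report j, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(ecg[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # ECG-to-report direction: the correct report sits on the diagonal.
    loss_e2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Report-to-ECG direction: same logits, read column-wise.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2e = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_e2t + loss_t2e)
```

Minimizing this loss pulls each ECG embedding toward its own report and away from the other reports in the batch, which is what makes zero-shot retrieval by nearest neighbor possible afterwards.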

Key Features

ECG-CoCa contrastive encoder: A waveform-text encoder built on the OpenCLIP/CoCa framework aligns 12-lead ECG signals with their clinical reports, producing embeddings that support zero-shot classification and retrieval without task-specific fine-tuning.
Conversational diagnosis: A LLaVA-based backbone connects the ECG encoder to a Vicuna-13B language model, allowing multi-turn question answering about a recording rather than a single fixed prediction.
Automated report generation: The model drafts structured ECG analysis reports and can render them through an automated LaTeX pipeline.
Instruction-tuned on curated dialogue: Training uses the ECG-Instruct corpus, which comprises a 19K diagnosis dataset and a 25K multi-turn dialogue dataset, both constructed with GPT-4o, to teach clinically grounded conversational behavior.
LoRA fine-tuning: Low-rank adaptation mitigates catastrophic forgetting in the language backbone while specializing it for cardiac tasks.
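
The LoRA idea in the last feature can be sketched as follows. This is an illustrative forward pass for a single linear layer, not ECG-Chat's training code: the names `W`, `A`, `B`, and `alpha`, the list-of-lists matrices, and the example shapes are assumptions. The pretrained weight `W` stays frozen and only the small matrices `A` and `B` would be trained, which is why the backbone's original behavior is largely preserved.

```python
def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = len(A)  # the rank is the number of rows in A
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, low_rank)]

# Hypothetical rank-1 adapter on a 2x2 layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
y = lora_forward([1.0, 2.0], W, A, B, alpha=2.0)
```

When `B` is all zeros, a common way to initialize it, the adapter contributes nothing and the layer reproduces the pretrained output exactly.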

Technical Details

ECG-Chat is trained in two stages. First, ECG-CoCa is contrastively pretrained on paired ECG-report data drawn from five public datasets: MIMIC-IV-ECG, the Chapman-Shaoxing-Ningbo collection, Shandong Provincial Hospital (SPH), PTB-XL, and CPSC2018, with augmentation via the torch_ecg library. Second, the learned ECG features are fed, LLaVA-style, into a Vicuna-13B LLM that is instruction-tuned (with LoRA) on the GPT-4o-curated ECG-Instruct corpus.

On zero-shot ECG-report retrieval over the PTB-XL test set (2K samples), the CoCa encoder with waveform-driven embedding reaches Recall@1 of 64.7% (ECG-to-report) and 71.6% (report-to-ECG). For report generation, the authors report BLEU-4 of 11.19 and ROUGE-L of 29.93, with disease and rhythm F1 of 22.33 and 43.39. Zero-shot classification F1 reaches 52.8% on PTB-XL and 80.1% on CPSC2018.
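
For readers unfamiliar with the retrieval metric, here is a minimal sketch of how Recall@K can be computed from a similarity matrix. It assumes that query i has exactly one correct candidate, candidate i; this is a simplification, and the paper's evaluation protocol may differ. The function name and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose true match ranks in the top k.

    similarity[i][j] is the score between query i and candidate j,
    and the true match for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy ECG-to-report scores: rows are ECGs, columns are reports.
scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.2, 0.7],
]
r1 = recall_at_k(scores, k=1)
r2 = recall_at_k(scores, k=2)
```

Passing the transposed matrix gives the report-to-ECG direction, which is why the two directions can yield different Recall@1 values.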

Applications

ECG-Chat targets clinical and research workflows where ECG interpretation is a bottleneck. Cardiologists and general clinicians could use it as a drafting assistant that generates a first-pass report and answers follow-up questions about rhythm, conduction, and disease findings, while researchers can use the ECG-CoCa embeddings for zero-shot screening, cohort retrieval, and transfer learning across ECG datasets. Its conversational interface also makes it a candidate for medical education and for triage settings where a structured, explainable summary is more useful than a single label.

Impact

ECG-Chat is part of a wave of waveform-language foundation models extending the vision-language recipe to physiological signals, and it demonstrates that contrastive signal-text pretraining plus instruction tuning can support both retrieval and generative diagnosis from raw ECGs. Its acceptance at ICME 2025 and open release of code and pretrained checkpoints lower the barrier for follow-up work. Important caveats remain: the model is a research prototype validated on retrospective public benchmarks rather than in prospective clinical trials, report-generation BLEU/ROUGE scores indicate meaningful room for improvement, and the released weights are distributed via Google Drive without a stated software license, which may limit reuse.

Citations

ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis

Zhao, Y., et al. (2025) ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis. IEEE International Conference on Multimedia and Expo.

DOI: 10.1109/ICME59968.2025.11209476

ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis

Preprint

Zhao, Y., et al. (2024) ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis. arXiv preprint arXiv:2408.08849.

DOI: 10.48550/arXiv.2408.08849

Recent citations

Papers that recently cited this model.

An Edge–Cloud Collaborative ECG-Assisted Diagnostic System Leveraging Cross-Lead Knowledge Distillation and Large Language Models
Haohan Su, Shuai Wang, Hongxiao Wang, et al.
Italian National Conference on Sensors · Jun 2026
Citations: 0

EVL-ECG: Efficient ECG Interpretation With Multi-Aspect Heterogeneous Knowledge Distillation
Dan Hong, Nhi Ngoc-Yen Nguyen, Huy-Hieu Pham
May 2026
Citations: 0

Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals
Phu X. Nguyen, Konstantinos Kontras, Wei Dai, et al.
May 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

SensorLM: Learning the Language of Wearable Sensors
Yuwei Zhang, Kumar Ayush, Siyuan Qiao, et al.
arXiv.org · Jun 2025
Citations: 47

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
Xiang Lan, Feng Wu, Kai He, et al.
arXiv.org · Mar 2025
Citations: 37

How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook
Haoxin Liu, Harshavardhan Kamarthi, Zhiyuan Zhao, et al.
arXiv.org · Mar 2025
Citations: 24

Artificial Intelligence and ECG: A New Frontier in Cardiac Diagnostics and Prevention
D. Bartusik-Aebisher, Kacper Rogóż, D. Aebisher
Biomedicines · Jul 2025
Citations: 23

From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
Fuying Wang, Jiacheng Xu, Lequan Yu
International Conference on Machine Learning · Jun 2025
Citations: 16

Citations

Total Citations: 48

Influential: 6

References: 76

GitHub

Stars: 82

Forks: 9

Open Issues: 6

Contributors: 1

Last Push: 8mo ago

Language: Python

Fields of citing research

Computer Science100%
Medicine89%
Engineering31%
Biology4%
Physics2%
Linguistics2%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

27 · Closed

Usability (can I run it?): 23

Reproducibility (can I retrain it?): 17

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper
