bio.rodeo

ECG-Chat

China University of Geosciences / Beijing Normal University / Southern University of Science and Technology

A multimodal ECG-language model that aligns 12-lead ECG waveforms with clinical text for conversational cardiac diagnosis and automated report generation.

Released: August 2024

ECG-Chat is a multimodal large language model built to bring conversational AI to the electrocardiogram (ECG), the most common cardiac diagnostic test. While general-purpose multimodal LLMs handle natural images and text well, they struggle with the time-series structure of physiological waveforms and with the specialized vocabulary of cardiology. ECG-Chat addresses this gap by jointly modeling raw 12-lead ECG signals and clinical report text, enabling multi-turn dialogue about a patient's recording, automated report drafting, and cardiac disease classification within a single system.

The model was introduced by Yubao Zhao and colleagues at China University of Geosciences, with collaborators at Beijing Normal University, Southern University of Science and Technology, ESIGELEC, and the University of Liverpool. The work was first released as a preprint in August 2024 and subsequently accepted to the 2025 IEEE International Conference on Multimedia and Expo (ICME).

ECG-Chat combines two ideas that have driven recent progress in medical AI: contrastive signal-text pretraining (in the spirit of CLIP) to learn aligned ECG and report representations, and a LLaVA-style instruction-tuned LLM that grounds a language model in those learned signal features. The result is a system that can both retrieve relevant reports in a zero-shot setting and generate free-text diagnostic narratives.
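
The contrastive half of that recipe can be illustrated with a minimal sketch of a CLIP-style symmetric loss over a batch of paired ECG and report embeddings. This is not the authors' training code: the function names, the toy two-dimensional embeddings, and the fixed temperature of 0.07 are assumptions made for illustration, and a CoCa-style model also adds a captioning loss that is omitted here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; pair i in ecg_emb matches pair i in text_emb."""
    ecg = [l2_normalize(v) for v in ecg_emb]
    txt = [l2_normalize(v) for v in text_emb]
    n = len(ecg)
    # logits[i][j]: scaled cosine similarity between ECG i and report j
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt] for e in ecg]
    # ECG -> report direction: row i should assign its mass to column i
    loss_e2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # report -> ECG direction: column j should assign its mass to row j
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2e = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_e2t + loss_t2e)

# Toy batch: correctly paired embeddings versus deliberately mismatched ones
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is near zero when each ECG is most similar to its own report and grows when the pairing is broken, which is the signal that pulls matching waveform and text embeddings together during pretraining.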

#Key Features

  • ECG-CoCa contrastive encoder: A waveform-text encoder built on the OpenCLIP/CoCa framework aligns 12-lead ECG signals with their clinical reports, producing embeddings that support zero-shot classification and retrieval without task-specific fine-tuning.
  • Conversational diagnosis: A LLaVA-based backbone connects the ECG encoder to a Vicuna-13B language model, allowing multi-turn question answering about a recording rather than a single fixed prediction.
  • Automated report generation: The model drafts structured ECG analysis reports and can render them through an automated LaTeX pipeline.
  • Instruction-tuned on curated dialogue: Training uses the ECG-Instruct corpus, which comprises a 19K diagnosis dataset and a 25K multi-turn dialogue dataset constructed with GPT-4o, to teach clinically grounded conversational behavior.
  • LoRA fine-tuning: Low-rank adaptation mitigates catastrophic forgetting in the language backbone while specializing it for cardiac tasks.
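
The low-rank adaptation idea in the last bullet can be sketched in a few lines. The example below is a generic illustration of LoRA on a single linear layer, not ECG-Chat's configuration: the class name `LoRALinear`, the rank, the scaling factor `alpha`, and the toy weight matrix are all hypothetical. The pretrained weight stays frozen and only the two small matrices `A` and `B` would be trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scale * (B @ A)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                      # frozen, shape (d_out, d_in)
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero at init
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)            # shape (d_out, d_in)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
layer = LoRALinear(W, rank=1)
base_out = layer.forward(x)           # identical to W @ x because B is zero
layer.B = [[1.0], [0.0], [0.0]]       # stand-in for a training step on B
adapted_out = layer.forward(x)        # only the first output changes
print(layer.trainable_params(), "trainable vs", len(W) * len(W[0]), "frozen")
```

Because the base weights are never overwritten, the original behavior of the language model is preserved at initialization and can be recovered by dropping the adapter, which is one reason LoRA is commonly used to limit forgetting.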

#Technical Details

ECG-Chat is trained in two stages. First, ECG-CoCa is contrastively pretrained on paired ECG-report data drawn from five public datasets: MIMIC-IV-ECG, the Chapman-Shaoxing-Ningbo collection, Shandong Provincial Hospital (SPH), PTB-XL, and CPSC2018, with augmentation via the torch_ecg library. Second, the learned ECG features are fed, LLaVA-style, into a Vicuna-13B LLM that is instruction-tuned (with LoRA) on the GPT-4o-curated ECG-Instruct corpus.

On zero-shot ECG-report retrieval over the PTB-XL test set (2K samples), the CoCa encoder with waveform-driven embedding reaches Recall@1 of 64.7% (ECG-to-report) and 71.6% (report-to-ECG). For report generation, the authors report BLEU-4 of 11.19 and ROUGE-L of 29.93, with disease and rhythm F1 of 22.33 and 43.39, respectively. Zero-shot classification F1 reaches 52.8% on PTB-XL and 80.1% on CPSC2018.
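
For readers unfamiliar with the retrieval metric, the sketch below shows one common way to compute Recall@K from a similarity matrix between ECG and report embeddings. It assumes a strict one-to-one pairing (ECG i belongs to report i), which may differ from the paper's evaluation protocol; the function name `recall_at_k` and the toy similarity values are hypothetical.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose true match (same index) is among the top-k candidates.

    similarity[i][j] is the score between query i and candidate j.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy ECG-by-report similarity matrix for three paired recordings
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.4, 0.8],
       [0.1, 0.2, 0.7]]

ecg_to_report_r1 = recall_at_k(sim, k=1)
# Transposing swaps the roles: reports become queries, ECGs become candidates
report_to_ecg_r1 = recall_at_k([list(col) for col in zip(*sim)], k=1)
ecg_to_report_r2 = recall_at_k(sim, k=2)
print(ecg_to_report_r1, report_to_ecg_r1, ecg_to_report_r2)
```

The two directions are computed separately because the ranking is taken over rows in one case and columns in the other, which is why the ECG-to-report and report-to-ECG figures can differ.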

#Applications

ECG-Chat targets clinical and research workflows where ECG interpretation is a bottleneck. Cardiologists and general clinicians could use it as a drafting assistant that generates a first-pass report and answers follow-up questions about rhythm, conduction, and disease findings, while researchers can use the ECG-CoCa embeddings for zero-shot screening, cohort retrieval, and transfer learning across ECG datasets. Its conversational interface also makes it a candidate for medical education and for triage settings where a structured, explainable summary is more useful than a single label.
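
As a rough illustration of zero-shot screening with aligned embeddings, the sketch below scores one ECG embedding against text embeddings of candidate labels and picks the closest one. The vectors are made-up stand-ins for encoder outputs, the label set and the function `zero_shot_classify` are hypothetical, and a multi-label benchmark would more likely threshold each score than take a single argmax.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(ecg_embedding, label_embeddings):
    """Return the label whose text embedding is most similar to the ECG embedding."""
    scores = {label: cosine(ecg_embedding, emb) for label, emb in label_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical text embeddings, e.g. from encoding one prompt per diagnosis
label_embeddings = {
    "normal sinus rhythm": [0.9, 0.1, 0.0],
    "atrial fibrillation": [0.1, 0.9, 0.2],
    "left bundle branch block": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of one 12-lead recording
ecg_embedding = [0.2, 0.8, 0.1]

predicted, scores = zero_shot_classify(ecg_embedding, label_embeddings)
print(predicted)
```

The same similarity scores can be reused for cohort retrieval by ranking stored ECG embeddings against a single text query instead of ranking labels against a single ECG.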

#Impact

ECG-Chat is part of a wave of waveform-language foundation models extending the vision-language recipe to physiological signals, and it demonstrates that contrastive signal-text pretraining plus instruction tuning can support both retrieval and generative diagnosis from raw ECGs. Its acceptance at ICME 2025 and open release of code and pretrained checkpoints lower the barrier for follow-up work. Important caveats remain: the model is a research prototype validated on retrospective public benchmarks rather than in prospective clinical trials, report-generation BLEU/ROUGE scores indicate meaningful room for improvement, and the released weights are distributed via Google Drive without a stated software license, which may limit reuse.

Citations

Zhao, Y., et al. (2025) ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis. IEEE International Conference on Multimedia and Expo (ICME).

DOI: 10.1109/ICME59968.2025.11209476

Preprint

Zhao, Y., et al. (2024) ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis. arXiv preprint arXiv:2408.08849.

DOI: 10.48550/arXiv.2408.08849

Citations

Total citations: 44
Influential: 6
References: 76

GitHub

Stars: 80
Forks: 9
Open issues: 6
Contributors: 1
Last push: 6 months ago
Language: Python

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 27 (Closed)
Usability (can I run it?): 23
Reproducibility (can I retrain it?): 17
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

cardiology, contrastive_learning, disease_classification, ecg, instruction_tuning, multimodal, report_generation, transformer, zero_shot_retrieval
