Tsinghua University / PharMolix Inc.
A multimodal LLM that aligns a specialized ECG signal encoder with BioMedGPT-LM-7B for cardiovascular disease detection and ECG question answering.
ECG-LM is a multimodal large language model for interpreting electrocardiograms (ECGs), developed by researchers at the Institute for AI Industry Research (AIR) at Tsinghua University together with Beijing Tsinghua Changgung Hospital and PharMolix Inc., and published in Health Data Science in February 2025. It is presented as the first model to align a specialized ECG signal encoder with a general-purpose biomedical LLM, bridging raw physiological waveforms and free-form clinical language in a single system.
The central problem ECG-LM addresses is that conventional deep-learning ECG models are typically trained as narrow, fixed-label classifiers, which limits their ability to handle the open-ended, conversational questions that arise in clinical practice. Large language models excel at such reasoning over text but cannot natively consume time-series ECG signals. ECG-LM connects the two by projecting features from a dedicated ECG encoder into the text feature space of an LLM, enabling the language model to "read" an ECG and answer questions about it.
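As a rough illustration of the projection step, the sketch below maps a few ECG feature vectors into an LLM's embedding width with a single linear layer and prepends them to the embeddings of a text prompt. The linear projector, the toy dimensions, and the names `project` and `build_llm_input` are assumptions made for illustration. The description above does not specify ECG-LM's actual projector, and plain Python lists stand in for the tensors a real implementation would use.

```python
import random


def project(features, weight, bias):
    """Apply a linear map to each ECG feature vector.

    features: list of vectors, each of length d_ecg
    weight:   d_llm rows, each of length d_ecg
    bias:     vector of length d_llm
    """
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in features
    ]


def build_llm_input(ecg_tokens, text_embeddings):
    """Prepend projected ECG tokens to the prompt's token embeddings."""
    return ecg_tokens + text_embeddings


# Toy sizes (hypothetical): 4 ECG feature vectors of width 8, LLM width 16.
rng = random.Random(0)
d_ecg, d_llm, n_ecg, n_text = 8, 16, 4, 5

ecg_features = [[rng.gauss(0, 1) for _ in range(d_ecg)] for _ in range(n_ecg)]
weight = [[rng.gauss(0, 0.1) for _ in range(d_ecg)] for _ in range(d_llm)]
bias = [0.0] * d_llm
text_embeddings = [[rng.gauss(0, 1) for _ in range(d_llm)] for _ in range(n_text)]

ecg_tokens = project(ecg_features, weight, bias)
llm_input = build_llm_input(ecg_tokens, text_embeddings)

# Sequence length is n_ecg + n_text; every position has the LLM's width.
print(len(llm_input), len(llm_input[0]))
```

Once the ECG features share the language model's embedding width, the model can attend over signal and text positions in one sequence, which is the sense in which it "reads" the recording.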
A practical obstacle to this approach is the scarcity of paired text–ECG training data. The authors address this by synthesizing instruction-style training pairs from cardiovascular clinical guidelines and structured ECG report features, allowing the model to learn diagnostic associations without requiring large volumes of manually annotated multimodal data.
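One way such pairs could be synthesized is sketched below: structured measurements are checked against a small set of rules, and the matching findings are written into an instruction-style response. The field names, the rule list, and the prompt wording are hypothetical, and the thresholds are common textbook cut-offs chosen for illustration rather than the guideline statements the authors actually used.

```python
# Illustrative rules only: each entry pairs a check on the structured
# features with the finding it supports. Not clinical guidance.
GUIDELINE_RULES = [
    (lambda f: f["heart_rate_bpm"] > 100, "tachycardia (heart rate above 100 bpm)"),
    (lambda f: f["heart_rate_bpm"] < 60, "bradycardia (heart rate below 60 bpm)"),
    (lambda f: f["qrs_duration_ms"] >= 120, "wide QRS complex (120 ms or longer)"),
    (lambda f: f["pr_interval_ms"] > 200, "prolonged PR interval (over 200 ms)"),
]


def make_instruction_pair(features, rules=GUIDELINE_RULES):
    """Turn one record's structured features into an instruction/response pair."""
    findings = [text for matches, text in rules if matches(features)]
    answer = "; ".join(findings) if findings else "no rule-based abnormality"
    return {
        "instruction": "Describe the notable findings in this ECG.",
        "ecg_features": dict(features),  # kept alongside the pair for traceability
        "response": f"The ECG shows {answer}.",
    }


# Hypothetical record with a fast heart rate and otherwise normal intervals.
example = {"heart_rate_bpm": 112, "qrs_duration_ms": 96, "pr_interval_ms": 160}
pair = make_instruction_pair(example)
print(pair["response"])
```

Because every response is derived from the record's own measurements, pairs generated this way can be traced back to the rule that produced them, which is useful when auditing synthetic training data.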
ECG-LM couples an improved ResNet-18 convolutional encoder, modified to handle variable input sizes and lead counts, with BioMedGPT-LM-7B, a 7-billion-parameter LLM built on LLaMA2-Chat-7B and pretrained on roughly 4.2 million biomedical articles from the S2ORC corpus.
Training and evaluation draw on PTB-XL (21,799 clinical 12-lead records from over 18,000 patients) and the PTB-XL+ feature dataset, with non-English reports translated and manually validated.
On PTB-XL, the zero-shot ECG-LM outperforms a SimCLR-based few-shot baseline across all three task families: diagnostic (F1 0.647 vs. 0.485), rhythm (F1 0.524 vs. 0.456), and form (F1 0.570 vs. 0.549). On the ECG-QA benchmark it reaches 0.758 accuracy on Single-Verify, 0.574 on Single-Choose, and 0.399 on Single-Query questions, for a 0.577 average across the three question types.
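The reported figures can be cross-checked with a few lines of arithmetic. The snippet below recomputes the ECG-QA average from the three per-type accuracies and the F1 margin over the baseline for each PTB-XL task family. The variable names are illustrative, and the numbers are those quoted above.

```python
# Figures quoted in the text above.
ecg_qa_accuracy = {"Single-Verify": 0.758, "Single-Choose": 0.574, "Single-Query": 0.399}
ptbxl_f1 = {  # task: (ECG-LM zero-shot, SimCLR few-shot baseline)
    "diagnostic": (0.647, 0.485),
    "rhythm": (0.524, 0.456),
    "form": (0.570, 0.549),
}

# Unweighted mean over the three question types.
mean_accuracy = sum(ecg_qa_accuracy.values()) / len(ecg_qa_accuracy)

# Absolute F1 gain of ECG-LM over the baseline, per task family.
f1_margin = {task: round(ours - base, 3) for task, (ours, base) in ptbxl_f1.items()}

print(f"ECG-QA mean accuracy: {mean_accuracy:.3f}")
for task, margin in f1_margin.items():
    print(f"{task}: +{margin:.3f} F1 over baseline")
```

The mean comes out at 0.577, matching the stated average, and the margins are 0.162 (diagnostic), 0.068 (rhythm), and 0.021 (form), so the advantage is largest on diagnostic labels and smallest on form.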
ECG-LM targets clinical and research settings where ECG interpretation must be combined with natural-language reasoning, such as automated triage, decision support, and interactive question answering over cardiac recordings. By handling diagnostic, rhythm, and form classification within a single conversational interface, it can assist clinicians who need explanations rather than bare labels, and supports researchers building ECG-aware assistants. Its tolerance for variable lead counts makes it adaptable to settings ranging from standard 12-lead hospital ECGs to reduced-lead acquisitions.
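A common way for a convolutional encoder to tolerate different lead counts and recording lengths is to finish with global average pooling, so that the output size depends only on the number of channels. The sketch below shows that idea on toy data. It is an assumption about how such tolerance can be achieved, not a description of the specific ResNet-18 modifications made in ECG-LM, and `toy_feature_map` is a hypothetical stand-in for real convolutional features.

```python
def global_average_pool(feature_map):
    """Average a (channels x leads x time) feature map to one value per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for lead in channel for v in lead]
        pooled.append(sum(values) / len(values))
    return pooled


def toy_feature_map(n_channels, n_leads, n_samples):
    """Stand-in for convolutional features; channel c is filled with c + 1."""
    return [
        [[float(c + 1) for _ in range(n_samples)] for _ in range(n_leads)]
        for c in range(n_channels)
    ]


# A 12-lead recording and a shorter single-lead recording (toy sizes).
twelve_lead = global_average_pool(toy_feature_map(4, 12, 250))
single_lead = global_average_pool(toy_feature_map(4, 1, 100))

# Both outputs have one entry per channel, regardless of leads or length.
print(len(twelve_lead), len(single_lead))
```

With a fixed-size output, the same downstream projection can be reused whether the input is a standard 12-lead hospital ECG or a reduced-lead acquisition.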
ECG-LM is an early demonstration that general biomedical LLMs can be extended to consume physiological signals directly, pointing toward conversational diagnostic tools for cardiology. Its guideline-driven synthetic-pairing strategy offers a template for other signal-to-language alignment problems where paired data are scarce. A key limitation for reproducibility and downstream adoption is that, as of publication, the code and weights had not been released; the authors stated they were preparing all code and data for public release, with parts of the supervised fine-tuning data contingent on hospital data-sharing agreements. No model card or data card is currently available, and independent benchmarking will depend on that pending release.
Yang, K., et al. (2025). ECG-LM: Understanding Electrocardiogram with a Large Language Model. Health Data Science.
DOI: 10.34133/hds.0221