The Ohio State University / Carnegie Mellon University
A multimodal LLM fine-tuned to interpret electrocardiogram images, trained on the >1M-sample ECGInstruct dataset and evaluated on the ECGBench benchmark.
PULSE is a multimodal large language model (MLLM) built to interpret electrocardiograms presented as images rather than as digitized raw waveform signals. In clinical practice, ECGs are most often shared as printed or scanned 12-lead plots — embedded in PDFs, photographed on paper, or exported from monitors — and the underlying numeric signal is frequently unavailable. Most prior ECG-AI systems consume raw signals and target a narrow set of arrhythmias, limiting their use in resource-constrained settings. PULSE instead reasons directly over the ECG image, the same artifact a clinician sees, and answers open-ended questions about it in natural language.
Developed by researchers at The Ohio State University (Ruoqi Liu and Ping Zhang) in collaboration with Carnegie Mellon University (Xiang Yue), the model was introduced in the October 2024 preprint "Teach Multimodal LLMs to Comprehend Electrocardiographic Images." The work contributes three artifacts: ECGInstruct, a large instruction-tuning dataset of ECG images; PULSE itself, a 7B-parameter model fine-tuned on that data; and ECGBench, a standardized evaluation suite for ECG image understanding.
By framing ECG interpretation as an image-to-text task, PULSE connects the fast-moving world of general-purpose vision-language models to a clinically important biosignal modality, and demonstrates that instruction tuning on domain data substantially closes the gap between generalist MLLMs and the specialized reasoning ECG reading demands.
PULSE-7B is fine-tuned from the open multimodal backbone LLaVA-v1.6 (Vicuna-7B), which pairs a CLIP-style vision encoder with a 7B Vicuna language model through a projection layer. The model is trained via instruction tuning on ECGInstruct, whose 1M+ samples are constructed from existing ECG signal datasets by rendering signals into images and pairing them with task-oriented instructions covering abnormality detection, diagnosis, rhythm and morphology questions, and report generation. Evaluation is performed on ECGBench, which organizes four core tasks — including multiple-choice question answering and report generation — across nine datasets, with held-out and out-of-distribution sets (such as an MMMU-style ECG split) to probe generalization. Across these tasks PULSE outperforms strong proprietary and open MLLM baselines by an average of 15–30% in accuracy, with the largest gains on tasks requiring fine-grained reading of waveform morphology.
PULSE is most directly useful where ECGs exist only as images and signal data is inaccessible: low-resource clinics, telemedicine, retrospective chart review, and educational settings where students query annotated ECG plots. Its conversational interface lets clinicians or researchers ask targeted questions ("Is there evidence of atrial fibrillation?") or request a structured report from a single photo of a tracing. The released datasets and benchmark also give the research community a shared foundation for building and fairly comparing the next generation of ECG image-understanding models.
PULSE establishes ECG image interpretation as a tractable multimodal LLM task and provides the field's first large-scale instruction dataset (ECGInstruct) and standardized benchmark (ECGBench) for it, lowering the barrier to entry for follow-on work. The fully open release — Apache-2.0 weights, training and evaluation data, code, and a hosted demo — makes the results reproducible and extensible. Important caveats remain: the model is a research artifact rather than a cleared clinical device, its training images are largely rendered from existing signal datasets and may not capture all real-world scan artifacts, and outputs require expert verification before any diagnostic use.
Liu, R., et al. (2024) Teach Multimodal LLMs to Comprehend Electrocardiographic Images. arXiv.org.
DOI: 10.48550/arXiv.2410.19008Papers that recently cited this model.
The most-cited papers that cite this model.
Not enough data