bio.rodeo
ModelsOrganizationsLeaderboardAbout
bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

AboutFAQSubmit a modelContact
© 2026 Pulsatance. All rights reserved. ~
Built by Pulsatance
Biosignals foundation models
BiosignalsImaging

PULSE

The Ohio State University / Carnegie Mellon University

A multimodal LLM fine-tuned to interpret electrocardiogram images, trained on the >1M-sample ECGInstruct dataset and evaluated on the ECGBench benchmark.

Released: October 2024
Parameters: 7 Billion

PULSE is a multimodal large language model (MLLM) built to interpret electrocardiograms presented as images rather than as digitized raw waveform signals. In clinical practice, ECGs are most often shared as printed or scanned 12-lead plots — embedded in PDFs, photographed on paper, or exported from monitors — and the underlying numeric signal is frequently unavailable. Most prior ECG-AI systems consume raw signals and target a narrow set of arrhythmias, limiting their use in resource-constrained settings. PULSE instead reasons directly over the ECG image, the same artifact a clinician sees, and answers open-ended questions about it in natural language.

Developed by researchers at The Ohio State University (Ruoqi Liu and Ping Zhang) in collaboration with Carnegie Mellon University (Xiang Yue), the model was introduced in the October 2024 preprint "Teach Multimodal LLMs to Comprehend Electrocardiographic Images." The work contributes three artifacts: ECGInstruct, a large instruction-tuning dataset of ECG images; PULSE itself, a 7B-parameter model fine-tuned on that data; and ECGBench, a standardized evaluation suite for ECG image understanding.

By framing ECG interpretation as an image-to-text task, PULSE connects the fast-moving world of general-purpose vision-language models to a clinically important biosignal modality, and demonstrates that instruction tuning on domain data substantially closes the gap between generalist MLLMs and the specialized reasoning ECG reading demands.

#Key Features

  • Image-native ECG interpretation: Operates on rendered 12-lead ECG plots, the format actually circulated in clinics, rather than requiring raw signal access that is often unavailable.
  • ECGInstruct dataset: A curated instruction-tuning corpus of over 1 million ECG image-text samples spanning diverse cardiac conditions, sourced from multiple public ECG repositories.
  • ECGBench evaluation suite: A new benchmark covering four ECG interpretation tasks across nine datasets (including PTB-XL, CODE-15, CPSC, CSN, ECG-QA, and out-of-distribution sets), enabling reproducible comparison.
  • Strong gains over generalist MLLMs: Achieves an average accuracy improvement of roughly 15–30% over general-purpose multimodal LLMs on the benchmark tasks.
  • Open weights and demo: Apache-2.0 licensed weights (PULSE-7B), training/evaluation datasets, and a live HuggingFace Space demo are all publicly released.

#Technical Details

PULSE-7B is fine-tuned from the open multimodal backbone LLaVA-v1.6 (Vicuna-7B), which pairs a CLIP-style vision encoder with a 7B Vicuna language model through a projection layer. The model is trained via instruction tuning on ECGInstruct, whose 1M+ samples are constructed from existing ECG signal datasets by rendering signals into images and pairing them with task-oriented instructions covering abnormality detection, diagnosis, rhythm and morphology questions, and report generation. Evaluation is performed on ECGBench, which organizes four core tasks — including multiple-choice question answering and report generation — across nine datasets, with held-out and out-of-distribution sets (such as an MMMU-style ECG split) to probe generalization. Across these tasks PULSE outperforms strong proprietary and open MLLM baselines by an average of 15–30% in accuracy, with the largest gains on tasks requiring fine-grained reading of waveform morphology.

#Applications

PULSE is most directly useful where ECGs exist only as images and signal data is inaccessible: low-resource clinics, telemedicine, retrospective chart review, and educational settings where students query annotated ECG plots. Its conversational interface lets clinicians or researchers ask targeted questions ("Is there evidence of atrial fibrillation?") or request a structured report from a single photo of a tracing. The released datasets and benchmark also give the research community a shared foundation for building and fairly comparing the next generation of ECG image-understanding models.

#Impact

PULSE establishes ECG image interpretation as a tractable multimodal LLM task and provides the field's first large-scale instruction dataset (ECGInstruct) and standardized benchmark (ECGBench) for it, lowering the barrier to entry for follow-on work. The fully open release — Apache-2.0 weights, training and evaluation data, code, and a hosted demo — makes the results reproducible and extensible. Important caveats remain: the model is a research artifact rather than a cleared clinical device, its training images are largely rendered from existing signal datasets and may not capture all real-world scan artifacts, and outputs require expert verification before any diagnostic use.

Citation

Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Preprint

Liu, R., et al. (2024) Teach Multimodal LLMs to Comprehend Electrocardiographic Images. arXiv.org.

DOI: 10.48550/arXiv.2410.19008

Recent citations

Papers that recently cited this model.

Not enough citation data yet.

Top citations

The most-cited papers that cite this model.

Not enough citation data yet.

Citations

Total Citations27
Influential8
References48

GitHub

Stars64
Forks14
Open Issues9
Contributors2
Last Push1y ago
LanguagePython

HuggingFace

Downloads1K
Likes37
Last Modified1y ago
Pipelineimage-text-to-text

Fields of citing research

Not enough data

Openness

bio.rodeo opennessFully open · usable and reproducible
84Open
Usability — can I run it?100
Reproducibility — can I retrain it?72
Model Openness Framework
Class III
Open Model

Tags

cardiologyecgecg_interpretationinstruction_tuningmultimodalquestion_answeringreport_generationtransformervision_transformer

Resources

GitHub RepositoryResearch PaperOfficial WebsiteHuggingFace ModelDemoDatasetDataset