bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Biosignals foundation models
Biosignals · Language model

GEM (Grounded ECG understanding with Multimodal LLM)

National University of Singapore / Peking University

Multimodal LLM unifying 12-lead ECG time series, ECG images, and text for grounded, clinician-aligned electrocardiogram interpretation.

Released: March 2025
Parameters: 7 billion

GEM (Grounded ECG understanding with Multimodal LLM) is a multimodal large language model (MLLM) for electrocardiogram (ECG) interpretation that jointly reasons over three modalities: raw 12-lead ECG time series, rendered ECG images, and natural-language text. Most prior ECG language models consume a single modality, typically either the waveform signal or a scanned image, and produce free-text diagnoses that are difficult to verify against the underlying signal. The authors present GEM as the first MLLM to unify time series, images, and text for grounded ECG analysis: its diagnostic statements are tied to specific, measurable waveform features rather than emitted as unsupported conclusions.

The model was developed by researchers at the National University of Singapore and Peking University and introduced in a March 2025 preprint, with a camera-ready version accepted to NeurIPS 2025. Beyond the model itself, the authors contribute the ECG-Grounding dataset and a "Grounded ECG Understanding" evaluation task designed to measure whether a model's reasoning aligns with clinical practice.

GEM targets a central problem in clinical decision support: a diagnosis is only trustworthy if a clinician can trace it to the evidence. By anchoring its outputs to physiological measurements such as heart rate and PR/QRS intervals, GEM aims to make automated ECG interpretation auditable and clinician-aligned.
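To make the idea of a traceable diagnosis concrete, here is a minimal sketch that pairs each finding with the measurement behind it. It is not GEM's method: the `Finding` class, the function names, and the thresholds (common textbook conventions for adult resting ECGs) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Finding:
    statement: str  # the diagnostic claim
    evidence: str   # the measurement that supports it


def heart_rate_bpm(r_peak_times_s):
    """Mean heart rate from R-peak timestamps in seconds: 60 / mean RR interval."""
    rr = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    return 60.0 / mean(rr)


def grounded_findings(r_peak_times_s, pr_ms, qrs_ms):
    """Return findings, each tied to the value that triggered it.

    Thresholds are illustrative textbook conventions, not values from GEM.
    """
    hr = heart_rate_bpm(r_peak_times_s)
    findings = []
    if hr > 100:
        findings.append(Finding("tachycardia", f"heart rate {hr:.0f} bpm > 100"))
    elif hr < 60:
        findings.append(Finding("bradycardia", f"heart rate {hr:.0f} bpm < 60"))
    if pr_ms > 200:
        findings.append(Finding("prolonged PR interval", f"PR {pr_ms} ms > 200"))
    if qrs_ms >= 120:
        findings.append(Finding("wide QRS complex", f"QRS {qrs_ms} ms >= 120"))
    return findings


# Hypothetical measurements: R peaks every 0.5 s, PR 220 ms, QRS 90 ms.
example = grounded_findings([0.0, 0.5, 1.0, 1.5], pr_ms=220, qrs_ms=90)
for finding in example:
    print(f"{finding.statement}: {finding.evidence}")
```

A reviewer can check each printed claim against the stated value, which is the kind of auditability that grounded output is meant to provide.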

Key Features

  • Tri-modal input: Jointly processes 12-lead ECG time series, ECG images, and text, allowing the model to exploit the high temporal fidelity of raw signals alongside the spatial layout of standard clinical ECG printouts.
  • Dual-encoder architecture: Separate encoders extract complementary features from the time-series and image modalities, which are then combined through a cross-modal alignment mechanism so signal- and image-derived evidence inform a single interpretation.
  • Feature grounding: Diagnoses are linked to measurable ECG parameters (e.g., intervals and rates), supporting evidence-driven reasoning rather than opaque end-to-end labels.
  • Knowledge-guided instruction data: A knowledge-guided generation pipeline produces granular grounding annotations that connect diagnoses to physiological features, enabling instruction tuning toward clinically meaningful explanations.
  • Open weights and data: GEM-7B checkpoints, the ECG-CoCa encoder, and the ECG-Grounding dataset are publicly released, and the codebase is licensed under Apache-2.0.
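
The dual-encoder idea can be illustrated with a toy fusion step. The page does not specify GEM's alignment mechanism, so the sketch below substitutes plain scaled dot-product cross-attention as an assumption: each time-series token attends over the image tokens and is concatenated with the resulting context. The token values and the `cross_attend` helper are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Fuse two token sequences of equal feature dimension.

    Each query token is concatenated with a softmax-weighted average of the
    other modality's tokens (scaled dot-product attention, no learned weights).
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys_values])
        context = [
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(dim)
        ]
        fused.append(q + context)  # original token followed by cross-modal context
    return fused


# Toy stand-ins for encoder outputs (assumed shapes: 3 signal tokens, 2 image tokens).
signal_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_tokens = [[0.5, 0.5], [2.0, 0.0]]
fused_tokens = cross_attend(signal_tokens, image_tokens)
```

A real model would use learned projections and many more dimensions; the point here is only that each fused token carries evidence from both modalities.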

Technical Details

GEM is a 7-billion-parameter multimodal LLM built on the LLaVA (v1.6-vicuna-7b) framework, with PULSE-7B supported as an alternative base MLLM. A dedicated ECG-CoCa encoder handles the signal/image modalities and feeds a cross-modal alignment stage ahead of the language backbone.

Training draws on a broad collection of public ECG corpora, including MIMIC-IV, PTB-XL, Code-15%, CPSC 2018, CSN, and G12E, together with the purpose-built ECG-Grounding dataset of roughly 30,000 instruction pairs annotated with heartbeat-level physiological features (about 43,600 total rows across train and test splits).

On the authors' Grounded ECG Understanding evaluation, GEM reports a 7.4% improvement on the CSN benchmark, a 22.7% improvement in explainability, and a 24.8% improvement in grounding relative to baselines.

Released weights are distributed in Safetensors format at BF16 precision via Hugging Face (LANSG/GEM).
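
As a rough guide to what BF16 weights imply for hardware, the arithmetic below estimates the memory needed just to hold the parameters. It assumes the nominal count of exactly 7 billion parameters and 2 bytes per BF16 value; the true parameter count, activations, and any KV cache are not covered, so treat the result as a lower bound.

```python
def weight_memory_bytes(num_params, bytes_per_param):
    """Memory needed to store the parameters alone."""
    return num_params * bytes_per_param


NOMINAL_PARAMS = 7_000_000_000  # assumption: nominal "7 billion", not an exact count
BF16_BYTES = 2                  # bfloat16 stores each value in 16 bits

total = weight_memory_bytes(NOMINAL_PARAMS, BF16_BYTES)
gb = total / 1e9      # decimal gigabytes
gib = total / 2**30   # binary gibibytes, as most GPU tools report

print(f"{gb:.1f} GB ({gib:.2f} GiB)")  # 14.0 GB (13.04 GiB)
```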

Applications

GEM is aimed at clinical and research workflows where automated ECG reading must be both accurate and explainable. Cardiologists and emergency clinicians can use grounded interpretations to quickly review which waveform features support a given diagnosis, which helps in triage and second-opinion scenarios. For ML researchers, the open ECG-Grounding dataset and the Grounded ECG Understanding task provide a reproducible benchmark for evidence-based cardiac diagnosis, and the released checkpoints offer a strong starting point for fine-tuning on institution-specific ECG corpora.

Impact

By framing ECG interpretation as a grounded, multimodal task, GEM moves automated cardiac diagnosis toward the verifiability that clinical adoption requires. Its acceptance at NeurIPS 2025, together with openly released weights, encoder, and grounding dataset and an Apache-2.0 codebase (188 GitHub stars), lowers the barrier for follow-on work on explainable biosignal models. The accompanying benchmark gives the community a shared yardstick for measuring whether ECG language models reason from the signal rather than around it. Because GEM is an academic-scale model evaluated primarily on public datasets, its broad clinical generalization and prospective validation remain open questions, but it establishes a concrete template for evidence-grounded ECG understanding.

Citation

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

Preprint

Lan, X., et al. (2025) GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images. arXiv.org.

DOI: 10.48550/arXiv.2503.06073

Citations

Total Citations: 37
Influential: 5
References: 68

GitHub

Stars: 188
Forks: 18
Open Issues: 7
Contributors: 1
Last Push: 2mo ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 254
Likes: 7
Last Modified: 1y ago
Pipeline: image-text-to-text

Openness

bio.rodeo openness: Fully open · usable and reproducible
Overall: 79 (Open)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 62
Model Openness Framework: Class II (Open Tooling)

Tags

cardiology, clinical_reasoning, diagnosis_grounding, dual_encoder, ecg_interpretation, electrocardiography, instruction_tuning, multimodal, multimodal_llm, transformer

Resources

GitHub Repository · Research Paper · HuggingFace Model · Dataset