bio.rodeo

MELP

The University of Hong Kong

Multi-scale ECG-language pretraining model that aligns 12-lead ECG signals with clinical text at token, beat, and rhythm levels for zero-shot cardiac diagnosis.

Released: June 2025
Parameters: 65.6 million

MELP (Multi-scale ECG-Language Pretraining) is a multimodal foundation model that learns transferable representations of 12-lead electrocardiograms by aligning ECG signals with their paired free-text clinical reports. It was developed by Fuying Wang, Jiacheng Xu, and Lequan Yu at the HKU-MedAI group at The University of Hong Kong, and presented at ICML 2025. The work targets a persistent gap in ECG self-supervised learning: most prior contrastive and masked-modeling approaches treat the ECG as a single flat sequence and therefore fail to capture the signal's inherently multi-scale structure, where clinically meaningful patterns span everything from individual waveform deflections to the rhythm of an entire recording.

MELP's central idea is hierarchical cross-modal supervision. Rather than computing a single global similarity between an ECG and its report, the model aligns the two modalities at three nested scales — token, beat, and rhythm — so that fine-grained morphology and global rhythm context are each grounded in language. This mirrors how cardiologists read ECGs, reasoning simultaneously about local wave shapes (P, QRS, T) and the overall rhythm.

By learning directly from the language of clinical reports, MELP produces an ECG encoder that can perform open-vocabulary, zero-shot classification of cardiac conditions without any task-specific labels, and that transfers efficiently to labeled downstream tasks via linear probing and fine-tuning.
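
Zero-shot classification with a language-aligned encoder can be sketched as a nearest-prompt lookup in the shared embedding space. The snippet below is a minimal illustration, not MELP's released code: the function names, the prompt texts, and the three-dimensional toy vectors are all hypothetical stand-ins for real ECG and text encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(ecg_embedding, prompt_embeddings):
    """Return the label whose text embedding is closest to the ECG embedding.

    prompt_embeddings maps a label (plain-text condition name) to the
    embedding of its prompt. Both embeddings are assumed to live in the
    same shared space produced by the paired encoders.
    """
    scores = {
        label: cosine(ecg_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors standing in for encoder outputs (hypothetical values).
prompts = {
    "atrial fibrillation": [0.9, 0.1, 0.0],
    "normal sinus rhythm": [0.0, 0.2, 0.9],
}
ecg = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(ecg, prompts)
print(label)
```

Adding a new diagnostic category in this setup only requires embedding one more text prompt; no classifier weights are retrained.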

#Key Features

  • Hierarchical multi-scale alignment: Cross-modal supervision is applied at the token, beat, and rhythm levels, jointly capturing local waveform morphology and global rhythm rather than a single coarse ECG-report match.
  • Zero-shot ECG classification: Because the encoder is grounded in clinical text, it can classify cardiac conditions described in natural language without retraining, and is reported to outperform prior ECG self-supervised learning (SSL) methods across three public benchmarks.
  • Strong label efficiency: Linear probing and transfer-learning evaluations show that the learned representations adapt to new datasets with limited labeled data.
  • Clinically grounded text encoder: MELP pairs an ECG encoder with a biomedical language model (MedCPT-Query-Encoder) to embed report text into a shared representation space.
  • Open release: Code is released under the MIT License and pretrained encoder weights are available on HuggingFace under Apache-2.0.
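
Linear probing, mentioned in the list above, means freezing the pretrained encoder and training only a linear classifier on its output features. The sketch below shows the general protocol with a plain logistic-regression probe. It is not taken from the MELP repository: the two-dimensional feature vectors, the learning rate, and the epoch count are hypothetical toy choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent.

    The features are treated as fixed (the encoder is frozen); only the
    probe weights w and bias b are updated.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Binary prediction from the trained probe."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical frozen-encoder features for four recordings, two per class.
frozen_features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = [1, 1, 0, 0]

w, b = train_linear_probe(frozen_features, labels)
preds = [predict(w, b, x) for x in frozen_features]
print(preds)
```

Because only the probe is trained, the accuracy it reaches with few labels is a direct measure of how much task-relevant structure the frozen representation already contains.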

#Technical Details

MELP couples a transformer-based ECG encoder (built on an ECGFM-style backbone) with a biomedical text encoder derived from MedCPT-Query-Encoder. Training combines three objectives: a global CLIP-style contrastive loss, a captioning objective, and a local alignment loss that operates over the token/beat/rhythm hierarchy, with reported loss weights of 1.0, 2.0, and 0.2 respectively. The released encoder comprises roughly 65.6M parameters and is distributed in BF16. Pretraining draws on large-scale paired 12-lead ECG and clinical-report data from MIMIC-IV-ECG. The authors evaluate on three public ECG datasets (PTB-XL, CPSC 2018, and CSN/Chapman-Shaoxing) across zero-shot classification, linear probing, and transfer-learning protocols, and report that MELP consistently improves over existing self-supervised baselines.
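
To make the training objective concrete, the sketch below shows a generic symmetric CLIP-style contrastive loss over a batch similarity matrix, plus the weighted sum of the three loss terms using the reported weights (1.0, 2.0, 0.2). This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the temperature of 0.07 is a common CLIP default and not a value taken from the source, and the captioning and local alignment terms are passed in as already-computed numbers.

```python
import math


def log_softmax_row(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def clip_contrastive_loss(sim, temperature=0.07):
    """Symmetric InfoNCE loss over an N x N similarity matrix.

    sim[i][j] is the similarity between ECG i and report j; matched
    pairs sit on the diagonal. The temperature is an assumed default.
    """
    n = len(sim)
    logits = [[v / temperature for v in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # report-to-ECG view
    ecg_to_text = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    text_to_ecg = -sum(log_softmax_row(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (ecg_to_text + text_to_ecg)


def total_loss(contrastive, caption, local, weights=(1.0, 2.0, 0.2)):
    """Weighted sum of the three objectives, in the order reported above."""
    w_con, w_cap, w_loc = weights
    return w_con * contrastive + w_cap * caption + w_loc * local


# Matched pairs on the diagonal give a much lower loss than swapped pairs.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(clip_contrastive_loss(aligned) < clip_contrastive_loss(swapped))
```

With these weights the captioning term contributes twice as much per unit of loss as the global contrastive term, while the local alignment term acts as a comparatively light auxiliary signal.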

#Applications

MELP is aimed at automated ECG interpretation and cardiac screening, where labeled data is scarce but reports are abundant. Its zero-shot capability lets researchers query for new diagnostic categories described in plain text without curating labeled training sets, while its label-efficient representations support building classifiers for arrhythmia detection, conduction abnormalities, and other conditions from modest annotated cohorts. The pretrained encoder serves as a reusable backbone for hospitals, biomedical ML researchers, and developers building ECG analysis tools.

#Impact

MELP advances the small but growing field of ECG-language foundation models by demonstrating that explicitly modeling the multi-scale structure of cardiac signals, rather than treating an ECG as a monolithic sequence, yields measurably better cross-modal alignment and stronger zero-shot and transfer performance. By open-sourcing both code and weights, the HKU-MedAI team lowers the barrier to building language-grounded ECG models. Its main limitations stem from its pretraining source: reliance on MIMIC-IV-ECG report style and population may limit generalization, and the public model card provides only sparse documentation of training data and evaluation specifics.

Citation

Wang, F., Xu, J., and Yu, L. (2025). From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining. International Conference on Machine Learning (ICML). Preprint available on arXiv.

DOI: 10.48550/arXiv.2506.21803

Citations

Total Citations: 16
Influential: 2
References: 48

GitHub

Stars: 29
Forks: 3
Open Issues: 3
Contributors: 1
Last Push: 3mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 18
Likes: 1
Last Modified: 8mo ago
Pipeline: audio-text-to-text

Openness

bio.rodeo openness: 68 (Partial)
Usability (can I run it?): 78
Reproducibility (can I retrain it?): 55
Model Openness Framework: Unclassified (missing required components)

Tags

contrastive_learning, ecg_classification, electrocardiogram, multimodal, representation_learning, transformer, zero_shot_classification
