University of Oxford / City University of Hong Kong / Imperial College London / Uppsala University / GSK / Universidade Federal de Minas Gerais
A multimodal transformer foundation model for ECG, PPG, and clinical text, pretrained on heterogeneous cardiac data from ~1.7 million individuals.
Cardiac monitoring spans a fragmented landscape of signals and devices: 12-lead electrocardiograms (ECGs) in hospitals, single-lead traces from wearables, photoplethysmograms (PPG) from pulse oximeters and smartwatches, and free-text clinical reports. Most deep-learning models are trained for one modality and one task, which limits their robustness when the sensor, lead configuration, or deployment scenario changes. The Cardiac Sensing Foundation Model (CSFM) addresses this by learning unified representations across heterogeneous cardiac data, so a single pretrained backbone can be adapted to many downstream tasks.
CSFM is a transformer-based foundation model trained with generative masked pretraining on data from approximately 1.7 million individuals, integrating physiological signals (ECG and PPG) together with clinical and machine-generated text. It was developed by an Oxford-led international collaboration including the University of Oxford, City University of Hong Kong, Imperial College London, Uppsala University, GSK, and the Universidade Federal de Minas Gerais. The work was released as a preprint in mid-2025 and published as the February 2026 cover article of Nature Machine Intelligence.
By treating ECG, PPG, and text as maskable channels within a shared model, CSFM is designed to remain effective whether the input is a full 12-lead recording, a single-lead wearable trace, PPG alone, or a combination of modalities — bridging clinical and consumer-grade cardiac sensing under one framework.
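One way to picture "maskable channels" is a fixed channel layout in which inputs that a device does not provide are flagged as absent. The sketch below is illustrative only: the channel order, the `SCENARIOS` names, and the `presence_mask` helper are assumptions made for this example, not CSFM's actual input format, and the text modality is left out for brevity.

```python
# Hypothetical channel layout: 12 standard ECG leads followed by one PPG channel.
# The order and names are assumptions for illustration, not CSFM's real format.
ECG_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
             "V1", "V2", "V3", "V4", "V5", "V6"]
CHANNELS = ECG_LEADS + ["PPG"]

# Example deployment scenarios mapped to the channels each one provides.
SCENARIOS = {
    "hospital_12_lead": ECG_LEADS,
    "wearable_single_lead": ["I"],
    "ppg_only": ["PPG"],
    "wearable_ecg_plus_ppg": ["I", "PPG"],
}


def presence_mask(available):
    """Return one boolean per channel: True if that channel was recorded."""
    unknown = set(available) - set(CHANNELS)
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")
    present = set(available)
    return [name in present for name in CHANNELS]


if __name__ == "__main__":
    for scenario, channels in SCENARIOS.items():
        mask = presence_mask(channels)
        print(f"{scenario:24s} {sum(mask):2d}/{len(CHANNELS)} channels present")
```

Under this assumed layout, every scenario yields an input of the same shape, so a single backbone can consume it and treat absent channels the same way it treats channels hidden during pretraining.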
CSFM uses a transformer architecture adapted from natural language processing, pretrained with a generative masked-modeling objective: parts of the input are hidden both channel-wise and along the time axis, and the model learns to reconstruct the heterogeneous cardiac signals and associated text. Pretraining aggregates three large datasets, MIMIC-III-WDB and MIMIC-IV-ECG (USA) and the CODE cohort (Brazil), totaling roughly 1.7 million individuals. The authors provide Tiny, Base, and Large configurations; exact parameter counts are not disclosed in the paper. On downstream benchmarks, CSFM reaches an AUC of about 0.844 for one-year mortality prediction on CODE-15 and reports competitive macro-F1 on PTB-XL cardiovascular disease classification. The authors report consistent gains over conventional one-modality-one-task baselines across diagnostic, demographic, vital-sign, and question-answering evaluations.
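The combination of channel-wise and temporal masking can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the masking ratios, the choice to share temporal masks across channels, and the mean-squared-error loss are all hypothetical, and text tokens are omitted.

```python
import random


def make_mask(n_channels, n_patches, channel_ratio=0.5, temporal_ratio=0.3, seed=0):
    """Build a boolean mask[channel][patch]; True means hidden from the model.

    The ratios are placeholder values, not the settings used in the paper.
    """
    rng = random.Random(seed)
    # Channel-wise masking: hide whole channels (e.g. entire ECG leads or PPG).
    n_hidden_channels = int(n_channels * channel_ratio)
    hidden_channels = set(rng.sample(range(n_channels), n_hidden_channels))
    # Temporal masking: hide the same time patches in every channel (assumed).
    n_hidden_patches = int(n_patches * temporal_ratio)
    hidden_patches = set(rng.sample(range(n_patches), n_hidden_patches))
    return [
        [c in hidden_channels or p in hidden_patches for p in range(n_patches)]
        for c in range(n_channels)
    ]


def masked_mse(target, prediction, mask):
    """Mean squared error computed over masked positions only."""
    errors = [
        (t - y) ** 2
        for t_row, y_row, m_row in zip(target, prediction, mask)
        for t, y, m in zip(t_row, y_row, m_row)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    n_channels, n_patches = 13, 10  # e.g. 12 ECG leads + 1 PPG, 10 time patches
    mask = make_mask(n_channels, n_patches)
    hidden = sum(sum(row) for row in mask)
    print(f"{hidden} of {n_channels * n_patches} patches hidden")
    # Toy "signal": one scalar per patch; a real model would predict waveforms.
    target = [[float(c + p) for p in range(n_patches)] for c in range(n_channels)]
    prediction = [[0.0] * n_patches for _ in range(n_channels)]
    print("loss on masked patches:", masked_mse(target, prediction, mask))
```

Scoring the reconstruction only on hidden positions is the usual design in masked modeling: the model gets no credit for copying visible input, so it has to infer a missing lead or time segment from the remaining context.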
CSFM targets cardiac health assessment across both clinical and at-home settings. In hospitals, it can support diagnosis, mortality and outcome prediction, and vital-sign estimation from routinely collected ECG and PPG waveforms; for consumer and remote-monitoring use, its robustness to single-lead and PPG-only inputs makes it suitable for smartwatch and wearable data. Its cross-modality generation — for example reconstructing 12-lead ECGs from wearable traces — and ECG question-answering capabilities point toward decision-support and triage tools. Pretrained weights are available to academic researchers through a signed access agreement, with preprocessing and fine-tuning code released openly on GitHub.
CSFM is among the first foundation models to unify ECG, PPG, and clinical text within a single backbone that degrades gracefully across devices and lead configurations, directly addressing the deployment gap between clinical-grade and consumer cardiac sensing. Its selection as a Nature Machine Intelligence cover article and its scale (pretraining on roughly 1.7 million individuals drawn from cohorts in the USA and Brazil) signal growing momentum for general-purpose physiological foundation models. Key limitations include undisclosed parameter counts and gated weight access, which requires an academic agreement rather than a fully open release and may constrain industrial reuse and independent reproduction.
Gu, X., et al. (2025) Sensing Cardiac Health Across Scenarios and Devices: A Multi-Modal Foundation Model Pretrained on Heterogeneous Data from 1.7 Million Individuals. arXiv.org.
DOI: 10.48550/arXiv.2507.01045
Gu, X., et al. (2026) Cardiac health assessment across scenarios and devices using a multimodal foundation model pretrained on data from 1.7 million individuals. Nature Machine Intelligence.
DOI: 10.1038/s42256-026-01180-5