Self-supervised contrastive foundation models for wearable PPG and ECG signals, trained on ~141K Apple Watch participants from the Apple Heart and Movement Study.
Wearable devices such as the Apple Watch continuously record physiological signals — most prominently photoplethysmography (PPG) from the optical heart sensor and electrocardiograms (ECG) from on-demand recordings — but the medical labels needed to train supervised models on this data are scarce, expensive, and biased toward people who already have a diagnosis. Apple's AHMS biosignal foundation models address this gap by learning general-purpose representations of PPG and ECG signals through self-supervised contrastive learning, so that downstream health tasks can be solved from frozen embeddings rather than from large labeled datasets.
Introduced by Salar Abbaspourazad, Oussama Elachqar, Andrew C. Miller, Saba Emrani, Udhyakumar Nallasamy, and Ian Shapiro of Apple, and published at ICLR 2024, the work trains separate foundation models for PPG and for ECG on data from roughly 141,000 participants of the Apple Heart and Movement Study (AHMS), collected over approximately three years. The authors describe it as the first study to build foundation models from large-scale PPG and ECG data captured by consumer wearables, as opposed to clinical-grade equipment in controlled settings.
The central finding is that representations learned purely from unlabeled wearable biosignals already encode meaningful information about participant demographics and health conditions, which can be read out with simple probes on the frozen features. This positions wearable biosignals alongside protein sequences, genomes, and pathology images as a modality where the pretrain-then-transfer paradigm of foundation models is effective.
Each foundation model is a convolutional encoder trained with a SimCLR-style contrastive framework adapted for biosignals. The key design choices are participant-level positive pair selection (two segments from the same participant are treated as a positive pair), a stochastic augmentation pipeline suited to periodic physiological waveforms, and a regularized contrastive loss optimized with a momentum-based scheme to support stable training at scale. The PPG and ECG models are trained independently on their respective signal streams from the ~141K-participant AHMS cohort. Evaluation uses linear or lightweight probes on the frozen embeddings: the learned representations recover participant demographics (age, BMI, sex) and signal-derived attributes, and carry predictive information about health conditions, supporting the claim that self-supervision alone captures clinically relevant structure. Exact parameter counts and per-task metrics are reported in the paper rather than summarized here.
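The participant-level pairing and the contrastive objective described above can be illustrated with a minimal, dependency-free sketch. It assumes a plain InfoNCE loss with in-batch negatives. It deliberately omits the regularization term, the augmentation pipeline, and the momentum-based optimization, because their details are not given here. The function names (`sample_participant_pairs`, `info_nce`) and the temperature value are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sample_participant_pairs(segments_by_participant, rng):
    """Draw two distinct segments per participant to form one positive pair.

    `segments_by_participant` maps a participant id to a list of segments.
    Participants with fewer than two segments are skipped.
    """
    anchors, positives = [], []
    for segments in segments_by_participant.values():
        if len(segments) < 2:
            continue  # a participant needs two segments to form a pair
        first, second = rng.sample(segments, 2)
        anchors.append(first)
        positives.append(second)
    return anchors, positives


def info_nce(anchor_emb, positive_emb, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    Row i of `anchor_emb` and row i of `positive_emb` come from the same
    participant; every other row of `positive_emb` acts as a negative.
    The temperature is an illustrative default, not a value from the paper.
    """
    anchors = [l2_normalize(v) for v in anchor_emb]
    positives = [l2_normalize(v) for v in positive_emb]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity of this anchor to every candidate, scaled by temperature.
        logits = [
            sum(a * p for a, p in zip(anchor, cand)) / temperature
            for cand in positives
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)
```

In a real training loop the two sampled segments would each be augmented and passed through the encoder before `info_nce` is applied to the resulting embeddings. The loss is low when each anchor is most similar to the segment from its own participant and high when it is closer to another participant's segment.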
The models are aimed at health and wellness inference from wearable biosignals: estimating demographic and physiological attributes, screening for or stratifying health conditions, and serving as a feature backbone for downstream clinical and research tasks where labeled wearable data is limited. Because transfer works from frozen embeddings, researchers can build task-specific classifiers or regressors with modest labeled datasets, making the approach attractive for digital health studies, remote monitoring, and population-scale cardiovascular research built on PPG and ECG.
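As a rough illustration of transfer from frozen embeddings, the sketch below fits a small linear probe on top of fixed feature vectors. It assumes binary labels and uses logistic regression trained by batch gradient descent. That choice is made here for simplicity and is not necessarily the probe used in the paper. The embeddings are toy values standing in for encoder outputs, since the actual encoders are not available, and the names `train_linear_probe` and `predict` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit logistic regression on fixed embeddings.

    Only the probe weights `w` and bias `b` are learned; the embeddings
    are treated as constants, which mirrors a frozen encoder.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = pred - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the probe's logit is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy "frozen embeddings" (hypothetical values) with binary labels.
toy_embeddings = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.3]]
toy_labels = [1, 1, 0, 0]
w, b = train_linear_probe(toy_embeddings, toy_labels)
```

Because only a weight vector and a bias are learned, a probe of this kind can be fit with far fewer labeled examples than end-to-end training would need, which is the property the paragraph above relies on.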
This work helped establish that the foundation-model recipe (large-scale self-supervised pretraining followed by lightweight transfer) extends to consumer wearable biosignals, and it has become a widely cited reference point for subsequent PPG and ECG representation-learning efforts. Its main limitation for the open research community is access: the models were trained on proprietary Apple Heart and Movement Study data, and neither the trained weights nor the training code has been released, so outside Apple the results cannot be directly reproduced and the encoders cannot be reused. The contribution is therefore primarily conceptual and methodological, a demonstration of feasibility and a blueprint, rather than a shared artifact that others can build on directly.
Abbaspourazad, S., et al. (2024) Large-scale Training of Foundation Models for Wearable Biosignals. International Conference on Learning Representations.
DOI: 10.48550/arXiv.2312.05409