Apple's foundation model trained on behavioral signals from wearables, modeling 27 HealthKit metrics to improve predictions across 57 health tasks.
The Wearable Behavior Model (WBM) is a foundation model from Apple that learns general-purpose representations of human health from the behavioral signals recorded by consumer wearables, rather than from the raw, high-frequency sensor streams those devices produce. Where most wearable foundation models ingest low-level photoplethysmography or accelerometer waveforms, WBM operates on higher-level, already-summarized behavioral metrics such as sleep duration, step counts, heart-rate statistics, and exercise minutes. These quantities are aligned with physiologically relevant timescales and are therefore often more informative per unit of data.
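To make the distinction between raw streams and behavioral summaries concrete, here is a minimal sketch of how irregular step-count samples could be collapsed into per-hour totals. The hourly granularity, the function name `hourly_step_totals`, and the sample values are illustrative assumptions, not details taken from the paper or from HealthKit.

```python
from collections import defaultdict
from datetime import datetime


def hourly_step_totals(samples):
    """Sum (timestamp, step_count) samples into per-hour totals.

    Hypothetical helper for illustration only. Hours with no samples
    do not appear in the result, which mirrors the sparse, irregular
    nature of behavioral data.
    """
    totals = defaultdict(int)
    for ts, steps in samples:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[hour] += steps
    return dict(sorted(totals.items()))


# Made-up samples: two readings in the 08:00 hour, none at 09:00, one at 10:00.
samples = [
    (datetime(2025, 1, 1, 8, 5), 40),
    (datetime(2025, 1, 1, 8, 50), 60),
    (datetime(2025, 1, 1, 10, 15), 25),
]
hourly = hourly_step_totals(samples)
```

The summary has one entry per observed hour rather than one per sensor reading, and the missing 09:00 hour is left as a gap instead of being filled in.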
Introduced in the 2025 paper "Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions" (accepted to ICML 2025), WBM was trained on data from the Apple Heart and Movement Study, one of the largest longitudinal wearable cohorts available. The work argues that behavioral data, which is cheap to store, more privacy-friendly than raw sensor data, and naturally sparse and irregular, is a strong substrate for population-scale health modeling.
WBM fits into the emerging landscape of wearable foundation models alongside efforts such as Apple's own sensor-level models, Google's wearable LSM, and academic step/heart-rate models — but it is distinctive in deliberately modeling derived behavioral metrics and in demonstrating that this abstraction level can complement, and sometimes outperform, raw-sensor approaches.
WBM is a self-supervised foundation model built on a bi-directional Mamba-2 state space model, chosen for its linear-time handling of long behavioral time series. Its inputs are 27 behavioral metrics derived from Apple HealthKit, summarized over time rather than sampled at sensor frequencies. Pretraining used over 2.5 billion hours of data from about 162,000 individuals in the Apple Heart and Movement Study. Downstream evaluation followed a linear-probing protocol: the pretrained model is frozen, and its representations are used to train simple linear classifiers or regressors on 57 health tasks, which isolates the quality of the learned embeddings from any task-specific fine-tuning. Across these tasks, WBM's behavioral representations were competitive with raw-sensor foundation-model baselines and outperformed them on some tasks, and combining behavioral and sensor embeddings yielded further gains. These results support the paper's central claim that behavioral data is a valuable and underused modality for health foundation models.
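The linear-probing protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the embeddings are random placeholders standing in for frozen model outputs, the probe is a closed-form ridge regression, the names `fit_ridge_probe` and `predict` are hypothetical, and combining modalities by concatenation is an assumption about how embeddings might be joined.

```python
import numpy as np


def fit_ridge_probe(embeddings, targets, l2=1e-3):
    """Fit a linear probe (ridge regression with a bias term) on frozen embeddings."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def predict(weights, embeddings):
    """Apply a fitted probe to new embeddings."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    return X @ weights


rng = np.random.default_rng(0)
# Placeholders for frozen embeddings; no real model or data is involved.
behavior_emb = rng.normal(size=(200, 8))  # stands in for behavioral embeddings
sensor_emb = rng.normal(size=(200, 4))    # stands in for raw-sensor embeddings
# Synthetic target that, by construction, depends on both modalities.
targets = behavior_emb[:, 0] + 0.5 * sensor_emb[:, 1]

train, test = slice(0, 150), slice(150, 200)

# Probe on behavioral embeddings alone.
w_behavior = fit_ridge_probe(behavior_emb[train], targets[train])
mse_behavior = np.mean((predict(w_behavior, behavior_emb[test]) - targets[test]) ** 2)

# Probe on concatenated behavioral + sensor embeddings.
combined = np.concatenate([behavior_emb, sensor_emb], axis=1)
w_combined = fit_ridge_probe(combined[train], targets[train])
mse_combined = np.mean((predict(w_combined, combined[test]) - targets[test]) ** 2)
```

Only the probe weights are learned; the embeddings are never updated. The combined probe has lower error here purely because the synthetic target was built from both embedding sets, so the numbers illustrate the mechanics of the protocol and say nothing about WBM's actual performance.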
WBM targets population- and individual-scale digital health: predicting health conditions and states, screening and risk stratification, and powering downstream health features that can be fine-tuned or linearly probed from a single shared representation. Because it consumes already-summarized behavioral metrics, it is well suited to settings where raw sensor data is unavailable, too large, or too sensitive to retain — making it relevant to wearable manufacturers, digital-health researchers, and large-cohort epidemiological studies that already collect HealthKit-style behavioral summaries.
WBM provides evidence that derived behavioral metrics — not just raw sensor waveforms — are a powerful foundation for health prediction, reframing how wearable foundation models are built and validated at scale. Its breadth (57 tasks, 162,000 people, 2.5B+ hours) makes it a notable reference point for behavioral health modeling. A key limitation for the broader community is access: the model weights and training code have not been released, owing to the participant-consent restrictions governing the Apple Heart and Movement Study, so reproduction and independent benchmarking remain constrained to the results reported in the paper.
Erturk, E., et al. (2025) Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions. International Conference on Machine Learning.
DOI: 10.48550/arXiv.2507.00191