Apple / University of Illinois Urbana-Champaign / MIT
A self-supervised motion foundation model for wearable accelerometry, trained with relative contrastive learning on 1B segments from 87,376 participants.
RelCon is a self-supervised foundation model for human motion captured by wearable accelerometers, such as the inertial sensors in smartwatches and fitness trackers. It addresses a long-standing problem in wearable sensing: large unlabeled accelerometry datasets are abundant, but task-specific labels (activity classes, clinical gait metrics) are scarce and expensive, so models trained for one task rarely transfer to another. RelCon learns a single, general-purpose representation of raw accelerometry that can be reused across many downstream tasks with only lightweight probing.
The model's core innovation is relative contrastive learning. Standard contrastive methods such as SimCLR treat a sample's augmentations as the only positives and all other samples as negatives, ignoring the fact that some "negative" segments are in fact more similar to the anchor than others. RelCon instead trains a learnable distance measure that scores semantic similarity between any pair of time-series, then uses that continuous distance to define soft, relative relationships among candidates across both time and subjects. This lets the model preserve hierarchical structure in motion data rather than collapsing it into a binary positive/negative split.
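The relative objective can be illustrated with a small sketch. It assumes a formulation in which each candidate in turn is treated as the positive, and every candidate that the distance measure ranks as farther from the anchor serves as a negative, with the per-candidate InfoNCE terms averaged. The function name `relative_contrastive_loss`, the temperature default, and the use of plain Python lists are illustrative assumptions, not the authors' implementation.

```python
import math

def relative_contrastive_loss(sims, dists, tau=0.1):
    """Illustrative relative contrastive loss for a single anchor.

    sims[i]  : embedding similarity between the anchor and candidate i
    dists[i] : distance between the anchor and candidate i, as scored by a
               (hypothetical) frozen distance measure
    tau      : softmax temperature (assumed value)
    """
    total = 0.0
    for s_pos, d_pos in zip(sims, dists):
        # Negatives for this positive: candidates strictly farther from the anchor.
        negatives = [s for s, d in zip(sims, dists) if d > d_pos]
        logits = [s_pos / tau] + [s / tau for s in negatives]
        # Numerically stable log-sum-exp over the positive and its negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # InfoNCE term: -log(exp(pos) / sum(exp(all))).
        total += log_denom - s_pos / tau
    return total / len(sims)

# Embedding similarities that agree with the distance ranking give a lower loss
# than similarities that contradict it.
aligned = relative_contrastive_loss([0.9, 0.5, 0.1], [0.1, 0.5, 0.9])
reversed_order = relative_contrastive_loss([0.1, 0.5, 0.9], [0.1, 0.5, 0.9])
```

In this sketch the farthest candidate has no negatives and contributes zero loss, while closer candidates are pushed above everything ranked behind them, which is how a continuous distance becomes a set of relative constraints rather than one binary split.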
RelCon was developed by researchers at Apple in collaboration with the University of Illinois Urbana-Champaign and MIT, and was published at ICLR 2025 (preprint released November 2024). The authors report it as the first demonstration that a wearable-motion foundation model generalizes across distinct evaluation tasks, including activity recognition and clinical gait regression.
RelCon trains in two stages. First, a learnable distance measure is fit: the distance between an anchor and a candidate is defined by how well the anchor can be reconstructed from the candidate through a sparsemax cross-attention network with dilated-convolution branches, so a lower reconstruction error means a smaller distance. Augmenting the candidates prevents trivial exact-matching solutions. Second, the foundation model, a 1D ResNet-34 operating on 2.56-second windows of raw 100 Hz 3-axis accelerometry (256 samples per axis), is trained with the frozen distance function supplying relative similarity targets across segments and subjects. Pretraining used roughly 1 billion segments from 87,376 participants drawn from Apple's wearable data.
On downstream benchmarks, RelCon reached 55.28 F1 on a 16-class field activity task (vs. 50.06 for SimCLR), 85.4 F1 on PAMAP2 and 69.1 F1 on Opportunity wrist-to-wrist transfer (vs. 72.5 and 57.0 for a 10M-parameter UK Biobank-pretrained baseline from Yuan et al. 2024), and led on all five gait metrics, for example with a 0.756 correlation for double-support time.
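The input specification implies a fixed window length: 2.56 seconds at 100 Hz is 256 samples per axis. The sketch below shows one way a raw recording could be cut into such windows. The helper name `segment_windows`, the non-overlapping default stride, and the choice to drop a trailing partial window are assumptions made for illustration; the source does not describe the segmentation procedure.

```python
SAMPLE_RATE_HZ = 100      # sampling rate stated in the text
WINDOW_SECONDS = 2.56     # window duration stated in the text
WINDOW_SAMPLES = round(SAMPLE_RATE_HZ * WINDOW_SECONDS)  # 256

def segment_windows(recording, window=WINDOW_SAMPLES, stride=WINDOW_SAMPLES):
    """Split a recording into fixed-length windows.

    recording : list of (x, y, z) accelerometer samples
    window    : samples per window
    stride    : step between window starts (assumed non-overlapping by default)

    A trailing partial window is dropped (an assumption, not from the source).
    """
    last_start = len(recording) - window
    return [recording[start:start + window]
            for start in range(0, last_start + 1, stride)]

# Ten seconds of synthetic samples with gravity on the z axis.
recording = [(0.0, 0.0, 1.0)] * (10 * SAMPLE_RATE_HZ)
windows = segment_windows(recording)
```

Each resulting window holds 256 three-axis samples, matching the input shape the encoder is described as consuming; passing a smaller `stride` yields overlapping windows.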
RelCon targets health and activity monitoring from consumer and clinical wearables. Its embeddings power human activity recognition (classifying workouts and daily activities) and the regression of clinical gait parameters such as stride velocity and double-support time, which are relevant to mobility assessment and neurological or orthopedic monitoring. Because a single pretrained encoder transfers across tasks and across sensor-placement domains, teams building digital-health features or running observational studies can fine-tune or linearly probe it with limited labeled data instead of training task-specific models from scratch.
RelCon provides what its authors describe as the first published evidence that a self-supervised motion foundation model from wearables generalizes across distinct task families, establishing relative contrastive learning as a strong alternative to binary contrastive methods for time-series. Its reported gains over a larger UK Biobank-pretrained baseline suggest that objective design can matter more than raw scale for wearable representations. A key limitation for the community is reproducibility: the pretraining data is proprietary to Apple, and neither the data nor the model weights are released. The public MIT-licensed repository ships only the re-training pipeline (plus a synthetic demo dataset), so external users must pretrain on their own data to obtain a usable model rather than downloading published weights.
Xu, M. A., et al. (2024) RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data. International Conference on Learning Representations.
DOI: 10.48550/arXiv.2411.18822