A transformer encoder self-supervised on 100,000 hours of dual-accelerometer data, used as a frozen feature extractor for human activity recognition.
SelfPAB (Self-supervised Pre-training for Accelerometer-Based human activity recognition) addresses a persistent bottleneck in wearable-sensor research: labeled accelerometer data is scarce and expensive to annotate, while unlabeled recordings from population-scale cohorts are abundant. Human activity recognition (HAR) models trained only on small labeled datasets tend to overfit and generalize poorly across sensor placements, populations, and protocols. SelfPAB tackles this by borrowing the pre-train-then-fine-tune recipe that transformed natural language and speech processing, applying it to raw motion signals from body-worn accelerometers.
Developed by Aleksej Logacjov and colleagues at the Norwegian University of Science and Technology (NTNU) and published in Applied Intelligence in 2024, SelfPAB is a transformer encoder pre-trained on roughly 100,000 hours of unlabeled dual-accelerometer recordings drawn from the HUNT4 population health study in Norway. The pre-training objective is masked spectrogram reconstruction: signals are converted to time-frequency spectrograms via a short-time Fourier transform (STFT), portions are masked, and the model learns to reconstruct them, yielding general-purpose motion representations without any activity labels.
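The masked spectrogram reconstruction objective described above can be illustrated with a minimal NumPy sketch. This is not the released SelfPAB code: the 50 Hz sampling rate, window and hop sizes, 15% mask ratio, frame-wise zero masking, and L1 loss are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def stft_magnitude(signal, n_fft=50, hop=25):
    """Magnitude STFT of a 1-D signal with a Hann window (frames x frequency bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def mask_time_frames(spec, mask_ratio=0.15, rng=None):
    """Zero a random subset of time frames; return the masked copy and a boolean mask."""
    rng = np.random.default_rng() if rng is None else rng
    n_frames = spec.shape[0]
    n_masked = max(1, int(round(mask_ratio * n_frames)))
    idx = rng.choice(n_frames, size=n_masked, replace=False)
    mask = np.zeros(n_frames, dtype=bool)
    mask[idx] = True
    masked = spec.copy()
    masked[mask] = 0.0
    return masked, mask

def masked_reconstruction_loss(pred, target, mask):
    """Mean absolute error computed on the masked frames only (assumed loss)."""
    return float(np.mean(np.abs(pred[mask] - target[mask])))

# Synthetic stand-in for two triaxial accelerometers: 6 channels, 10 s at an assumed 50 Hz.
rng = np.random.default_rng(0)
fs = 50
t = np.arange(10 * fs) / fs
channels = [np.sin(2 * np.pi * (1 + k) * t) + 0.1 * rng.standard_normal(t.size)
            for k in range(6)]

# Concatenate per-channel spectrograms along the frequency axis: one row per time frame.
spec = np.concatenate([stft_magnitude(c) for c in channels], axis=1)
masked_spec, mask = mask_time_frames(spec, rng=rng)

# A model would take masked_spec as input and be trained to minimise this loss;
# here the masked input itself serves as a trivial "prediction" baseline.
baseline_loss = masked_reconstruction_loss(masked_spec, spec, mask)
```

During pre-training, a transformer encoder would receive `masked_spec` and be optimised so that its output matches `spec` on the masked frames; no activity labels enter the loss.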
Once pre-trained, the encoder is frozen and used as a feature extractor that feeds a lightweight supervised classifier on downstream HAR datasets. This places SelfPAB among the first demonstrations that large-scale self-supervised pre-training—and the data-scaling behavior familiar from language models—transfers effectively to dual-accelerometer human activity recognition.
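The frozen-encoder workflow can be sketched as follows. The `FrozenEncoder` class is a hypothetical stand-in (a fixed random projection), not the SelfPAB transformer, and the softmax-regression head, learning rate, and toy data are assumptions; the point is only that the encoder weights stay untouched while the lightweight classifier is trained on its embeddings.

```python
import numpy as np

class FrozenEncoder:
    """Hypothetical stand-in for a pre-trained encoder: weights are fixed and never updated."""
    def __init__(self, in_dim, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((in_dim, emb_dim)) / np.sqrt(in_dim)

    def __call__(self, x):
        return np.tanh(x @ self.W)

def train_linear_head(feats, labels, n_classes, lr=0.5, epochs=200):
    """Softmax regression trained with full-batch gradient descent on fixed features."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # gradient of mean cross-entropy
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy labeled set: two classes whose inputs differ in mean.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1, 1, (100, 20)), rng.normal(1, 1, (100, 20))])
y = np.array([0] * 100 + [1] * 100)

encoder = FrozenEncoder(in_dim=20, emb_dim=16)
weights_before = encoder.W.copy()

feats = encoder(x)                       # embeddings are computed once, with no gradient step
W, b = train_linear_head(feats, y, n_classes=2)
pred = np.argmax(feats @ W + b, axis=1)
accuracy = float((pred == y).mean())
```

Because only the head is optimised, embeddings can be cached once per dataset, which keeps downstream training cheap.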
SelfPAB is a transformer encoder operating on STFT spectrograms of dual-accelerometer signals (sensors placed on the lower back and thigh in the HUNT4 protocol). During self-supervised pre-training, the masked reconstruction task forces the network to model temporal and spectral structure across both sensors using approximately 100,000 hours of unlabeled HUNT4 data. The resulting frozen encoder is evaluated as a feature extractor on five downstream HAR datasets—HARTH, HAR70+, PAMAP2, Opportunity, and RealWorld—where a supervised head is trained on its embeddings. Across these benchmarks SelfPAB reports macro-F1 improvements of about 7–14% relative to supervised-from-scratch baselines, with downstream accuracy increasing as the volume of pre-training data grows. Pre-trained weights (upstream_model.ckpt, distributed via Git LFS) and training code are released under the MIT license; the HUNT4 pre-training data is access-controlled and must be requested from the HUNT databank.
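For readers unfamiliar with the metric, macro-F1 is the unweighted mean of per-class F1 scores, so rare activities count as much as common ones. The sketch below is a generic NumPy implementation with made-up labels, not the evaluation script from the SelfPAB repository.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no true or predicted samples scores 0."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Illustrative labels for three activity classes (invented for this example).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
score = macro_f1(y_true, y_pred, n_classes=3)
```

Here the per-class F1 values are 0.75, 0.8, and 2/3, so the macro average is roughly 0.739.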
SelfPAB is aimed at researchers and practitioners working with body-worn accelerometers for physical-activity monitoring, epidemiology, and digital-health studies, where large labeled datasets are rare but unlabeled recordings are plentiful. By providing a reusable pre-trained encoder, it lets groups bootstrap accurate activity classifiers from modest labeled sets, supporting use cases such as quantifying movement and sedentary behavior in cohort studies, clinical activity assessment, and sleep-and-activity analysis. Because the encoder is frozen, it integrates into existing HAR pipelines as a drop-in feature extractor without demanding large-scale GPU fine-tuning.
SelfPAB demonstrated that the self-supervised pre-training paradigm—and its characteristic data-scaling benefits—extends from language and audio to dual-accelerometer human activity recognition, an area historically dominated by small supervised models. By releasing pre-trained weights and code, the NTNU team lowered the barrier to building strong HAR systems and seeded follow-up work, including cross-sensor variants (MonoSelfPAB) and long-term spectrogram models (LTA2V) from the same group, as well as broader investigations of scaling laws in wearable activity recognition. Its main limitations are that the pre-training corpus reflects a single population and a specific dual-sensor placement, and that the most valuable asset—the HUNT4 raw data—remains access-restricted, so reproduction depends on the released checkpoints.
Logacjov, A., et al. (2024). SelfPAB: large-scale pre-training on accelerometer data for human activity recognition. Applied Intelligence.
DOI: 10.1007/s10489-024-05322-3