SelfPAB

Transformer encoder self-supervised on 100,000 hours of dual-accelerometer data, used as a frozen feature extractor for human activity recognition.

Released: March 2024

SelfPAB (Self-supervised Pre-training for Accelerometer-Based human activity recognition) addresses a persistent bottleneck in wearable-sensor research: labeled accelerometer data is scarce and expensive to annotate, while unlabeled recordings from population-scale cohorts are abundant. Human activity recognition (HAR) models trained only on small labeled datasets tend to overfit and generalize poorly across sensor placements, populations, and protocols. SelfPAB tackles this by borrowing the pre-train-then-fine-tune recipe that transformed natural language and speech processing, applying it to raw motion signals from body-worn accelerometers.

Developed by Aleksej Logacjov and colleagues at the Norwegian University of Science and Technology (NTNU) and published in Applied Intelligence in 2024, SelfPAB is a transformer encoder pre-trained on roughly 100,000 hours of unlabeled dual-accelerometer recordings drawn from the HUNT4 population health study in Norway. The pre-training objective is masked spectrogram reconstruction: signals are converted to time-frequency spectrograms via a short-time Fourier transform (STFT), portions are masked, and the model learns to reconstruct them, yielding general-purpose motion representations without any activity labels.
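The masked-spectrogram idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the frame length, hop size, mask ratio, 50 Hz sampling rate, whole-frame masking, L1 loss, and the choice to concatenate channels along the frequency axis are all assumptions made for illustration, and the six synthetic sine waves merely stand in for two three-axis sensors.

```python
import numpy as np

def stft_magnitude(signal, frame_len=50, hop=25):
    """Magnitude STFT of a 1-D signal: rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def mask_time_frames(spec, mask_ratio=0.15, rng=None):
    """Zero a random subset of time frames; return the masked copy and a boolean frame mask."""
    rng = np.random.default_rng() if rng is None else rng
    n_masked = max(1, int(round(mask_ratio * spec.shape[0])))
    idx = rng.choice(spec.shape[0], size=n_masked, replace=False)
    mask = np.zeros(spec.shape[0], dtype=bool)
    mask[idx] = True
    masked = spec.copy()
    masked[mask] = 0.0
    return masked, mask

def masked_reconstruction_loss(pred, target, mask):
    """Mean absolute error computed only over the masked frames (assumed loss)."""
    return float(np.mean(np.abs(pred[mask] - target[mask])))

rng = np.random.default_rng(0)
fs = 50                                  # assumed sampling rate in Hz
t = np.arange(10 * fs) / fs              # 10 seconds of samples
# Six synthetic channels standing in for 2 sensors x 3 axes.
channels = [np.sin(2 * np.pi * (1 + k) * t) + 0.1 * rng.standard_normal(t.size)
            for k in range(6)]
# Concatenate per-channel spectrograms along the frequency axis.
spec = np.concatenate([stft_magnitude(c) for c in channels], axis=1)
masked, mask = mask_time_frames(spec, rng=rng)
# Loss of a trivial "predict zeros" model; a trained encoder would aim to beat this.
loss = masked_reconstruction_loss(masked, spec, mask)
print(spec.shape, int(mask.sum()), round(loss, 3))
```

In an actual pre-training loop, `masked` would be the model input and the model's output would replace `masked` in the loss call, so that only the hidden frames contribute to the training signal.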

Once pre-trained, the encoder is frozen and used as a feature extractor that feeds a lightweight supervised classifier on downstream HAR datasets. This places SelfPAB among the first demonstrations that large-scale self-supervised pre-training—and the data-scaling behavior familiar from language models—transfers effectively to dual-accelerometer human activity recognition.
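The frozen-encoder workflow can be sketched as follows. This is a toy stand-in rather than the released model: `frozen_encoder` is a hypothetical fixed random projection (not a transformer and not loaded from any checkpoint), the data are synthetic, and the softmax head with its learning rate and epoch count is an assumed choice of "lightweight supervised classifier". The point is only that the encoder weights never change while the head is trained on its embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained encoder: a fixed random projection.
W_enc = rng.standard_normal((156, 32)) / np.sqrt(156)

def frozen_encoder(x):
    """Map inputs to embeddings; W_enc is read but never updated."""
    return np.tanh(x @ W_enc)

def train_linear_head(feats, labels, n_classes, lr=0.5, epochs=200):
    """Fit a softmax classifier on fixed embeddings with plain gradient descent."""
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(labels)         # gradient of cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy two-class data with shifted means, standing in for labeled HAR windows.
x = np.concatenate([rng.standard_normal((100, 156)) + 0.5,
                    rng.standard_normal((100, 156)) - 0.5])
y = np.array([0] * 100 + [1] * 100)

W_before = W_enc.copy()
feats = frozen_encoder(x)                 # embeddings are computed once and reused
W, b = train_linear_head(feats, y, n_classes=2)
pred = (feats @ W + b).argmax(axis=1)
train_acc = float((pred == y).mean())
print(train_acc, np.array_equal(W_enc, W_before))
```

Because the embeddings do not depend on the head, they can be computed once and cached, which is where the compute saving of a frozen extractor comes from.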

Key Features

Spectrogram-based masked reconstruction: Raw accelerometer channels are transformed into STFT spectrograms and the model is trained to reconstruct masked time-frequency regions, an objective adapted from self-supervised speech and audio models.
Population-scale unlabeled pre-training: Pre-training uses about 100,000 hours of unlabeled dual-accelerometer signals from the HUNT4 cohort, far exceeding the size of typical labeled HAR corpora.
Frozen feature extractor: The pre-trained transformer is reused without further weight updates, so downstream tasks only train a small classifier on top of fixed embeddings—reducing labeled-data and compute requirements.
Consistent downstream gains: SelfPAB improves macro F1 by roughly 7–14% over fully supervised baselines across five HAR benchmarks.
Demonstrated data-scaling behavior: Pre-training on increasing amounts of data (10 to 100,000 hours) yields monotonically improving downstream performance, mirroring scaling trends seen in transformer language models.

Technical Details

SelfPAB is a transformer encoder operating on STFT spectrograms of dual-accelerometer signals (sensors placed on the lower back and thigh in the HUNT4 protocol). During self-supervised pre-training, the masked reconstruction task forces the network to model temporal and spectral structure across both sensors using approximately 100,000 hours of unlabeled HUNT4 data. The resulting frozen encoder is evaluated as a feature extractor on five downstream HAR datasets (HARTH, HAR70+, PAMAP2, Opportunity, and RealWorld), where a supervised head is trained on its embeddings. Across these benchmarks SelfPAB reports macro-F1 improvements of about 7–14% relative to supervised-from-scratch baselines, with downstream performance increasing as the volume of pre-training data grows. Pre-trained weights (upstream_model.ckpt, distributed via Git LFS) and training code are released under the MIT license; the HUNT4 pre-training data is access-controlled and must be requested from the HUNT databank.
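For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so rare activities count as much as common ones. The sketch below computes it in plain Python and shows how an absolute difference and a relative gain between two classifiers would be derived. The labels and predictions are invented toy values, not results from the paper, and `relative_gain` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def relative_gain(new, baseline):
    """Improvement of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline

# Toy, imbalanced labels: "walk" dominates, "sit" and "run" are rare.
y_true        = ["walk", "walk", "walk", "walk", "sit", "sit",  "run",  "run"]
baseline_pred = ["walk", "walk", "walk", "walk", "sit", "walk", "walk", "run"]
improved_pred = ["walk", "walk", "walk", "walk", "sit", "sit",  "walk", "run"]

f1_base = macro_f1(y_true, baseline_pred)
f1_new = macro_f1(y_true, improved_pred)
print(f"baseline macro F1: {f1_base:.3f}")
print(f"improved macro F1: {f1_new:.3f}")
print(f"absolute difference: {f1_new - f1_base:.3f}")
print(f"relative gain: {relative_gain(f1_new, f1_base):.1%}")
```

Reporting both the absolute difference and the relative gain avoids ambiguity when a percentage improvement is quoted without stating which of the two is meant.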

Applications

SelfPAB is aimed at researchers and practitioners working with body-worn accelerometers for physical-activity monitoring, epidemiology, and digital-health studies, where large labeled datasets are rare but unlabeled recordings are plentiful. By providing a reusable pre-trained encoder, it lets groups bootstrap accurate activity classifiers from modest labeled sets, supporting use cases such as quantifying movement and sedentary behavior in cohort studies, clinical activity assessment, and sleep-and-activity analysis. Because the encoder is frozen, it integrates into existing HAR pipelines as a drop-in feature extractor without demanding large-scale GPU fine-tuning.

Impact

SelfPAB demonstrated that the self-supervised pre-training paradigm—and its characteristic data-scaling benefits—extends from language and audio to dual-accelerometer human activity recognition, an area historically dominated by small supervised models. By releasing pre-trained weights and code, the NTNU team lowered the barrier to building strong HAR systems and seeded follow-up work, including cross-sensor variants (MonoSelfPAB) and long-term spectrogram models (LTA2V) from the same group, as well as broader investigations of scaling laws in wearable activity recognition. Its main limitations are that the pre-training corpus reflects a single population and a specific dual-sensor placement, and that the most valuable asset—the HUNT4 raw data—remains access-restricted, so reproduction depends on the released checkpoints.

Citation

SelfPAB: large-scale pre-training on accelerometer data for human activity recognition

Logacjov, A., et al. (2024). SelfPAB: large-scale pre-training on accelerometer data for human activity recognition. Applied Intelligence.

DOI: 10.1007/s10489-024-05322-3

Recent citations

Papers that recently cited this model.

Inertia-1: An Open Exploration of Wearable Motion Foundation Models
Zongzhe Xu, Aakarsh Anand, Sarah Jiang, et al.
Jul 2026
Citations: 0 · Influential

Low Latency Basketball Action Recognition and Game Technology Analysis Based on Spiking Neural Network Under SAGIN Environment
Yi Kao, Chao Wei
Transactions on Emerging Telecommunications Technologies · May 2026
Citations: 0

Detect and Repair: Robust Self-Supervised Wearable Sensing Under Missing Modalities
Aboul Hassane Cisse, Shoya Ishimaru
Italian National Conference on Sensors · Apr 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Federated Learning for IoMT-Enhanced Human Activity Recognition with Hybrid LSTM-GRU Networks
Fahad R. Albogamy
Italian National Conference on Sensors · Feb 2025
Citations: 27 · Influential

Self-supervised Learning for Accelerometer-based Human Activity Recognition: A Survey
Aleksej Logacjov
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies · Nov 2024
Citations: 26

Long-term self-supervised learning for accelerometer-based sleep-wake recognition
Aleksej Logacjov, Kerstin Bach, P. Mork
Engineering applications of artificial intelligence · Feb 2025
Citations: 5

Reducing annotation burden in physical activity research using vision language models
Abram Schonfeldt, B. Maylor, Xiaofang Chen, et al.
Scientific Reports · May 2025
Citations: 3

Scaling laws in wearable human activity recognition
Tom Hoddes, Alex Bijamov, Saket Joshi, et al.
arXiv.org · Feb 2025
Citations: 3

Citations

Total Citations: 21

Influential: 2

References: 48

GitHub

Stars: 18

Forks: 4

Open Issues: 1

Contributors: 2

Last Push: 1y ago

Language: Python

License: MIT

Fields of citing research

Computer Science: 94%
Medicine: 56%
Engineering: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Openness score: 78 (Open)

Usability (can I run it?): 95

Reproducibility (can I retrain it?): 58

Model Openness Framework: Class III (Open Model)

Resources

GitHub Repository · Research Paper
