A self-supervised metabolomic foundation model pretrained on NMR metabolite profiles from 430,000+ UK Biobank participants, applied without backbone retraining to aging, subtyping, and risk tasks.
MetFoundation is a metabolomic foundation model that learns general-purpose representations of human metabolic state from large-scale nuclear magnetic resonance (NMR) profiling. It was developed by researchers at Hong Kong Baptist University (corresponding author Lu Zhang, Department of Computer Science) and released as a bioRxiv preprint in May 2026. Unlike task-specific metabolomic models trained from scratch for a single endpoint, MetFoundation is pretrained once in a self-supervised manner and then reused, with its backbone frozen, across multiple downstream applications.
The model addresses a recurring bottleneck in metabolomics: each clinical or epidemiological question (aging, disease risk, patient stratification) has historically required a bespoke supervised model and large labeled cohorts. MetFoundation instead encodes the structure of circulating metabolite concentrations into a shared embedding space, so that downstream tasks need only a lightweight head rather than a full retraining of the representation. This mirrors the foundation-model paradigm now common in protein and single-cell biology, extended here to NMR-based blood metabolomics.
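To make the "lightweight head on a frozen backbone" idea concrete, here is a minimal sketch of a linear probe: a fixed encoder maps metabolite concentrations to embeddings, and only a small logistic-regression head is trained. This is not MetFoundation's code. `frozen_encode`, `N_METABOLITES`, and `EMBED_DIM` are hypothetical stand-ins, a random projection plays the role of the pretrained transformer, and synthetic data replaces UK Biobank profiles.

```python
import numpy as np

rng = np.random.default_rng(0)

N_METABOLITES = 249  # hypothetical input width; the model's actual feature set may differ
EMBED_DIM = 64       # hypothetical; the preprint does not state the embedding size

# Stand-in for the pretrained backbone: a fixed random projection. It is never
# updated below, mimicking a frozen encoder (the real model is transformer-based).
W_backbone = rng.normal(scale=1 / np.sqrt(N_METABOLITES), size=(N_METABOLITES, EMBED_DIM))


def frozen_encode(x):
    """Map standardized metabolite concentrations to embeddings without training."""
    return np.tanh(x @ W_backbone)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def log_loss(z, y, w, b):
    p = np.clip(sigmoid(z @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def train_linear_head(z, y, lr=0.1, epochs=200):
    """Gradient descent on a logistic-regression head; only w and b change."""
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        grad = sigmoid(z @ w + b) - y  # d(log loss)/d(logit)
        w -= lr * z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


# Synthetic cohort: a binary label driven by five "metabolites".
x = rng.normal(size=(1000, N_METABOLITES))
y = (x[:, :5].sum(axis=1) > 0).astype(float)

backbone_before = W_backbone.copy()
z = frozen_encode(x)  # embeddings computed once and reusable by any head
w, b = train_linear_head(z, y)
print(f"log loss: {log_loss(z, y, np.zeros(EMBED_DIM), 0.0):.3f} -> {log_loss(z, y, w, b):.3f}")
```

The design point is that the backbone weights never enter the gradient update, so adding a new task costs only the head's few dozen parameters, and the embeddings can be computed once and cached.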
As the first metabolomics foundation model cataloged on bio.rodeo, MetFoundation illustrates how the pretrain-then-adapt approach transfers to a modality where features are continuous biochemical concentrations rather than sequences. The work is a preprint and has not yet been peer-reviewed.
MetFoundation is pretrained with a self-supervised objective on NMR metabolite concentration profiles from over 430,000 UK Biobank participants. The authors describe a transformer-based backbone, but the specific architectural variant and the total parameter count are not stated in the preprint. After pretraining, the backbone is held fixed and adapted in three ways: (1) a fine-tuned survival head produces a mortality-informed aging clock; (2) frozen embeddings are partitioned into 13 metabolic subtypes; and (3) a lightweight student model is produced by contrastive distillation so that predictions can be made from routine blood-test measurements rather than full NMR panels. External validation is performed in the China Health and Retirement Longitudinal Study (CHARLS) cohort to assess transfer beyond the UK Biobank training distribution.
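The preprint summary does not spell out the losses or clustering algorithm behind the three adaptations, so the sketch below fills them in with common choices, all of which are assumptions: a Cox negative log partial likelihood for the survival head, plain k-means with k = 13 for subtyping, and an InfoNCE objective for contrastive distillation of a student toward the teacher's embeddings. Random vectors stand in for the frozen embeddings, and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


# (1) Survival head. Assumption: Cox proportional-hazards loss. Tied follow-up
# times get no special treatment (no Breslow or Efron correction).
def cox_neg_log_partial_likelihood(risk, time, event):
    """risk: predicted log-hazard; time: follow-up; event: 1.0 if the person died."""
    order = np.argsort(-time)  # longest follow-up first
    risk, event = risk[order], event[order]
    # Running log-sum-exp = log of total hazard over everyone still at risk.
    log_risk_set = np.logaddexp.accumulate(risk)
    return -np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0)


# (2) Subtyping. Assumption: plain k-means with k = 13 on the frozen embeddings.
def kmeans(z, k=13, iters=50, seed=0):
    r = np.random.default_rng(seed)
    centers = z[r.choice(len(z), size=k, replace=False)]  # copy of k random rows
    for _ in range(iters):
        dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = z[labels == j].mean(axis=0)
    dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1), centers


# (3) Contrastive distillation. Assumption: InfoNCE, where the positive pair is
# the same participant's student and teacher embeddings.
def info_nce(student, teacher, temperature=0.1):
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # row i: student i vs. every teacher
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


# Toy demo on synthetic data (no real cohort values).
n, d = 500, 32
z = rng.normal(size=(n, d))                  # stand-in frozen teacher embeddings
time = rng.exponential(10.0, size=n)         # follow-up in arbitrary units
event = (rng.random(n) < 0.3).astype(float)  # roughly 30% observed deaths
risk = z @ rng.normal(scale=0.1, size=d)     # hypothetical linear survival head
cox = cox_neg_log_partial_likelihood(risk, time, event)

labels, centers = kmeans(z, k=13)

# In MetFoundation the student reads routine blood-test inputs; here it is
# simply a noisy copy of the teacher embeddings.
student = z + rng.normal(scale=0.1, size=z.shape)
aligned = info_nce(student, z)
shuffled = info_nce(rng.permutation(student), z)  # break the pairing
print(f"Cox loss {cox:.3f}; subtype sizes {np.bincount(labels, minlength=13)}")
print(f"InfoNCE aligned {aligned:.3f} vs. shuffled {shuffled:.3f}")
```

A lower InfoNCE value means each participant's student embedding is most similar to that same participant's teacher embedding; the shuffled baseline shows the loss once that pairing is destroyed.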
MetFoundation targets epidemiology, preventive medicine, and aging research, where large NMR metabolomics cohorts are increasingly available. The mortality-informed aging clock provides a candidate biomarker of biological age for population studies, while the 13-subtype stratification offers a data-driven way to group individuals by metabolic state for cohort analysis or risk enrichment. The distilled blood-test variant is the most directly translational component: by running on routine clinical chemistry rather than specialized NMR, it could let clinicians and researchers apply the model in settings without NMR infrastructure, broadening access in resource-limited or routine-care contexts.
MetFoundation extends the foundation-model paradigm to NMR blood metabolomics, reporting that a single self-supervised backbone can serve an aging clock, metabolic subtyping, and distillation to routine blood tests without per-task retraining, and that its representations transfer to an independent cohort (CHARLS). Its significance is tempered by openness and maturity caveats that researchers should weigh: no public code repository, HuggingFace model card, or pretrained-weights URL has been identified; the work is released under a non-commercial CC BY-NC license; the exact architecture and parameter count are unstated; and it remains a preprint that has not undergone peer review. Adoption will depend on whether weights, code, and a peer-reviewed evaluation become available.
Xu, Y., et al. (2026) Decoding heterogeneous aging clocks and disease risk stratification using a metabolomic foundation model. bioRxiv.
DOI: 10.64898/2026.05.18.725977