bio.rodeo

Metabolomics

MetFoundation

Hong Kong Baptist University

A self-supervised metabolomic foundation model pretrained on NMR metabolite profiles from 430,000+ UK Biobank participants, applied without backbone retraining to aging, subtyping, and risk tasks.

Released: May 2026

MetFoundation is a metabolomic foundation model that learns general-purpose representations of human metabolic state from large-scale nuclear magnetic resonance (NMR) profiling. It was developed by researchers at Hong Kong Baptist University (corresponding author Lu Zhang, Department of Computer Science) and released as a bioRxiv preprint in May 2026. Unlike task-specific metabolomic models trained from scratch for a single endpoint, MetFoundation is pretrained once in a self-supervised manner and then reused — with its backbone frozen — across multiple downstream applications.

The model addresses a recurring bottleneck in metabolomics: each clinical or epidemiological question (aging, disease risk, patient stratification) has historically required a bespoke supervised model and large labeled cohorts. MetFoundation instead encodes the structure of circulating metabolite concentrations into a shared embedding space, so that downstream tasks need only a lightweight head rather than a full retraining of the representation. This mirrors the foundation-model paradigm now common in protein and single-cell biology, extended here to NMR-based blood metabolomics.

As the first metabolomics foundation model cataloged on bio.rodeo, MetFoundation illustrates how the pretrain-then-adapt approach transfers to a modality where features are continuous biochemical concentrations rather than sequences. The work is a preprint and has not yet been peer-reviewed.

#Key Features

  • Self-supervised pretraining at population scale: The backbone is pretrained on NMR metabolite concentration profiles from more than 430,000 UK Biobank participants, learning metabolic structure without task labels.
  • Frozen-backbone reuse: Learned representations are applied to downstream tasks without retraining the backbone, with only task-specific heads or frozen embeddings used for each new endpoint.
  • Mortality-informed aging clock: A fine-tuned survival head turns the embeddings into a metabolic aging clock anchored to mortality risk rather than chronological age alone.
  • Metabolic subtyping: Frozen embeddings are clustered into 13 distinct metabolic subtypes, offering an unsupervised stratification of metabolic state.
  • Distilled blood-test variant: Contrastive distillation compresses the full NMR model into a lightweight model that operates on routine clinical blood-test panels, extending reach beyond NMR-equipped cohorts.
  • External validation: Findings are validated externally in CHARLS (China Health and Retirement Longitudinal Study), testing generalization beyond the UK Biobank training population.
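The mortality-informed aging clock listed above attaches a survival head to the embeddings. The entry does not say which survival objective the authors use, so the following is only a minimal sketch, assuming a linear head scored with the Cox negative partial log-likelihood (a standard choice for time-to-event data). The function names, the linear head, and the toy values are hypothetical and are not taken from MetFoundation.

```python
import math

def linear_risk_score(embedding, weights, bias=0.0):
    """Hypothetical linear survival head: maps one embedding to a scalar risk score."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias

def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Average Cox negative partial log-likelihood over observed events.

    risk_scores: higher means higher predicted hazard.
    times: follow-up time per participant.
    events: 1 if death was observed, 0 if censored.
    Ties are handled in the simple Breslow style (no correction).
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored participants only appear in risk sets
        # Risk set: everyone still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(at_risk)  # subtract the max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk_scores[i] - log_sum
        n_events += 1
    return -total / max(n_events, 1)

# Toy data (assumed): three participants, two observed deaths, one censored.
toy_embeddings = [[1.0, 0.5], [0.4, 0.2], [0.0, 0.1]]
toy_weights = [2.0, 0.0]
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 0]

toy_scores = [linear_risk_score(e, toy_weights) for e in toy_embeddings]
toy_loss = cox_neg_partial_log_likelihood(toy_scores, toy_times, toy_events)
print(f"toy Cox loss: {toy_loss:.4f}")
```

In this sketch only `toy_weights` would be trained. Whether the embeddings themselves stay frozen during fine-tuning of the head is as described in the entry, not something the code decides.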

#Technical Details

MetFoundation is pretrained with a self-supervised objective on NMR metabolite concentration profiles from over 430,000 UK Biobank participants. The authors describe a transformer-based backbone, but the specific architectural variant and the total parameter count are not stated in the preprint. After pretraining, the backbone weights are held fixed and the learned representations are reused in three ways: (1) a survival head fine-tuned on top of the embeddings produces a mortality-informed aging clock; (2) frozen embeddings are clustered into 13 metabolic subtypes; and (3) contrastive distillation produces a lightweight student model that makes predictions from routine blood-test measurements rather than full NMR panels. External validation is performed in the CHARLS cohort to assess transfer beyond the UK Biobank training distribution.
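To make the contrastive distillation step more concrete, here is a minimal sketch of one common contrastive objective, InfoNCE, applied to paired teacher and student embeddings. The entry does not specify the exact loss, temperature, or embedding sizes, so everything below (the function names, the temperature of 0.1, and the toy vectors) is an assumption for illustration. The idea is that the student embedding computed from a participant's blood-test panel should be closer to that same participant's frozen NMR teacher embedding than to any other participant's.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_distillation_loss(student_embeddings, teacher_embeddings, temperature=0.1):
    """InfoNCE loss where row i of the student should match row i of the teacher.

    Both inputs are lists of equal length; each pair (i, i) is a positive,
    and every other teacher row in the batch acts as a negative.
    """
    students = [l2_normalize(v) for v in student_embeddings]
    teachers = [l2_normalize(v) for v in teacher_embeddings]
    losses = []
    for i, s_i in enumerate(students):
        # Cosine similarity to every teacher embedding, scaled by temperature.
        logits = [dot(s_i, t_j) / temperature for t_j in teachers]
        m = max(logits)  # stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])  # cross-entropy with target i
    return sum(losses) / len(losses)

# Toy batch (assumed): two participants with 2-dimensional embeddings.
toy_teacher = [[1.0, 0.0], [0.0, 1.0]]   # frozen NMR-model embeddings
toy_student = [[2.0, 0.0], [0.0, 3.0]]   # blood-test student embeddings
print(info_nce_distillation_loss(toy_student, toy_teacher))
```

Only the student would receive gradients in such a setup; the teacher embeddings are treated as fixed targets, which matches the frozen-backbone description above.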

#Applications

MetFoundation targets epidemiology, preventive medicine, and aging research, where large NMR metabolomics cohorts are increasingly available. The mortality-informed aging clock provides a candidate biomarker of biological age for population studies, while the 13-subtype stratification offers a data-driven way to group individuals by metabolic state for cohort analysis or risk enrichment. The distilled blood-test variant is the most directly translational component: by running on routine clinical chemistry rather than specialized NMR, it could let clinicians and researchers apply the model in settings without NMR infrastructure, broadening access in resource-limited or routine-care contexts.
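The subtype stratification can be pictured as ordinary clustering on top of frozen embeddings. The entry reports 13 subtypes but does not name the clustering algorithm, so the sketch below assumes plain k-means and uses a tiny toy dataset with k = 2; the function names and data are hypothetical. With real embeddings, `k=13` would correspond to the reported number of subtypes.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(embeddings, k, n_iter=50, seed=0):
    """Plain k-means on a list of equal-length vectors.

    Returns (labels, centroids). Centroids are initialised from k randomly
    sampled points; an empty cluster keeps its previous centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(embeddings, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each embedding goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in embeddings
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(embeddings, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy frozen embeddings (assumed): two well-separated groups of participants.
toy_embeddings = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
toy_labels, toy_centroids = kmeans(toy_embeddings, k=2)
print(toy_labels)
```

Because the backbone is frozen, the embeddings can be computed once and reclustered cheaply with different values of k or different algorithms.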

#Impact

MetFoundation extends the foundation-model paradigm to NMR blood metabolomics, demonstrating that a single self-supervised backbone can serve aging clocks, subtyping, and risk distillation without per-task retraining, and that its representations transfer to an independent cohort (CHARLS). Its significance is tempered by openness and maturity caveats that researchers should weigh: no public code repository, HuggingFace model card, or pretrained-weights URL has been identified; the work is released under a non-commercial CC BY-NC license; the specific architectural variant and parameter count are unstated; and it remains a preprint that has not undergone peer review. Adoption will depend on whether weights, code, and a peer-reviewed evaluation become available.

Citation

Decoding heterogeneous aging clocks and disease risk stratification using a metabolomic foundation model

Xu, Y., et al. (2026) Decoding heterogeneous aging clocks and disease risk stratification using a metabolomic foundation model. bioRxiv.

DOI: 10.64898/2026.05.18.725977

Openness

Unclassified
Restrictive license on core components

Tags

aging_clock, contrastive_learning, disease_risk_prediction, foundation_model, metabolomics, nmr, patient_subtyping, self_supervised, transformer

Resources

Research Paper