bio.rodeo
Metabolomics · Small molecule

LSM-MS2

Matterworks, Inc.

A transformer foundation model pretrained on millions of MS/MS spectra that improves identification of challenging isomers and produces embeddings for direct biological interpretation.

Released: October 2025

LSM-MS2 is a transformer-based foundation model for tandem mass spectrometry (MS/MS) developed by Matterworks, Inc. (Somerville, MA) and described in an arXiv preprint posted in October 2025. It builds on the company's earlier self-supervised model, LSM1-MS2, which appeared as a ChemRxiv preprint in February 2024. The model targets one of the central bottlenecks in metabolomics and small-molecule analysis: the vast majority of fragmentation spectra collected in untargeted experiments cannot be confidently matched to a chemical structure, leaving most signals "dark."

Rather than treating spectrum-to-structure matching as a lookup against a reference library, LSM-MS2 learns a continuous "semantic chemical space" by pretraining on millions of MS/MS spectra. In this learned embedding space, spectra from chemically related molecules sit near one another, which helps the model resolve compounds that are notoriously difficult to distinguish. The most notable case is isomers, which share a molecular formula but differ in structure and produce highly similar fragmentation patterns. The same embeddings double as a general-purpose representation of a sample's chemical state, enabling biological and disease-state interpretation without a separate, task-specific model.
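To make the idea of annotation by proximity in an embedding space concrete, here is a minimal sketch of nearest-neighbor lookup with cosine similarity. It is an illustration only, not the authors' method: the preprint does not disclose how embeddings are compared, the 4-dimensional vectors are invented, and `rank_library` is a hypothetical helper name. Leucine and isoleucine are used only because they are a familiar isomer pair.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_library(query_embedding, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented embeddings for illustration; real ones would come from the model.
library = {
    "leucine": [0.9, 0.1, 0.3, 0.0],
    "isoleucine": [0.8, 0.3, 0.1, 0.1],
    "glucose": [0.0, 0.2, 0.1, 0.9],
}
query = [0.88, 0.12, 0.28, 0.02]

ranking = rank_library(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the two isomers both score far above the unrelated compound, but they remain separable because their embeddings differ along dimensions the query shares with only one of them. Whether a real model achieves that separation is an empirical question the preprint's benchmarks address.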

LSM-MS2 sits at the intersection of metabolomics and small-molecule cheminformatics, extending the pretrain-then-apply foundation-model paradigm (now common in protein and single-cell biology) to raw mass-spectral data. It is a commercial model: the authors are Matterworks employees, and the work involves patented technology, with inference offered through the company's Pyxis platform rather than as open code or weights.

#Key Features

  • Semantic spectral embeddings: Pretraining maximizes separation in spectral space so that chemically related spectra cluster together, yielding embeddings that support both identification and downstream interpretation.
  • Improved isomer resolution: The authors report roughly a 30% improvement in correctly identifying challenging isomeric compounds relative to conventional spectral-matching approaches.
  • Gains in complex matrices: In complex biological samples, the authors report a 42% increase in correct identifications, with performance maintained at low analyte concentrations.
  • Direct biological interpretation: Spectral embeddings are used directly for disease-state differentiation and clinical-outcome tasks, reducing the labeled data needed for each new question.
  • Foundation-model reuse: A single pretrained backbone serves both annotation and biological-readout tasks, mirroring the foundation-model approach used in other areas of computational biology.
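For context on the "conventional spectral-matching approaches" used as the baseline above, the sketch below shows one common form of it: cosine similarity between binned peak lists. It is an assumption that the preprint's baseline resembles this; the peak lists are made up, and `bin_spectrum` and `spectral_cosine` are hypothetical helper names. The point is only to show why isomers are hard for this kind of scoring.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)

def spectral_cosine(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up (m/z, relative intensity) peak lists, not measured spectra.
# The two "isomers" share fragment masses and differ only in intensities.
isomer_a = [(86.1, 100.0), (69.1, 20.0), (44.0, 35.0), (132.1, 10.0)]
isomer_b = [(86.1, 100.0), (69.1, 35.0), (44.0, 25.0), (132.1, 10.0)]
unrelated = [(163.1, 100.0), (145.0, 60.0), (85.0, 40.0)]

isomer_score = spectral_cosine(isomer_a, isomer_b)
unrelated_score = spectral_cosine(isomer_a, unrelated)
print(f"isomer pair: {isomer_score:.3f}, unrelated pair: {unrelated_score:.3f}")
```

Because the two toy isomers share every fragment bin, their score lands close to 1.0, so a library search returns both candidates with nearly equal confidence. That ambiguity is the gap a learned embedding is meant to narrow.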

#Technical Details

LSM-MS2 is described as a transformer-based foundation model pretrained with self-supervision on millions of MS/MS spectra to produce a chemically meaningful embedding; the preprint does not disclose the precise architecture, the tokenization of spectra, the pretraining loss, or the parameter count. Evaluation draws on a reference library of roughly 1.8 million spectra. Reported results include an approximately 30% improvement in identifying challenging isomers and a 42% increase in correct identifications in complex biological samples. Downstream biological tasks demonstrated from the embeddings include antipsychotic-overdose classification in mice, septic-shock prediction in emergency-department patients (macro F1 of 0.80), and cystic-fibrosis detection via unsupervised clustering. Neither code nor pretrained weights are publicly released; inference is available only through the commercial Pyxis platform.
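Since the septic-shock result is quoted as a macro F1, a short worked example may help readers interpret that number. Macro F1 is the unweighted mean of per-class F1 scores, so a rare class counts as much as a common one. The labels and predictions below are invented for illustration and are unrelated to the preprint's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Invented, imbalanced toy labels: 2 positives and 8 negatives.
y_true = ["shock", "shock"] + ["no_shock"] * 8
y_pred = ["shock", "no_shock", "shock"] + ["no_shock"] * 7

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, macro F1={score:.4f}")
# prints: accuracy=0.80, macro F1=0.6875
```

Here the per-class F1 scores are 0.5 for the rare class and 0.875 for the common one, so macro F1 (0.6875) sits well below accuracy (0.80). A reported macro F1 of 0.80 therefore says more about minority-class performance than an accuracy of 0.80 would.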

#Applications

LSM-MS2 is aimed at metabolomics, clinical mass spectrometry, and small-molecule analytics, where untargeted MS/MS experiments routinely generate far more spectra than can be annotated. By improving isomer discrimination and identification in complex matrices, it can increase the fraction of usable signal in metabolomic surveys, biomarker discovery, drug-metabolism studies, and toxicology. Because the same embeddings feed disease-state classification, the model is positioned for translational workflows such as stratifying patients or predicting clinical outcomes from a sample's spectral fingerprint with minimal task-specific labeling. Access is through Matterworks' Pyxis platform, so the primary beneficiaries are laboratories adopting that commercial pipeline.
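One way to picture classification with minimal task-specific labeling is a lightweight classifier fitted on frozen embeddings. The sketch below uses a nearest-centroid rule as a stand-in. The preprint does not state which classifier the authors used or how per-sample embeddings are aggregated from spectra, so the 3-dimensional vectors, the labels, and the `fit_centroids` and `predict` helpers are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def fit_centroids(embeddings, labels):
    """Compute one centroid per class from a handful of labeled embeddings."""
    grouped = {}
    for embedding, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(embedding)
    return {label: centroid(vectors) for label, vectors in grouped.items()}

def predict(centroids, embedding):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], embedding))

# Invented sample-level embeddings; real ones would come from the model.
train_embeddings = [
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
]
train_labels = ["control", "control", "case", "case"]

centroids = fit_centroids(train_embeddings, train_labels)
new_samples = [[0.85, 0.70, 0.15], [0.15, 0.25, 0.85]]
predictions = [predict(centroids, sample) for sample in new_samples]
print(predictions)
```

Only four labeled samples are needed here because the embedding is assumed to already separate the classes; the classifier itself learns almost nothing. In practice, how few labels suffice depends on how well the pretrained representation captures the biology in question.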

#Impact

LSM-MS2 demonstrates that the foundation-model paradigm can be applied to raw tandem mass spectra, learning a transferable chemical embedding that improves hard identification problems and doubles as a substrate for biological interpretation. This is a meaningful direction for a field where most collected spectra remain unannotated, and the reported isomer and complex-matrix gains, if they hold up under independent evaluation, would address a long-standing pain point. Its significance is tempered by openness and maturity caveats: it is a preprint that has not been peer-reviewed; the architecture, exact pretraining-data volume, and parameter count are undisclosed; the work covers patented technology; and neither code nor weights are public, with use gated behind the commercial Pyxis platform. Broader scientific adoption will depend on independent benchmarking and on how accessible the model becomes outside that platform.

Citations

Asher, G., et al. (2025) LSM-MS2: A Foundation Model Bridging Spectral Identification and Biological Interpretation. arXiv preprint. DOI: 10.48550/arXiv.2510.26715

Asher, G., et al. (2024) LSM1-MS2: A Self-Supervised Foundation Model for Tandem Mass Spectrometry Applications, Encompassing Extensive Chemical Property Predictions and Spectral Matching. ChemRxiv preprint. DOI: 10.26434/chemrxiv-2024-k06gb


Citations

Total Citations: 0
Influential: 0
References: 46


Openness

bio.rodeo openness: 4 (Closed) · low usability and reproducibility
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified (restrictive license on core components)
