bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.


Prosit-PTM

Technical University of Munich

Deep learning model that predicts fragment-ion intensities and retention time for modified peptides, with zero-shot generalization to unseen post-translational modifications.

Released: November 2025

Post-translational modifications (PTMs) such as phosphorylation, acetylation, and methylation greatly expand the chemical diversity of the proteome, but they pose a persistent challenge for bottom-up mass spectrometry. Confidently identifying modified peptides and pinpointing the exact residue that carries a modification ("site localization") is hard, and the number of biologically relevant modifications far exceeds what any training set can cover. Prosit-PTM, developed in the labs of Mathias Wilhelm and Bernhard Kuster at the Technical University of Munich and released as a bioRxiv preprint in November 2025, tackles this problem by predicting tandem mass spectra and chromatographic retention times for modified peptides, including modifications it was never explicitly trained on.

Prosit-PTM extends the widely used Prosit family of spectrum predictors. Where earlier Prosit models focused on unmodified peptides or a handful of common modifications, Prosit-PTM is trained on an expanded ProteomeTools resource of roughly 977,000 synthetic peptides spanning 22 PTM–residue combinations. Its central innovation is a chemically informed encoding of modifications combined with an amino-acid-substitution-based data augmentation strategy, which together let the model generalize in a zero-shot manner to PTMs absent from the training data.

By framing PTM prediction as a generalization problem rather than a per-modification retraining task, Prosit-PTM aims to make accurate spectral libraries and rescoring features available for the long tail of modifications that matter to biologists but lack dedicated synthetic-peptide reference data.

#Key Features

  • Zero-shot PTM prediction: Chemically informed modification encoding plus substitution-based augmentation allow accurate fragment-ion and retention-time predictions for modifications not seen during training, removing the need to retrain for each new PTM.
  • Dual-property prediction: A single framework predicts both MS/MS fragment-ion intensities and indexed retention time (iRT), the two properties most useful for spectral-library generation and search rescoring.
  • Trained on synthetic ground truth: Built on an expanded ProteomeTools dataset of ~977,000 synthetic peptides covering 22 PTM–residue combinations, providing experimentally measured spectra rather than inferred labels.
  • Improved site localization: Demonstrated gains in phosphoproteomic PTM-site localization and in identifying multiply modified histone peptides, two long-standing pain points in modified-peptide analysis.
  • Open tool integration: Served from a fixed pretrained checkpoint through the Koina inference platform (model IDs Prosit_2025_intensity_ptm2 and Prosit_2025_irt_ptm2) and usable within open-source proteomics tools without retraining.
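As a rough illustration of the Koina integration mentioned above, the sketch below only builds the request for one of the listed model IDs and does not send it. It assumes Koina's public server follows the KServe v2 HTTP inference layout, and that the PTM models accept the same input tensor names as earlier Prosit intensity models (`peptide_sequences`, `precursor_charges`, `collision_energies`) with UNIMOD-style modification tags. The helper name `build_koina_request` is hypothetical, and the Koina model page should be checked for the actual inputs.

```python
import json

# Assumed base URL of the public Koina server (KServe v2 style endpoint).
KOINA_BASE_URL = "https://koina.wilhelmlab.org/v2/models"


def build_koina_request(model_id, peptides, charges=None, collision_energies=None):
    """Return (url, payload) for an inference call; nothing is sent.

    Tensor names are assumptions carried over from earlier Prosit models.
    """
    n = len(peptides)
    inputs = [{
        "name": "peptide_sequences",
        "shape": [n, 1],
        "datatype": "BYTES",
        "data": list(peptides),
    }]
    if charges is not None:
        if len(charges) != n:
            raise ValueError("charges must match peptides in length")
        inputs.append({
            "name": "precursor_charges",
            "shape": [n, 1],
            "datatype": "INT32",
            "data": [int(c) for c in charges],
        })
    if collision_energies is not None:
        if len(collision_energies) != n:
            raise ValueError("collision_energies must match peptides in length")
        inputs.append({
            "name": "collision_energies",
            "shape": [n, 1],
            "datatype": "FP32",
            "data": [float(e) for e in collision_energies],
        })
    url = f"{KOINA_BASE_URL}/{model_id}/infer"
    return url, {"id": "0", "inputs": inputs}


# Example: a phosphoserine peptide written with an assumed UNIMOD tag.
url, payload = build_koina_request(
    "Prosit_2025_intensity_ptm2",
    ["AAS[UNIMOD:21]PEPTIDEK"],
    charges=[2],
    collision_energies=[30],
)
print(url)
print(json.dumps(payload, indent=2))
```

The payload could then be posted with any HTTP client. The retention-time model (`Prosit_2025_irt_ptm2`) would presumably need only the peptide sequences, which is why the charge and collision-energy inputs are optional here.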

#Technical Details

Prosit-PTM follows the encoder–decoder deep-learning design of the Prosit lineage, predicting fragment-ion intensity spectra and indexed retention time from peptide sequence, precursor charge, and collision energy. The key methodological contributions are the chemically informed representation of modified residues and an amino-acid-substitution-based augmentation scheme that teaches the model to interpolate to unseen chemistries. Training data come from an expanded ProteomeTools collection of approximately 977,000 synthetic peptides covering 22 PTM–residue combinations. The model framework is distributed through the open-source DLOmix Python library (MIT-licensed, with both TensorFlow/Keras and PyTorch backends), and a fixed pretrained checkpoint is exposed for inference via the Koina API. The work reports improvements in phosphoproteomic site localization, multiply-modified histone peptide identification, and HLA peptide rescoring relative to prior approaches.
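To make the idea of a chemically informed modification representation more concrete, here is a minimal sketch that describes each modification by its elemental composition change rather than by an opaque token. This is an illustration of the general principle only, not the encoding used in the preprint: the feature layout, the names `encode_modification` and `mass_shift`, and the choice of elements are all assumptions. The compositions themselves are standard values for these modifications.

```python
# Monoisotopic masses (Da) of the elements used in this sketch.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "S": 31.9720707,
}

# Fixed element order defines the (hypothetical) feature vector layout.
ELEMENT_ORDER = ("C", "H", "N", "O", "P", "S")

# Elemental composition added to the residue by each modification.
MOD_COMPOSITION = {
    "Phospho": {"H": 1, "O": 3, "P": 1},
    "Acetyl": {"C": 2, "H": 2, "O": 1},
    "Methyl": {"C": 1, "H": 2},
}


def encode_modification(composition):
    """Map an elemental composition to a fixed-length atom-count vector."""
    return [composition.get(element, 0) for element in ELEMENT_ORDER]


def mass_shift(composition):
    """Monoisotopic mass change implied by the composition, in Da."""
    return sum(MONOISOTOPIC_MASS[el] * count for el, count in composition.items())


for name, comp in MOD_COMPOSITION.items():
    print(name, encode_modification(comp), round(mass_shift(comp), 4))

# A modification absent from the table above still maps into the same
# feature space, which is the property a zero-shot model can exploit.
dimethyl = {"C": 2, "H": 4}
print("Dimethyl", encode_modification(dimethyl), round(mass_shift(dimethyl), 4))
```

The design point is that a one-hot token for a new modification carries no information, whereas a composition-style vector places it near chemically similar modifications the model has already seen.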

#Applications

Prosit-PTM is aimed at proteomics researchers performing PTM-focused experiments: phosphoproteomics workflows that need confident site localization, histone-PTM studies involving combinatorial modifications, and immunopeptidomics pipelines that rescore HLA-presented peptides. Because predicted spectra and retention times can seed in-silico spectral libraries and supply discriminative features for search-engine rescoring (e.g., Percolator-style workflows), the model can be dropped into existing open-source proteomics tooling. Its zero-shot capability is especially valuable for laboratories studying rare or novel modifications for which no synthetic reference peptides exist.
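One way predicted spectra become a rescoring feature is through a similarity score between predicted and observed fragment intensities. The sketch below computes the normalized spectral angle commonly used in Prosit-based rescoring. Whether a given Prosit-PTM workflow uses exactly this feature is an assumption, and the function name and the toy intensity vectors are hypothetical.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral angle in [0, 1]; 1 means identical direction.

    Both inputs are intensity vectors aligned on the same fragment ions.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must be aligned and equal length")
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # no signal to compare
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Toy example: intensities for the same five fragment ions.
predicted = [0.10, 0.80, 0.05, 0.40, 0.00]
observed = [0.12, 0.75, 0.00, 0.45, 0.02]
print(spectral_angle(predicted, observed))
```

In a Percolator-style workflow, a score like this would be appended to each peptide-spectrum match as one additional feature, alongside the difference between predicted and observed retention time.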

#Impact

Prosit-PTM addresses a core bottleneck in modified-peptide mass spectrometry, the impossibility of generating reference data for every modification, by demonstrating that a single model can generalize to unseen PTMs. Serving the model through Koina and the DLOmix framework lowers the barrier to adoption, letting groups apply it without GPU training or bespoke model development. Two caveats apply: as of mid-2026 the work is a preprint (CC BY-NC) that has not yet undergone peer review, and the model is available for inference only through the Koina API rather than as standalone downloadable weights. Even so, the zero-shot strategy points toward more general, reusable spectral predictors for the modified proteome.

Citation

Learning the Unseen: Data-Augmented Deep Learning for PTM Discovery with Prosit-PTM

Preprint

Gabriel, W., et al. (2025) Learning the Unseen: Data-Augmented Deep Learning for PTM Discovery with Prosit-PTM. bioRxiv.

DOI: 10.1101/2025.11.07.687302

Recent citations

Papers that recently cited this model.

  • Modanovo: A Unified Model for Post-translational Modification-Aware De Novo Sequencing Using Experimental Spectra From In Vivo and Synthetic Peptides

    Daniela Klaproth-Andrade, Yanik Bruns, Wassim Gabriel, et al.

    bioRxiv · Sep 2025

    2

Citations

Total Citations: 1
Influential: 0
References: 57

GitHub

Stars: 40
Forks: 13
Open Issues: 4
Contributors: 9
Last Push: 5d ago
Language: Jupyter Notebook
License: MIT

Fields of citing research

  • Biology: 100%
  • Chemistry: 100%
  • Computer Science: 100%
  • Medicine: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 30 (Closed)
Usability (can I run it?): 56
Reproducibility (can I retrain it?): 8
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

mass_spectrometry, proteomics, ptm_localization, spectral_prediction, transformer, zero_shot

Resources

GitHub Repository · Research Paper · Demo