Technical University of Munich
Deep learning model that predicts fragment-ion intensities and retention time for modified peptides, with zero-shot generalization to unseen post-translational modifications.
Post-translational modifications (PTMs) such as phosphorylation, acetylation, and methylation dramatically expand the chemical diversity of the proteome, but they pose a persistent challenge for bottom-up mass spectrometry: confidently identifying modified peptides and pinpointing the exact residue carrying a modification ("site localization") is hard, and the number of biologically relevant modifications far exceeds what any training set can exhaustively cover. Prosit-PTM, developed in the labs of Mathias Wilhelm and Bernhard Kuster at the Technical University of Munich and released as a bioRxiv preprint in November 2025, tackles this problem by predicting tandem mass spectra and chromatographic retention behavior for modified peptides — including modifications it was never explicitly trained on.
Prosit-PTM extends the widely used Prosit family of spectrum predictors. Where earlier Prosit models focused on unmodified or a handful of common modifications, Prosit-PTM is trained on an expanded ProteomeTools resource of roughly 977,000 synthetic peptides spanning 22 PTM–residue combinations. Its central innovation is a chemically informed encoding of modifications combined with an amino-acid-substitution-based data augmentation strategy, which together let the model generalize in a zero-shot manner to PTMs absent from the training data.
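The page does not spell out the exact encoding. What follows is therefore only a minimal sketch of what a "chemically informed" representation could look like. It assumes that each modified residue is described by its elemental composition (counts of C, H, N, O, S and P) rather than by a one-hot token. The residue and modification compositions are standard Unimod-style values. The names `RESIDUES`, `MODS`, `encode_residue` and `encode_peptide` are hypothetical. The point it illustrates is that a modification never seen in training still maps to a meaningful vector as long as its composition is known.

```python
from collections import Counter

ELEMENTS = ("C", "H", "N", "O", "S", "P")

# Residue compositions as they appear inside a peptide chain (minus one water).
RESIDUES = {
    "A": Counter(C=3, H=5, N=1, O=1),
    "S": Counter(C=3, H=5, N=1, O=2),
    "T": Counter(C=4, H=7, N=1, O=2),
    "K": Counter(C=6, H=12, N=2, O=1),
    "Y": Counter(C=9, H=9, N=1, O=2),
}

# Composition changes caused by a few modifications (Unimod deltas).
MODS = {
    "Phospho": Counter(H=1, O=3, P=1),
    "Acetyl": Counter(C=2, H=2, O=1),
    "Methyl": Counter(C=1, H=2),
}


def encode_residue(aa, mod=None):
    """Element-count vector for a (possibly modified) residue (hypothetical encoding)."""
    comp = Counter(RESIDUES[aa])
    if mod is not None:
        comp.update(MODS[mod])  # add the modification's atoms to the residue
    return [comp[element] for element in ELEMENTS]


def encode_peptide(tokens):
    """tokens: list of (residue, modification-or-None) pairs."""
    return [encode_residue(aa, mod) for aa, mod in tokens]


# A modification absent from "training" becomes encodable once its composition is known.
MODS["Crotonyl"] = Counter(C=4, H=4, O=1)
print(encode_peptide([("S", "Phospho"), ("A", None), ("K", "Crotonyl")]))
# [[3, 6, 1, 5, 0, 1], [3, 5, 1, 1, 0, 0], [10, 16, 2, 2, 0, 0]]
```

In a real model these vectors would presumably be fed into a learned embedding layer. The sketch stops at the feature level.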
By framing PTM prediction as a generalization problem rather than a per-modification retraining task, Prosit-PTM aims to make accurate spectral libraries and rescoring features available for the long tail of modifications that matter to biologists but lack dedicated synthetic-peptide reference data.
The pretrained intensity and retention-time models are served through Koina (as Prosit_2025_intensity_ptm2 and Prosit_2025_irt_ptm2) and can be used within open-source proteomics tools without retraining.
Prosit-PTM follows the encoder–decoder deep-learning design of the Prosit lineage, predicting fragment-ion intensity spectra and indexed retention time from peptide sequence, precursor charge, and collision energy. The key methodological contributions are the chemically informed representation of modified residues and an amino-acid-substitution-based augmentation scheme that teaches the model to interpolate to unseen chemistries. Training data come from an expanded ProteomeTools collection of approximately 977,000 synthetic peptides covering 22 PTM–residue combinations. The model framework is distributed through the open-source DLOmix Python library (MIT-licensed, with both TensorFlow/Keras and PyTorch backends), and a fixed pretrained checkpoint is exposed for inference via the Koina API. The work reports improvements in phosphoproteomic site localization, multiply-modified histone peptide identification, and HLA peptide rescoring relative to prior approaches.
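One way to read "amino-acid-substitution-based augmentation" is that some unmodified residues have exactly the same elemental composition as another residue carrying a modification. For example, threonine equals serine plus a methyl group, and alanine equals glycine plus methyl. Relabeling a measured peptide this way yields extra (modification, spectrum) training pairs without new measurements.

The sketch below assumes that interpretation. The substitution table and the `augment` function are illustrative and are not taken from the Prosit-PTM code. Equal composition does not imply identical structure (O-methylserine and threonine are isomers), so such relabeling is an approximation.

```python
import itertools
from collections import Counter

# Residue compositions (inside a peptide chain) for the residues used below.
RESIDUES = {
    "G": Counter(C=2, H=3, N=1, O=1),
    "A": Counter(C=3, H=5, N=1, O=1),
    "S": Counter(C=3, H=5, N=1, O=2),
    "T": Counter(C=4, H=7, N=1, O=2),
    "D": Counter(C=4, H=5, N=1, O=3),
    "E": Counter(C=5, H=7, N=1, O=3),
    "N": Counter(C=4, H=6, N=2, O=2),
    "Q": Counter(C=5, H=8, N=2, O=2),
}
MODS = {"Methyl": Counter(C=1, H=2)}

# Hypothetical table: residue -> (base residue, modification) with identical composition.
SUBSTITUTIONS = {"T": ("S", "Methyl"), "A": ("G", "Methyl"),
                 "E": ("D", "Methyl"), "Q": ("N", "Methyl")}


def composition(aa, mod=None):
    comp = Counter(RESIDUES[aa])
    if mod is not None:
        comp.update(MODS[mod])
    return comp


# Sanity check: every substitution must preserve the elemental composition.
for target, (base, mod) in SUBSTITUTIONS.items():
    if composition(target) != composition(base, mod):
        raise ValueError(f"substitution for {target} changes composition")


def augment(peptide, max_subs=1):
    """Yield relabeled copies of `peptide`; the measured spectrum is reused unchanged."""
    sites = [i for i, aa in enumerate(peptide) if aa in SUBSTITUTIONS]
    for k in range(1, max_subs + 1):
        for combo in itertools.combinations(sites, k):
            tokens = list(peptide)
            for i in combo:
                base, mod = SUBSTITUTIONS[peptide[i]]
                tokens[i] = f"{base}[{mod}]"
            yield "".join(tokens)


print(list(augment("TAK")))  # ['S[Methyl]AK', 'TG[Methyl]K']
```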
Prosit-PTM is aimed at proteomics researchers performing PTM-focused experiments: phosphoproteomics workflows that need confident site localization, histone-PTM studies involving combinatorial modifications, and immunopeptidomics pipelines that rescore HLA-presented peptides. Because predicted spectra and retention times can seed in-silico spectral libraries and supply discriminative features for search-engine rescoring (e.g., Percolator-style workflows), the model can be dropped into existing open-source proteomics tooling. Its zero-shot capability is especially valuable for laboratories studying rare or novel modifications for which no synthetic reference peptides exist.
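Predicted spectra usually enter rescoring through similarity scores. The sketch below computes the normalized spectral contrast angle commonly used in the Prosit literature, SA = 1 - 2·arccos(cos θ)/π, together with an absolute iRT difference. It then uses the angle to pick the best-matching candidate among positional isoforms, as in site localization.

The sketch makes three assumptions:
- Predicted and observed intensities are already aligned to the same fragment-ion vector.
- Negative predicted values mark ions to ignore.
- The observed retention time has been calibrated to the iRT scale.

All function names are hypothetical.

```python
import math


def normalized_spectral_angle(predicted, observed):
    """Normalized spectral contrast angle in [0, 1]; 1 means identical shape.

    Positions with a negative prediction are treated as impossible ions and
    dropped (an assumption of this sketch).
    """
    if len(predicted) != len(observed):
        raise ValueError("vectors must have the same length")
    pairs = [(p, o) for p, o in zip(predicted, observed) if p >= 0]
    dot = sum(p * o for p, o in pairs)
    norm_p = math.sqrt(sum(p * p for p, _ in pairs))
    norm_o = math.sqrt(sum(o * o for _, o in pairs))
    if norm_p == 0 or norm_o == 0:
        return 0.0
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))  # clamp rounding error
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def rescoring_features(pred_int, obs_int, pred_irt, obs_irt):
    """Two features a Percolator-style rescorer could consume."""
    return {
        "spectral_angle": normalized_spectral_angle(pred_int, obs_int),
        "abs_delta_irt": abs(pred_irt - obs_irt),
    }


def best_isoform(candidates, observed):
    """Name of the candidate whose predicted spectrum best matches `observed`."""
    return max(candidates,
               key=lambda name: normalized_spectral_angle(candidates[name], observed))


observed = [0.9, 0.05, 0.25, 0.0]
isoforms = {"pS3": [1.0, 0.0, 0.2, 0.1], "pT5": [0.1, 1.0, 0.0, 0.3]}
print(best_isoform(isoforms, observed))  # pS3
```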
Prosit-PTM addresses a core bottleneck in modified-peptide mass spectrometry, namely that reference data cannot be generated for every modification, by demonstrating that a single model can generalize to unseen PTMs. Serving the model through Koina and the DLOmix framework lowers the barrier to adoption, letting groups apply it without GPU training or bespoke model development. As of mid-2026, the work is a preprint (CC BY-NC) that has not yet undergone peer review, and the model is available for inference only through the Koina API rather than as standalone downloadable weights. Users should weigh these caveats, but the zero-shot strategy points toward more general, reusable spectral predictors for the modified proteome.
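Because inference runs through Koina, a prediction request is an HTTP POST. The sketch below builds, but does not send, a request body in the KServe v2 (Open Inference Protocol) JSON layout, which Koina is assumed to follow here.

The following details mirror conventions used by other Prosit models on Koina and are assumptions, not confirmed for Prosit-PTM:
- the endpoint URL;
- the input names `peptide_sequences`, `precursor_charges` and `collision_energies`;
- the `[UNIMOD:21]` (phospho) modification notation.

Check the model pages for Prosit_2025_intensity_ptm2 and Prosit_2025_irt_ptm2, which may require different or additional inputs.

```python
import json
import urllib.request

# Assumed Koina endpoint pattern (KServe v2 "infer" route); verify before use.
KOINA_URL = "https://koina.wilhelmlab.org/v2/models/{model}/infer"


def _tensor(name, datatype, values):
    """One KServe v2 input tensor with shape [n, 1]."""
    values = list(values)
    return {"name": name, "shape": [len(values), 1], "datatype": datatype, "data": values}


def build_infer_payload(peptides, charges=None, collision_energies=None):
    """Build an inference request body; the input names are assumptions."""
    n = len(peptides)
    for other in (charges, collision_energies):
        if other is not None and len(other) != n:
            raise ValueError("every input needs one entry per peptide")
    inputs = [_tensor("peptide_sequences", "BYTES", peptides)]
    if charges is not None:
        inputs.append(_tensor("precursor_charges", "INT32", [int(c) for c in charges]))
    if collision_energies is not None:
        inputs.append(_tensor("collision_energies", "FP32",
                              [float(ce) for ce in collision_energies]))
    return {"id": "0", "inputs": inputs}


def build_request(model, payload):
    """Wrap the payload in a POST request object (not sent)."""
    return urllib.request.Request(
        KOINA_URL.format(model=model),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: one phosphopeptide for the intensity model and the iRT model.
peptides = ["AAS[UNIMOD:21]PEPTIDEK"]
intensity_req = build_request(
    "Prosit_2025_intensity_ptm2",
    build_infer_payload(peptides, charges=[2], collision_energies=[30.0]),
)
irt_req = build_request("Prosit_2025_irt_ptm2", build_infer_payload(peptides))
# Sending requires network access, e.g.:
# with urllib.request.urlopen(intensity_req) as resp:
#     result = json.load(resp)
```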
Gabriel, W., et al. (2025) Learning the Unseen: Data-Augmented Deep Learning for PTM Discovery with Prosit-PTM. bioRxiv.
DOI: 10.1101/2025.11.07.687302
Papers citing this model:
Daniela Klaproth-Andrade, Yanik Bruns, Wassim Gabriel, et al.
bioRxiv · Sep 2025