A flow-matching generative model that predicts protein conformational ensembles across the full order-disorder continuum, from folded domains to intrinsically disordered regions.
Deep learning has transformed structure prediction for ordered, well-folded proteins, but a large fraction of the proteome does not adopt a single static structure. Intrinsically disordered proteins (IDPs) and disordered regions sample broad conformational ensembles that govern signaling, phase separation, and many disease mechanisms, yet they remain poorly captured by single-structure predictors such as AlphaFold. PepTron, developed by Peptone Ltd. and released as a preprint in October 2025, addresses this gap by directly generating conformational ensembles rather than individual structures.
PepTron is a sequence-to-ensemble generative model designed to represent proteins with any level of disorder content, spanning the full order-disorder continuum from rigid folded domains to fully disordered chains and the multi-domain proteins that mix both. The authors frame multi-domain proteins as "the most common target class in cutting-edge therapeutics," making accurate ensemble prediction directly relevant to drug discovery.
Alongside the model, the team introduced PeptoneBench, an evaluation framework that scores predicted ensembles against experimental observables for both structured and disordered proteins. On this benchmark, PepTron matches the specialized disordered-protein generator BioEmu on intrinsically disordered proteins while remaining competitive on ordered ones, positioning it as a single model that performs well across the continuum rather than excelling at only one extreme.
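Scoring an ensemble against an experimental observable can be illustrated with a small, generic sketch. This is not PeptoneBench's API: the function names (`ensemble_average`, `rmse`) and the toy numbers are assumptions made up for illustration. The idea is to average a per-conformer predicted observable over the ensemble, optionally with per-conformer weights, and compare that average with the measured values.

```python
from math import sqrt

def ensemble_average(per_conformer, weights=None):
    """Weighted mean of an observable over conformers.

    per_conformer: list of conformers, each a list of predicted values
                   (one value per experimental data point).
    weights:       optional per-conformer weights; uniform if omitted.
    """
    n_conformers = len(per_conformer)
    if weights is None:
        weights = [1.0 / n_conformers] * n_conformers
    total = sum(weights)
    n_points = len(per_conformer[0])
    return [
        sum(w * conf[j] for w, conf in zip(weights, per_conformer)) / total
        for j in range(n_points)
    ]

def rmse(predicted, experimental):
    """Root-mean-square error between predicted and measured values."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return sqrt(sum(squared) / len(experimental))

# Toy data (hypothetical, not real measurements): 2 conformers, 2 data points.
predicted = [[1.0, 2.0], [3.0, 4.0]]
experimental = [2.5, 3.5]

uniform_rmse = rmse(ensemble_average(predicted), experimental)
reweighted_rmse = rmse(ensemble_average(predicted, [0.25, 0.75]), experimental)

print(uniform_rmse)     # 0.5
print(reweighted_rmse)  # 0.0
```

In this toy case, shifting weight toward the second conformer removes the disagreement with experiment. How weights are actually chosen in a real reweighting scheme is outside the scope of this sketch.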
PepTron is trained in two stages on two complementary datasets. A PDB dataset of preprocessed protein chains (converted to NPZ format with multiple-sequence alignments) covers ordered structure, while the IDRome-o dataset supplies predicted ensembles for intrinsically disordered sequences drawn from the IDRome database. The architecture pairs an encoder with a structure head and is trained with flow matching, using self-conditioning and noise injection during training.
Two checkpoints ship: PepTron-base, pre-trained on the PDB, and PepTron, obtained by fine-tuning the base model on disordered regions for the best performance across the whole proteome. At inference time, the released weights are used unchanged to sample conformational ensembles.
Evaluation on PeptoneBench measures agreement with experimental observables from the BMRB (chemical shifts), SASBDB (SAXS profiles), and an integrative multi-modal set, reporting RMSE before and after ensemble reweighting; PepTron matches BioEmu on disordered targets while staying competitive on ordered ones.
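The flow-matching objective named above can be illustrated with a generic, dependency-free sketch. It assumes the common linear-interpolation form of conditional flow matching with a velocity-prediction target; the text here does not say which parameterization PepTron uses, and self-conditioning and noise injection are not shown. All names (`flow_matching_loss`, `oracle_model`, `euler_sample`) are hypothetical, and the "model" is a stand-in that is exact for a single known data point rather than a trained network.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def flow_matching_loss(model, x1, rng):
    """Mean squared error between predicted and target velocity at a random t."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise sample
    t = rng.random()                          # time in [0, 1)
    xt = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]  # velocity of the straight path
    pred = model(xt, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(x1)

def oracle_model(x1):
    """Stand-in velocity field that is exact when the data is the single point x1."""
    def model(xt, t):
        return [(b - a) / (1.0 - t) for a, b in zip(xt, x1)]
    return model

def euler_sample(model, x0, steps=10):
    """Integrate dx/dt = model(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    for k in range(steps):
        t = k / steps
        v = model(x, t)
        x = [a + vi / steps for a, vi in zip(x, v)]
    return x

rng = random.Random(0)
x1 = [1.0, -2.0, 0.5]              # toy "data" vector, not protein coordinates
model = oracle_model(x1)
loss = flow_matching_loss(model, x1, rng)
x0 = [rng.gauss(0.0, 1.0) for _ in x1]
sample = euler_sample(model, x0, steps=10)
```

With the stand-in velocity field, the loss is zero up to rounding and the integrated sample lands on the toy data point. In a real model the velocity field is a learned network conditioned on the sequence, and repeating the sampling from different noise draws yields the different members of an ensemble.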
PepTron is aimed at researchers studying proteins whose function depends on conformational heterogeneity rather than a single fold, including IDPs, flexible linkers, and multi-domain therapeutic targets. Generating realistic ensembles supports drug discovery against disordered targets, interpretation of NMR and SAXS experiments, and hypothesis generation about how flexibility shapes binding, regulation, and phase behavior. Because it spans the order-disorder continuum, it can be applied uniformly across diverse proteins without switching tools for ordered versus disordered cases.
PepTron contributes to a growing class of ensemble generators (such as BioEmu and AlphaFlow) that move protein prediction beyond single static structures toward the conformational distributions that drive biology. By demonstrating competitive performance across both ordered and disordered proteins from one model, and by releasing PeptoneBench as a shared evaluation standard, the work helps establish reproducible benchmarks for an area that has lacked them. As a preprint, its conclusions await peer review, and ensemble accuracy remains bounded by the experimental data and synthetic training distributions available; still, the open code, weights, and benchmark lower the barrier for the community to build on and scrutinize ensemble prediction methods.
Invernizzi, M., et al. (2025) Advancing Protein Ensemble Predictions Across the Order–Disorder Continuum. bioRxiv.
DOI: 10.1101/2025.10.18.680935