bio.rodeo

PepTron

Peptone Ltd.

A flow-matching generative model that predicts protein conformational ensembles across the full order-disorder continuum, from folded domains to intrinsically disordered regions.

Released: October 2025

Deep learning has transformed structure prediction for ordered, well-folded proteins, but a large fraction of the proteome does not adopt a single static structure. Intrinsically disordered proteins (IDPs) and disordered regions sample broad conformational ensembles that govern signaling, phase separation, and many disease mechanisms, yet they remain poorly captured by single-structure predictors such as AlphaFold. PepTron, developed by Peptone Ltd. and released as a preprint in October 2025, addresses this gap by directly generating conformational ensembles rather than individual structures.

PepTron is a sequence-to-ensemble generative model designed to represent proteins with any level of disorder content, spanning the full order-disorder continuum from rigid folded domains to fully disordered chains and the multi-domain proteins that mix both. The authors frame multi-domain proteins as "the most common target class in cutting-edge therapeutics," making accurate ensemble prediction directly relevant to drug discovery.

Alongside the model, the team introduced PeptoneBench, an evaluation framework that scores predicted ensembles against experimental observables for both structured and disordered proteins. On this benchmark, PepTron matches the ensemble generator BioEmu on intrinsically disordered proteins while remaining competitive on ordered ones, positioning it as a single model that performs well across the continuum rather than excelling at only one extreme.

#Key Features

  • Full order-disorder coverage: A single model generates ensembles for folded domains, disordered regions, and mixed multi-domain proteins, avoiding the need to choose between an order-specialized or disorder-specialized tool.
  • Flow-matching generation: PepTron uses a flow-matching generative process with diffusion-based denoising to sample physically plausible conformations conditioned on sequence.
  • Synthetic data augmentation: Disordered-region performance is boosted by fine-tuning on large-scale synthetic ensemble predictions, transferring knowledge from simulation-derived data into the generative model.
  • PeptoneBench evaluation: A companion benchmark compares ensembles against experimental data (chemical shifts, SAXS, and integrative measurements) with and without reweighting refinement.
  • Open release: Code (Apache-2.0), two pretrained checkpoints, an interactive HuggingFace Space, and the benchmark are all publicly available.
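
To make the flow-matching idea concrete, here is a minimal sampling sketch. It is not PepTron's code: the real model uses a trained, sequence-conditioned network as its velocity field, whereas this toy assumes a hypothetical closed-form field whose target distribution is a single point. The names `velocity`, `sample`, and `target` are illustrative only.

```python
import random


def velocity(x, t, target):
    """Toy velocity field for a straight-line path from noise to `target`.

    Assumption: the target distribution is a single point, so the ideal
    field has the closed form (target - x) / (1 - t). A real model would
    replace this function with a trained neural network.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def sample(target, n_steps=50, seed=0):
    """Draw Gaussian noise at t=0 and Euler-integrate the field to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [1.5, -2.0, 0.25]  # hypothetical 3-number "structure"
x_final = sample(target)
```

Because the toy target is a single point, every noise draw is carried to the same endpoint. With a learned field over a broad target distribution, different noise draws would instead land on different conformations, which is how repeated sampling yields an ensemble.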

#Technical Details

PepTron is trained in two stages on two complementary datasets. A PDB dataset of preprocessed protein chains (converted to NPZ format, with multiple-sequence alignments) covers ordered structure, while the IDRome-o dataset supplies ensemble predictions for intrinsically disordered sequences derived from the IDRome database. The architecture combines an encoder with a structure head trained by flow matching, using self-conditioning and noise injection during training. Two checkpoints ship: PepTron-base, pretrained on the PDB, and PepTron, obtained by fine-tuning the base model on disordered regions for the best performance across the whole proteome. At inference, these fixed weights are used to sample conformational ensembles.

Evaluation on PeptoneBench measures agreement with experimental observables from the BMRB (chemical shifts), SASBDB (SAXS profiles), and an integrative multi-modal set, reporting RMSE before and after reweighting. On these measures, PepTron matches BioEmu on disordered targets while staying competitive on ordered ones.
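
The "RMSE before and after reweighting" comparison can be illustrated with a small sketch. This is a generic maximum-entropy-style reweighting and may differ from the procedure PeptoneBench actually uses. The data are hypothetical toy values scaled to [0, 1], and all function names are illustrative assumptions.

```python
import math


def ensemble_average(observables, weights):
    """Weighted average of each observable over all conformers."""
    n_obs = len(observables[0])
    return [sum(w * conf[j] for w, conf in zip(weights, observables))
            for j in range(n_obs)]


def rmse(predicted, experimental):
    """Root-mean-square error between predicted and experimental values."""
    sq = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(sq / len(experimental))


def maxent_weights(observables, lambdas):
    """Weights proportional to exp(-lambda . f_i), from a uniform prior."""
    logits = [-sum(l * f for l, f in zip(lambdas, conf)) for conf in observables]
    shift = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(x - shift) for x in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def reweight(observables, experimental, lr=1.0, n_iter=500):
    """Gradient descent on the convex max-entropy dual.

    The step size assumes observables scaled to roughly [0, 1].
    """
    lambdas = [0.0] * len(experimental)
    for _ in range(n_iter):
        avg = ensemble_average(observables, maxent_weights(observables, lambdas))
        # Dual gradient: experimental value minus current ensemble average.
        lambdas = [l - lr * (e - a) for l, e, a in zip(lambdas, experimental, avg)]
    return maxent_weights(observables, lambdas)


# Hypothetical toy data: 4 conformers x 2 observables, not real measurements.
observables = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
experimental = (0.3, 0.6)

uniform = [1.0 / len(observables)] * len(observables)
rmse_before = rmse(ensemble_average(observables, uniform), experimental)
weights = reweight(observables, experimental)
rmse_after = rmse(ensemble_average(observables, weights), experimental)
```

In this noise-free toy the reweighted averages match the targets almost exactly. Practical schemes usually add a regularization term that accounts for experimental uncertainty, so the weights do not drift far from the generated ensemble. Reporting both numbers separates how good the raw ensemble is from how well it can be refined against data.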

#Applications

PepTron is aimed at researchers studying proteins whose function depends on conformational heterogeneity rather than a single fold, including IDPs, flexible linkers, and multi-domain therapeutic targets. Generating realistic ensembles supports drug discovery against disordered targets, interpretation of NMR and SAXS experiments, and hypothesis generation about how flexibility shapes binding, regulation, and phase behavior. Because it spans the order-disorder continuum, it can be applied uniformly across diverse proteins without switching tools for ordered versus disordered cases.

#Impact

PepTron contributes to a growing class of ensemble generators (such as BioEmu and AlphaFlow) that move protein prediction beyond single static structures toward the conformational distributions that drive biology. By demonstrating competitive performance across both ordered and disordered proteins from one model, and by releasing PeptoneBench as a shared evaluation standard, the work helps establish reproducible benchmarks for an area that has lacked them. As a preprint, its conclusions await peer review, and ensemble accuracy remains bounded by the experimental data and synthetic training distributions available; still, the open code, weights, and benchmark lower the barrier for the community to build on and scrutinize ensemble prediction methods.

Citation

Advancing Protein Ensemble Predictions Across the Order–Disorder Continuum

Preprint

Invernizzi, M., et al. (2025) Advancing Protein Ensemble Predictions Across the Order–Disorder Continuum. bioRxiv.

DOI: 10.1101/2025.10.18.680935

Recent citations

Papers that recently cited this model.

  • Decoding conformational heterogeneity across disordered proteomes. A. Abyzov, Markus Zweckstetter. bioRxiv · Jun 2026. Citations: 0 · Influential
  • Entropy Across the Bridge: Conditional-Marginal Discretization for Flow and Schrödinger Samplers. Bruno Trentini, Dejan Stancevic, Michael M. Bronstein, et al. May 2026. Citations: 0
  • From static structures to dynamic landscapes: Generative artificial intelligence for protein conformational dynamics. Jie Huang, Yaowei Jin, Qian Shi, et al. Current Opinion in Structural Biology · May 2026. Citations: 1

Top citations

The most-cited papers that cite this model.

  • AF-CALVADOS: AlphaFold-guided simulations of multi-domain proteins at the proteome level. Sören von Bülow, K. E. Johansson, K. Lindorff-Larsen. bioRxiv · Dec 2025. Citations: 9 · Influential
  • Computational design of intrinsically disordered proteins. G. Tesei, Francesco Pesce, Kresten Lindorff-Larsen. Current Opinion in Structural Biology · Sep 2025. Citations: 6
  • Advances in the determination of disordered protein ensemble. Hamidreza Ghafouri, Silvio C. E. Tosatto, A. Monzon. Current Opinion in Structural Biology · Dec 2025. Citations: 2
  • From static structures to dynamic landscapes: Generative artificial intelligence for protein conformational dynamics. Jie Huang, Yaowei Jin, Qian Shi, et al. Current Opinion in Structural Biology · May 2026. Citations: 1
  • Ensemblify: A User-friendly Platform for Generating and Analyzing Conformational Ensembles of Intrinsically Disordered Proteins and Regions. Nuno P. Fernandes, Tiago Gomes, Tiago N. Cordeiro. Journal of Molecular Biology · May 2026. Citations: 1

Citations

  • Total citations: 9
  • Influential: 1
  • References: 116

GitHub

  • Stars: 13
  • Forks: 1
  • Open issues: 0
  • Contributors: 4
  • Last push: 4mo ago
  • Language: Python
  • License: Apache-2.0

Fields of citing research

  • Biology: 89%
  • Computer Science: 89%
  • Medicine: 56%
  • Chemistry: 22%
  • Mathematics: 11%

Share of papers citing this model.

Openness

  • bio.rodeo openness: Fully open · usable and reproducible
  • Overall score: 91 (Open)
  • Usability (can I run it?): 99
  • Reproducibility (can I retrain it?): 92
  • Model Openness Framework: Class II (Open Tooling)

Tags

conformational_ensemble_generation, diffusion, flow_matching, generative, intrinsically_disordered_proteins, structure_prediction, transfer_learning
