Evolution-guided diffusion model that generates temporal protein folding pathways, from unfolded chain to native state, rather than static structures.
PathDiffusion is a generative diffusion model from the Yang Lab at Shandong University (Qingdao) that simulates how a protein folds — the temporal sequence of conformations connecting an unfolded chain to its native fold — rather than predicting only the final structure. This distinguishes it from structure-prediction models such as AlphaFold, which output a single static structure, and from conformational-ensemble samplers such as AlphaFlow and BioEmu, which approximate equilibrium ensembles but do not order conformations along a folding trajectory. PathDiffusion instead targets the kinetic question of the folding pathway itself.
The core idea is to inject evolutionary information into the diffusion process. The model extracts structure-aware evolutionary signal from 52 million predicted structures in the AlphaFold database and uses it to construct position-specific noise schedules (PSNS), so that different residues fold on different timescales in a manner consistent with evolutionary and structural constraints. A dual-score fusion strategy then guides the reverse diffusion to generate high-fidelity, temporally ordered folding trajectories. The framework supports both a sequence-conditional (fold-based) model and an unconditional (disorder-based) model.
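To make the idea of a position-specific noise schedule concrete, the following is a minimal sketch and not PathDiffusion's actual implementation. It assumes a hypothetical per-residue score in [0, 1] (called `conservation` here) as a stand-in for the structure-aware evolutionary signal, and it warps a standard cosine schedule per residue. The function names, the `strength` parameter, and the choice of a cosine schedule are all illustrative assumptions that do not come from the source.

```python
import math


def residue_exponents(conservation, strength=2.0):
    """Map hypothetical per-residue scores in [0, 1] to schedule exponents.

    Assumption for illustration: a score above 0.5 gives an exponent > 1,
    which lowers that residue's noise at intermediate times.
    """
    for c in conservation:
        if not 0.0 <= c <= 1.0:
            raise ValueError("conservation scores must lie in [0, 1]")
    return [math.exp(strength * (c - 0.5)) for c in conservation]


def signal_retention(t, exponent):
    """Cosine schedule with a per-residue time warp t -> t**exponent.

    Returns 1.0 at t = 0 (clean structure) and approximately 0.0 at t = 1
    (pure noise), whatever the exponent.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return math.cos(0.5 * math.pi * t ** exponent) ** 2


def noise_levels(conservation, t, strength=2.0):
    """Per-residue noise standard deviation at diffusion time t."""
    return [
        math.sqrt(1.0 - signal_retention(t, k))
        for k in residue_exponents(conservation, strength)
    ]


if __name__ == "__main__":
    scores = [0.9, 0.5, 0.1]  # made-up scores for three residues
    # Walk reverse diffusion time from pure noise (t = 1) to clean (t = 0).
    for t in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(t, [round(s, 3) for s in noise_levels(scores, t)])
```

In this sketch every residue starts fully noised at t = 1 and ends noise-free at t = 0, but residues with higher scores shed their noise earlier in the reverse process, which is one simple way to let different positions settle on different timescales. How PathDiffusion actually derives its schedules from the AlphaFold-database signal is not specified here.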
Posted to bioRxiv in January 2026, PathDiffusion contributes a folding-dynamics perspective to the growing landscape of generative protein models, bridging machine learning and the long-standing biophysics question of how proteins navigate their folding landscapes.
PathDiffusion is a diffusion-based generative framework with two modules: one prepares the position-specific noise schedules (PSNS), and the other runs reverse diffusion guided by those schedules. Structure-aware evolutionary features are mined from 52 million AlphaFold-database structures; training data are drawn from the Protein Data Bank and, for disordered regions, from the IDRome database. The model was validated on four benchmarks: 52 proteins with experimentally characterized folding pathways (FP52), 12 fast-folding proteins compared against Anton long-timescale molecular-dynamics simulations (MD12), 50 intrinsically disordered proteins (IDP50), and three TIM-barrel proteins. Pretrained sequence-conditional and unconditional checkpoints, along with the benchmark datasets, are distributed through the project website, and the implementation code is on GitHub under an MIT license.
PathDiffusion is useful for biophysicists and structural biologists studying folding mechanisms, misfolding, and the conformational behavior of disordered or partially structured proteins. By generating folding trajectories in silico, it offers a fast alternative to expensive long-timescale molecular-dynamics simulations when forming hypotheses about folding order, intermediates, and kinetics, and it can model intrinsically disordered proteins that lack a single native fold. Such pathway-level predictions can inform studies of folding diseases, the design of foldable sequences, and the interpretation of experimental folding assays.
PathDiffusion broadens generative protein modeling from static structures and equilibrium ensembles toward explicit folding kinetics, a capability that complements rather than replaces AlphaFold-style prediction and ensemble samplers like AlphaFlow and BioEmu. Its validation against experimental pathways, Anton molecular-dynamics references, and disordered-protein benchmarks gives the approach credibility beyond a single curated test set. The availability of pretrained weights, benchmark datasets, and MIT-licensed code lowers the barrier for adoption, though the long-term durability of weights hosted on an institutional project page — rather than a versioned model hub — is a practical archival consideration.