PathDiffusion

Evolution-guided diffusion model that generates temporal protein folding pathways, from unfolded chain to native state, rather than static structures.

Released: January 2026

PathDiffusion is a generative diffusion model from the Yang Lab at Shandong University (Qingdao) that simulates how a protein folds — the temporal sequence of conformations connecting an unfolded chain to its native fold — rather than predicting only the final structure. This distinguishes it from structure-prediction models such as AlphaFold, which output a single static structure, and from conformational-ensemble samplers such as AlphaFlow and BioEmu, which approximate equilibrium ensembles but do not order conformations along a folding trajectory. PathDiffusion instead targets the kinetic question of the folding pathway itself.

The core idea is to inject evolutionary information into the diffusion process. The model extracts structure-aware evolutionary signal from 52 million predicted structures in the AlphaFold database and uses it to construct position-specific noise schedules (PSNS), so that different residues fold on different timescales in a manner consistent with evolutionary and structural constraints. A dual-score fusion strategy then guides the reverse diffusion to generate high-fidelity, temporally ordered folding trajectories. The framework supports both a sequence-conditional (fold-based) model and an unconditional (disorder-based) model.
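To make the notion of a position-specific noise schedule more concrete, here is a minimal sketch. It is not the PathDiffusion implementation. It assumes a hypothetical per-residue value in [0, 1] (called `constraint` below) standing in for the evolutionary signal, and it assumes that more constrained residues should regain structure earlier in the reverse process. The function names, the `max_shift` parameter, and the shifted cosine form are all illustrative choices rather than details taken from the paper.

```python
import math
import random
from typing import List, Sequence


def cosine_alpha_bar(t: float) -> float:
    """Cosine signal level: 1 at t=0 (clean data), 0 at t=1 (pure noise)."""
    t = min(max(t, 0.0), 1.0)
    return math.cos(0.5 * math.pi * t) ** 2


def position_specific_alpha_bar(
    t: float, constraint: Sequence[float], max_shift: float = 0.3
) -> List[float]:
    """Per-residue signal levels at diffusion time t.

    Assumed mapping (hypothetical): a residue with constraint c stays clean
    until forward time c * max_shift, then follows a rescaled cosine schedule.
    Read in reverse, such residues reach their clean state earlier.
    """
    if not 0.0 <= max_shift < 1.0:
        raise ValueError("max_shift must be in [0, 1)")
    levels = []
    for c in constraint:
        shift = max_shift * min(max(c, 0.0), 1.0)
        local_t = (t - shift) / (1.0 - shift)  # residue-specific clock
        levels.append(cosine_alpha_bar(local_t))
    return levels


def noise_positions(
    x0: Sequence[float], levels: Sequence[float], rng: random.Random
) -> List[float]:
    """Forward-noise one coordinate per residue using its own signal level."""
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x, a in zip(x0, levels)
    ]


if __name__ == "__main__":
    constraint = [0.0, 0.5, 1.0]  # toy values, not real evolutionary scores
    for t in (0.0, 0.5, 1.0):
        print(t, position_specific_alpha_bar(t, constraint))
```

At an intermediate time such as t = 0.5, the signal level increases with the constraint value, so the most constrained residue is the least noised. All residues are clean at t = 0 and fully noised at t = 1.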

Posted to bioRxiv in January 2026, PathDiffusion contributes a folding-dynamics perspective to the growing landscape of generative protein models, bridging machine learning and the long-standing biophysics question of how proteins navigate their folding landscapes.

Key Features

Temporal folding trajectories: Generates ordered sequences of conformations from unfolded to native state, explicitly modeling folding kinetics rather than only the equilibrium endpoint.
Evolution-guided diffusion: Derives position-specific noise schedules from structure-aware evolutionary information so residues fold on biologically plausible timescales.
Dual-score fusion: Combines complementary score functions during reverse diffusion to produce high-fidelity pathways.
Conditional and unconditional modes: Offers a sequence-conditional (fold-based) model for structured proteins and an unconditional (disorder-based) model suited to disordered systems.
Broad experimental grounding: Evaluated against proteins with experimentally characterized folding pathways and against long-timescale molecular-dynamics references.
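The dual-score fusion feature listed above can be illustrated with a toy sketch. This page does not say which two scores PathDiffusion combines or how, so the weighted sum below is an assumption, and the two analytic Gaussian scores are stand-ins chosen only so the result can be checked by hand. All names (`gaussian_score`, `fuse_scores`, `follow_score`) are hypothetical.

```python
from typing import Callable, List, Sequence

Score = Callable[[Sequence[float]], List[float]]


def gaussian_score(mean: float, sigma: float) -> Score:
    """Score (gradient of log density) of an isotropic Gaussian, per coordinate."""
    def score(x: Sequence[float]) -> List[float]:
        return [-(xi - mean) / sigma**2 for xi in x]
    return score


def fuse_scores(score_a: Score, score_b: Score, weight: float) -> Score:
    """Assumed fusion rule: convex combination of two score functions."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    def fused(x: Sequence[float]) -> List[float]:
        a, b = score_a(x), score_b(x)
        return [weight * ai + (1.0 - weight) * bi for ai, bi in zip(a, b)]
    return fused


def follow_score(
    x: Sequence[float], score: Score, step: float = 0.1, n_steps: int = 500
) -> List[float]:
    """Noise-free score ascent; a real reverse-diffusion sampler would also
    inject noise and use a time-dependent score."""
    x = list(x)
    for _ in range(n_steps):
        x = [xi + step * si for xi, si in zip(x, score(x))]
    return x


if __name__ == "__main__":
    fused = fuse_scores(gaussian_score(0.0, 1.0), gaussian_score(2.0, 1.0), 0.5)
    print(follow_score([5.0, -3.0], fused))
```

With equal weights and equal widths, the fused score is that of a Gaussian centred midway between the two means, so every coordinate converges towards 1.0. Changing `weight` pulls the result towards one source of guidance or the other.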

Technical Details

PathDiffusion is a diffusion-based generative framework comprising a module that prepares position-specific noise schedules (PSNS) and a module that uses those schedules to drive PSNS-guided reverse diffusion. Structure-aware evolutionary features are mined from 52 million AlphaFold-database structures, with training data drawn from the Protein Data Bank and the IDRome database for disordered regions. The model was validated across multiple benchmarks: 52 proteins with experimentally characterized folding pathways (FP52), 12 fast-folding proteins compared against Anton long-timescale molecular-dynamics simulations (MD12), 50 intrinsically disordered proteins (IDP50), and three TIM-barrel proteins. Pretrained sequence-conditional and unconditional checkpoints, along with the benchmark datasets, are distributed through the project website, with implementation code on GitHub under an MIT license.

Applications

PathDiffusion is aimed at biophysicists and structural biologists studying folding mechanisms, misfolding, and the conformational behavior of disordered or partially structured proteins. Because it generates folding trajectories in silico, it offers a faster route than long-timescale molecular-dynamics simulation for forming hypotheses about folding order, intermediates, and kinetics, and it can also model intrinsically disordered proteins that lack a single native fold. Such pathway-level predictions can inform studies of folding diseases, the design of foldable sequences, and the interpretation of experimental folding assays.
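One common way to read folding progress out of any trajectory, generated or simulated, is the fraction of native contacts per frame. The sketch below is a generic analysis, not part of PathDiffusion. It assumes a hypothetical trajectory format (a list of frames, each a list of C-alpha coordinates in Å), and the 8 Å cutoff and minimum sequence separation of 3 are conventional but arbitrary choices.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contacts(
    frame: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 3
) -> Set[Tuple[int, int]]:
    """Residue pairs (i, j), with j - i >= min_sep, closer than the cutoff."""
    pairs = set()
    for i in range(len(frame)):
        for j in range(i + min_sep, len(frame)):
            if math.dist(frame[i], frame[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def native_contact_fraction(
    trajectory: Sequence[Sequence[Coord]],
    native: Sequence[Coord],
    cutoff: float = 8.0,
    min_sep: int = 3,
) -> List[float]:
    """Fraction of the native contacts that is present in each frame."""
    native_pairs = contacts(native, cutoff, min_sep)
    if not native_pairs:
        raise ValueError("native structure has no contacts at this cutoff")
    return [
        len(contacts(frame, cutoff, min_sep) & native_pairs) / len(native_pairs)
        for frame in trajectory
    ]


if __name__ == "__main__":
    # Toy four-residue example: an extended chain and a compact square.
    extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    print(native_contact_fraction([extended, native], native))
```

Plotting this quantity against frame index gives a coarse view of when structure forms. Restricting the contact set to particular residue ranges would give a similarly coarse view of folding order.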

Impact

PathDiffusion broadens generative protein modeling from static structures and equilibrium ensembles toward explicit folding kinetics, a capability that complements rather than replaces AlphaFold-style prediction and ensemble samplers like AlphaFlow and BioEmu. Its validation against experimental pathways, Anton molecular-dynamics references, and disordered-protein benchmarks gives the approach credibility beyond a single curated test set. The availability of pretrained weights, benchmark datasets, and MIT-licensed code lowers the barrier for adoption, though the long-term durability of weights hosted on an institutional project page — rather than a versioned model hub — is a practical archival consideration.

Citation

PathDiffusion: modeling protein folding pathway using evolution-guided diffusion

Zhao, K., et al. (2026) PathDiffusion: modeling protein folding pathway using evolution-guided diffusion. bioRxiv.

DOI: 10.64898/2026.01.16.699856

Recent citations

Papers that recently cited this model.

World Models as Group Actions
Zijie Wang, Wei Zhang, Weiming Zhang, et al.
May 2026
1

Citations

Total Citations: 1

Influential: 0

References: 0

GitHub

Stars: 15

Forks: 0

Open Issues: 0

Contributors: 1

Last Push: 28d ago

Language: Python

License: MIT

Fields of citing research

Computer Science: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

64 (Partial)

Usability (can I run it?): 69

Reproducibility (can I retrain it?): 60

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository, Research Paper, Official Website
