Transformer-based protein-language diffusion model that generates all-atom conformational ensembles of intrinsically disordered proteins, validated against experimental NMR and SAXS data.
IDPForge is a protein-language diffusion model that generates all-atom conformational ensembles for intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs), including proteins that combine folded and disordered segments. Posted to bioRxiv in March 2026, IDPForge is validated against experimental NMR and SAXS measurements. It does not require sequence-specific training, so it can be applied to arbitrary IDP/IDR sequences.
AlphaFold and related structure-prediction models output a single conformation, which is misleading for inherently flexible proteins. IDPForge instead produces ensembles that capture the conformational heterogeneity central to IDP biology, including phase separation, allosteric regulation, and signaling.
IDPForge uses a transformer-based diffusion architecture trained on a curated corpus of IDP conformational ensembles drawn from molecular dynamics simulations and from experimentally derived ensemble entries in the PDB. The training objective is to denoise atomic coordinates conditioned on the input sequence, with diversity in the prior ensuring that sampled output is multi-modal. The bioRxiv preprint describes the architecture, training data, validation against experimental observables, and benchmarks against prior IDP-modeling tools.
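To make the idea of a denoising objective concrete, the following is a minimal sketch of a generic DDPM-style noise-prediction loss on atomic coordinates. It is not IDPForge's implementation: the linear noise schedule, the noise-prediction parameterization, and the names `linear_alpha_bar`, `add_noise`, `denoising_loss`, and `zero_predictor` are all assumptions made for illustration, and the sequence-conditioned transformer is replaced by a placeholder function.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: corrupt clean coordinates x0 to noise level t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def denoising_loss(predict_noise, x0, sequence, t, alpha_bar, rng):
    """Mean squared error between the true noise and the model's estimate.

    predict_noise(xt, t, sequence) stands in for a sequence-conditioned network.
    """
    xt, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = predict_noise(xt, t, sequence)
    return float(np.mean((eps_hat - eps) ** 2))

def zero_predictor(xt, t, sequence):
    """Hypothetical untrained baseline that always predicts zero noise."""
    return np.zeros_like(xt)

# Toy usage: 12 atoms with random coordinates and a made-up sequence string.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal((12, 3))
loss = denoising_loss(zero_predictor, x0, "MSEQ", 500, alpha_bar, rng)
print(f"baseline loss at t=500: {loss:.3f}")
```

In a trained model, `predict_noise` would be the learned network, and the loss would be averaged over many conformers and randomly drawn noise levels; sampling then runs the process in reverse from random noise, which is how repeated draws can yield distinct conformers for the same sequence.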
IDPForge is suited to biophysics and structural-biology groups studying intrinsically disordered proteins, particularly where ensemble-level descriptors (radius of gyration, contact maps, secondary-structure propensities) are required. Applications include studies of phase-separating proteins, conformational dynamics of signaling tails, allosteric regulation through IDR rearrangement, and integration with experimental NMR and SAXS data.
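Two of the ensemble-level descriptors named above can be computed directly from a stack of conformer coordinates. The sketch below assumes the ensemble is already loaded as a NumPy array of shape (conformers, sites, 3) in ångströms with one representative site per residue; the equal-mass radius of gyration, the 8 Å contact cutoff, and the function names are illustrative choices, not part of IDPForge.

```python
import numpy as np

def radius_of_gyration(coords):
    """Per-conformer radius of gyration, treating all sites as equal mass."""
    centered = coords - coords.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))

def contact_frequency(coords, cutoff=8.0):
    """Fraction of conformers in which each pair of sites lies within cutoff."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).mean(axis=0)

# Toy ensemble: 50 random-walk chains of 30 sites (not real protein data).
rng = np.random.default_rng(0)
steps = rng.standard_normal((50, 30, 3)) * 2.0
ensemble = np.cumsum(steps, axis=1)

rg = radius_of_gyration(ensemble)
contacts = contact_frequency(ensemble)
print(f"mean Rg: {rg.mean():.1f}, contact map shape: {contacts.shape}")
```

For a disordered protein the distribution of `rg` across conformers is usually more informative than its mean, and the contact-frequency map gives a compact view of transient long-range interactions that a single structure cannot show.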
IDPForge fills a gap in the protein-modeling toolkit by providing experimentally grounded conformational ensembles for the disordered fraction of the proteome, which AlphaFold and related single-structure models cannot meaningfully represent. The combination of training that is not sequence-specific, all-atom output, and direct experimental validation makes it a useful reference tool for IDP research.
DeCastro, S., et al. (2026) IDPForge: Deep Learning of Proteins with Global and Local Regions of Disorder. bioRxiv.
DOI: 10.64898/2026.03.25.714313