bio.rodeo

Protein

IDPForge

Chinese Academy of Sciences

Transformer-based protein language diffusion model generating all-atom intrinsically disordered protein conformational ensembles, validated against experimental NMR and SAXS data.

Released: 2026

Overview

IDPForge is a protein-language diffusion model that generates all-atom conformational ensembles for intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs), including proteins that combine folded and disordered segments. Posted to bioRxiv in March 2026, IDPForge is validated against experimental NMR and SAXS measurements. Because it does not require sequence-specific training, it can be applied to arbitrary IDP/IDR sequences.

AlphaFold and related structure-prediction models produce single-conformation outputs, which can be misleading for inherently flexible proteins. IDPForge instead produces ensembles that capture the conformational heterogeneity central to IDP biology, including phase separation, allosteric regulation, and signaling.

Key Features

  • All-atom conformational ensembles: Generates ensembles of full-atom structures rather than single conformations, capturing the structural heterogeneity intrinsic to IDPs.
  • Mixed folded-disordered handling: Works on proteins with mixed folded domains and disordered regions, not only fully disordered sequences.
  • No sequence-specific training: Requires no sequence-specific training data and operates directly on arbitrary input sequences.
  • NMR and SAXS validation: Ensemble outputs are compared quantitatively to experimental NMR chemical shifts and SAXS scattering curves; the preprint reports strong agreement.
  • Open preprint and code: Described in a bioRxiv preprint, with an accompanying code release.
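
To make the SAXS comparison in the list above concrete, here is a minimal sketch of one common way to score an ensemble against a measured scattering curve: average the per-conformer intensities, fit a single scale factor, and report a reduced chi-square. This is an illustration only, not the procedure from the preprint. The function names and toy numbers are hypothetical, and the per-conformer curves are assumed to come from a separate SAXS forward model that is not shown.

```python
def ensemble_average(curves):
    """Average per-conformer intensity curves point by point."""
    n = len(curves)
    return [sum(vals) / n for vals in zip(*curves)]


def optimal_scale(calc, exp, sigma):
    """Least-squares scale c minimizing sum(((c * calc - exp) / sigma) ** 2)."""
    num = sum(c * e / s ** 2 for c, e, s in zip(calc, exp, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(calc, sigma))
    return num / den


def reduced_chi_square(calc, exp, sigma):
    """Reduced chi-square after fitting one scale factor (N - 1 degrees of freedom)."""
    c = optimal_scale(calc, exp, sigma)
    chi2 = sum(((c * ci - e) / s) ** 2 for ci, e, s in zip(calc, exp, sigma))
    return chi2 / (len(exp) - 1)


# Toy data (hypothetical): two conformers, three scattering-vector points.
curves = [[1.0, 0.5, 0.2], [1.0, 0.7, 0.4]]
avg = ensemble_average(curves)
exp_intensity = [2.0, 1.2, 0.6]   # exactly 2x the ensemble average
sigma = [0.1, 0.1, 0.1]           # experimental uncertainties
chi2_red = reduced_chi_square(avg, exp_intensity, sigma)
```

In this toy case the experimental curve is a scaled copy of the ensemble average, so the reduced chi-square is essentially zero. With real data, values near 1 are usually read as agreement within experimental error.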

Technical Details

IDPForge uses a transformer-based diffusion architecture trained on a curated corpus of IDP conformational ensembles drawn from molecular dynamics simulations and from experimental ensemble entries in the PDB. The training objective is to denoise atomic coordinates conditioned on the input sequence. Diversity in the diffusion prior means that repeated sampling yields distinct conformers, giving multi-modal output. The bioRxiv preprint describes the architecture, training data, validation against experimental observables, and benchmarks against prior IDP-modeling tools.
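
The denoising objective described above can be illustrated with a generic diffusion training step. The sketch below uses the standard noise-prediction formulation on Cartesian coordinates. This is an assumption: the text here does not state IDPForge's actual parametrization, noise schedule, or network, and `placeholder_denoiser` is a hypothetical stand-in for the sequence-conditioned transformer.

```python
import math
import random


def add_noise(coords, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    noise = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in coords]
    noisy = [
        [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
         for x, e in zip(atom, eps)]
        for atom, eps in zip(coords, noise)
    ]
    return noisy, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and true noise over all coordinates."""
    diffs = [(p - t) ** 2
             for pa, ta in zip(predicted_noise, true_noise)
             for p, t in zip(pa, ta)]
    return sum(diffs) / len(diffs)


def placeholder_denoiser(noisy_coords, sequence, alpha_bar):
    """Hypothetical stand-in for a sequence-conditioned network; predicts zero noise."""
    return [[0.0 for _ in atom] for atom in noisy_coords]


rng = random.Random(0)
x0 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # toy 3-atom chain
x_t, eps = add_noise(x0, alpha_bar=0.5, rng=rng)
loss = denoising_loss(placeholder_denoiser(x_t, "GSG", 0.5), eps)
```

In training, a real network would replace the placeholder and its parameters would be updated to reduce this loss. At sampling time, different noise draws lead to different conformers, which is how a diffusion model produces an ensemble rather than a single structure.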

Applications

IDPForge is suited for biophysics and structural-biology research groups studying intrinsically disordered proteins, particularly in contexts where ensemble-level descriptors (radius of gyration, contact maps, secondary-structure propensities) are required. Applications include studies of phase-separating proteins, signaling-tail conformational dynamics, allosteric regulation through IDR rearrangement, and integration with experimental NMR and SAXS data.
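
As an illustration of the ensemble-level descriptors mentioned above, the following sketch computes a per-conformer radius of gyration and a contact-frequency map from a list of conformers. It is a minimal example under stated assumptions: the ensemble is a plain list of coordinate lists (one point per residue), the function names are hypothetical, and the cutoff is an arbitrary toy value. It does not reflect IDPForge's output format or API.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of one conformer given (x, y, z) points."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    sq = sum(sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(sq / n)


def contact_frequency(ensemble, cutoff):
    """Fraction of conformers in which each pair (i, j) lies within the cutoff."""
    n = len(ensemble[0])
    counts = [[0.0] * n for _ in range(n)]
    for coords in ensemble:
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[i][j] += 1
                    counts[j][i] += 1
    m = len(ensemble)
    return [[v / m for v in row] for row in counts]


# Toy ensemble (hypothetical): one extended and one bent 3-residue conformer.
ensemble = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
]
rg_values = [radius_of_gyration(c) for c in ensemble]
mean_rg = sum(rg_values) / len(rg_values)
contacts = contact_frequency(ensemble, cutoff=6.0)
```

Here the extended conformer has the larger radius of gyration, and the end-to-end pair is in contact in only one of the two conformers, so its contact frequency is 0.5. For real ensembles, the distribution of `rg_values` is often more informative than the mean alone.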

Impact

IDPForge fills a gap in the protein-modeling toolkit by providing experimentally grounded conformational ensembles for the disordered fraction of the proteome, which AlphaFold and related single-structure models cannot meaningfully represent. The combination of training that is not sequence-specific, all-atom output, and direct experimental validation makes it a useful reference tool for IDP research.

Citation

DeCastro, S., et al. (2026) IDPForge: Deep Learning of Proteins with Global and Local Regions of Disorder. bioRxiv.

DOI: 10.64898/2026.03.25.714313

Metrics

Citations

Total Citations: 1
Influential: 0
References: 126

Tags

intrinsically disordered protein modeling, conformational ensemble generation, structure prediction, diffusion, transformer, generative, foundation model, protein, intrinsically disordered protein, conformational ensemble
