bio.rodeo


ProAR

Peking University

A probabilistic autoregressive model for protein molecular-dynamics trajectories that generates flexible-length paths frame-by-frame with an anti-drifting sampling strategy.

Released: March 2026

Molecular dynamics (MD) simulations reveal how proteins move, fold, and switch between functional conformations, but generating long trajectories with physics-based MD is computationally expensive. A wave of generative models, such as AlphaFlow and BioEmu, now aims to emulate equilibrium ensembles or dynamics directly, sidestepping costly numerical integration. Most of these models produce fixed-length outputs by denoising all frames jointly, which limits how naturally they capture the sequential, time-ordered character of a trajectory.

ProAR (Probabilistic Autoregressive modeling) reframes MD trajectory generation as an autoregressive sequence-modeling problem. Developed at Peking University and posted to bioRxiv in March 2026, it generates trajectories frame by frame, modeling each frame as a multivariate Gaussian distribution rather than a single deterministic structure. This probabilistic, stepwise formulation lets the model produce flexible-length trajectories while explicitly representing structural uncertainty and temporal variation.
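
One way to write the frame-by-frame formulation described above is the standard autoregressive factorization with Gaussian conditionals. The symbols here ($x_t$ for frame $t$, $x_{<t} = (x_0, \dots, x_{t-1})$ for the preceding frames, $\mu_\theta$ and $\Sigma_\theta$ for the predicted mean and covariance) are illustrative notation rather than the preprint's, and conditioning on the full history is an assumption; the model may well use a shorter context.

```latex
p_\theta(x_{1:T} \mid x_0) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
p_\theta(x_t \mid x_{<t}) = \mathcal{N}\!\bigl(x_t;\ \mu_\theta(x_{<t}),\ \Sigma_\theta(x_{<t})\bigr)
```

Because each factor depends only on earlier frames, $T$ does not need to be fixed in advance: sampling can stop after any step, which is what makes flexible-length output natural in this framing.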

A central challenge for autoregressive generation is error accumulation, where small per-step mistakes compound into unphysical drift over long horizons. ProAR introduces a dual-network design and an "anti-drifting" sampling strategy specifically to keep long-trajectory generation stable.
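
The compounding effect can be seen in a toy rollout. The sketch below is not ProAR and does not implement its anti-drifting strategy; it assumes a hypothetical one-dimensional "model" whose only flaw is a small constant bias per step (`step_bias`, an invented parameter), while the true state stays at zero. Because each output becomes the next input, the bias is never corrected and the error grows with the horizon.

```python
import numpy as np

def rollout_error(steps, step_bias=0.01, noise_std=0.0, seed=0):
    """Roll out a toy 1-D autoregressive model and return |error| per step.

    The true state is always 0. The hypothetical learned model adds a small
    bias (plus optional Gaussian noise) each step and then feeds its own
    output back in as the next input.
    """
    rng = np.random.default_rng(seed)
    state = 0.0
    errors = []
    for _ in range(steps):
        state = state + step_bias + rng.normal(0.0, noise_std)
        errors.append(abs(state))
    return np.array(errors)

errors = rollout_error(steps=100)
print(f"error after 1 step:    {errors[0]:.2f}")
print(f"error after 100 steps: {errors[-1]:.2f}")
```

A per-step error of 0.01 looks harmless in isolation, but after 100 steps the accumulated deviation is about 100 times larger. That gap between short-window and long-horizon quality is what any stabilizing scheme for autoregressive rollouts has to close.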

#Key Features

  • Autoregressive trajectory generation: Frames are generated sequentially rather than denoised jointly, allowing flexible-length trajectories that respect the temporal ordering of dynamics.
  • Probabilistic frame modeling: Each frame is modeled as a multivariate Gaussian, capturing structural uncertainty and conformational variation instead of a single point estimate.
  • Anti-drifting sampling: A dedicated sampling strategy counteracts the error accumulation that typically destabilizes long autoregressive rollouts.
  • Dual-network architecture: Two coupled networks divide the prediction task, supporting stable, accurate generation over extended trajectories.
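
The first two features can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_gaussian` is a hypothetical stand-in for a learned network (it centers the next frame on the last one with a small isotropic covariance), and frames are toy 6-dimensional vectors, not real protein coordinates. The dual-network design and anti-drifting sampling are not modeled here.

```python
import numpy as np

def predict_gaussian(history):
    """Hypothetical stand-in for a learned next-frame predictor.

    Returns the mean and covariance of the next frame given prior frames.
    Here the mean is simply the last frame and the covariance is a small
    isotropic matrix; a real model would learn both from MD data.
    """
    last = history[-1]
    dim = last.shape[0]
    return last, 0.01 * np.eye(dim)

def generate_trajectory(first_frame, n_frames, seed=0):
    """Sample frames one at a time; n_frames can be any positive length."""
    rng = np.random.default_rng(seed)
    frames = [np.asarray(first_frame, dtype=float)]
    while len(frames) < n_frames:
        mean, cov = predict_gaussian(frames)
        # Draw the next frame from the predicted multivariate Gaussian.
        frames.append(rng.multivariate_normal(mean, cov))
    return np.stack(frames)

start = np.zeros(6)  # toy frame, e.g. flattened coordinates of two atoms
short_traj = generate_trajectory(start, n_frames=5)
long_traj = generate_trajectory(start, n_frames=50)
print(short_traj.shape, long_traj.shape)
```

The same function yields a 5-frame or a 50-frame trajectory without retraining or changing the model, which is the practical meaning of "flexible-length" in an autoregressive setup. Sampling from a distribution instead of returning only the mean is what lets repeated runs with different seeds explore different conformational paths.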

#Technical Details

ProAR is a dual-network autoregressive system trained on the ATLAS protein molecular-dynamics dataset. At each step it predicts the next frame as a multivariate Gaussian conditioned on prior frames, and an anti-drifting sampling procedure suppresses the compounding errors that otherwise pull autoregressive trajectories away from physically realistic conformations. The authors report gains over existing approaches, including a 7.5% reduction in reconstruction RMSE and a 25.8% improvement in conformation accuracy for long trajectories. These figures suggest that the probabilistic, anti-drifting formulation improves long-horizon fidelity and not only short-window accuracy.
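
To make the reported RMSE figure concrete, the sketch below shows one common way to compute a coordinate RMSE between two frames and how a relative reduction is calculated. The exact metric definition used in the preprint (atom selection, structural alignment, averaging over frames) is not given here, so `frame_rmse` is an assumption, and the baseline and improved values (2.0 and 1.85) are made-up numbers chosen only to reproduce a 7.5% reduction.

```python
import numpy as np

def frame_rmse(pred, ref):
    """RMSE over atoms between two (n_atoms, 3) coordinate arrays.

    Assumes the frames are already superimposed; no alignment is performed.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))

def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline

ref = np.zeros((4, 3))
pred = ref + np.array([3.0, 4.0, 0.0])  # every atom displaced by 5 units
rmse = frame_rmse(pred, ref)
print(rmse)  # 5.0

# Hypothetical baseline and improved RMSE values, for illustration only.
reduction = relative_reduction(baseline=2.0, new=1.85)
print(f"{reduction:.1%}")  # 7.5%
```

Note that a percentage reduction says nothing about the absolute error: a 7.5% improvement on a large baseline RMSE and on a small one mean quite different things for downstream use.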

#Applications

ProAR is intended for computational structural biologists studying protein conformational dynamics—exploring metastable states, transition pathways, and flexibility that a single static structure cannot capture. By emulating MD trajectories generatively, it can serve as a fast surrogate for expensive simulations when sampling ensembles, screening conformational variability across many proteins, or generating starting points for downstream analysis. Its flexible-length output is particularly relevant when the timescale of interest is not known in advance.

#Impact

ProAR contributes to the growing effort to replace or accelerate physics-based MD with learned generative models, and its autoregressive, probabilistic framing is a distinctive alternative to the diffusion- and flow-based ensemble generators that dominate the area. The explicit focus on anti-drifting stability addresses a well-known weakness of sequential trajectory models. Because ProAR is a recent preprint without a confirmed public release of code or weights, its reported improvements await independent reproduction. It was also trained on the ATLAS dataset, so generalization to proteins and dynamical regimes outside that distribution remains to be demonstrated.

Tags

molecular_dynamics, conformational_sampling, autoregressive, transformer, generative, probabilistic, protein_dynamics