A probabilistic autoregressive model for protein molecular-dynamics trajectories that generates flexible-length paths frame-by-frame with an anti-drifting sampling strategy.
Molecular dynamics (MD) simulations reveal how proteins move, fold, and switch between functional conformations, but generating long trajectories with physics-based MD is computationally expensive. A wave of generative models, such as AlphaFlow and BioEmu, now aims to emulate equilibrium ensembles or dynamics directly, sidestepping the costly step-by-step numerical integration. Most produce fixed-length outputs through joint denoising, which limits how naturally they capture the sequential, time-ordered character of a trajectory.
ProAR (Probabilistic Autoregressive modeling) reframes MD trajectory generation as an autoregressive sequence-modeling problem. Developed at Peking University and posted to bioRxiv in March 2026, it generates trajectories frame by frame, modeling each frame as a multivariate Gaussian distribution rather than a single deterministic structure. This probabilistic, stepwise formulation lets the model produce flexible-length trajectories while explicitly representing structural uncertainty and temporal variation.
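The frame-by-frame idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not ProAR's implementation: `predict_next_gaussian` is a hypothetical stand-in for a trained network, the covariance is taken to be diagonal for simplicity, and the "frames" are short lists of made-up numbers in place of real protein coordinates.

```python
import random

def predict_next_gaussian(history):
    """Hypothetical stand-in for a learned next-frame model.

    Returns (mean, std) for the next frame given all frames so far.
    This toy centers the mean on the last frame and uses a fixed
    per-coordinate standard deviation (diagonal covariance, an assumption).
    """
    last = history[-1]
    mean = list(last)
    std = [0.1] * len(last)
    return mean, std

def rollout(first_frame, n_frames, rng):
    """Generate n_frames frames one at a time, sampling each from a Gaussian."""
    trajectory = [list(first_frame)]
    while len(trajectory) < n_frames:
        mean, std = predict_next_gaussian(trajectory)
        # Sample the next frame instead of taking a single deterministic output.
        frame = [rng.gauss(m, s) for m, s in zip(mean, std)]
        trajectory.append(frame)
    return trajectory

rng = random.Random(0)
start = [0.0, 1.0, 2.0]              # toy flattened coordinates, not a real protein
short_traj = rollout(start, 5, rng)  # the same loop yields any requested length
long_traj = rollout(start, 50, rng)
```

Because generation is a loop and not a single fixed-size denoising pass, the trajectory length is just the number of iterations, which is what makes flexible-length output natural in an autoregressive formulation.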
A central challenge for autoregressive generation is error accumulation, where small per-step mistakes compound into unphysical drift over long horizons. ProAR introduces a dual-network design and an "anti-drifting" sampling strategy specifically to keep long-trajectory generation stable.
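To see why small per-step errors matter, consider a deliberately simplified simulation. It assumes each step adds an independent Gaussian error to a single coordinate, which is an idealization (errors in a real model may be correlated and state-dependent), and it does not implement ProAR's anti-drifting strategy. The function name and parameters are illustrative.

```python
import random

def mean_squared_drift(n_steps, step_std, n_trials, rng):
    """Average squared deviation from the start after n_steps of accumulated error."""
    total = 0.0
    for _ in range(n_trials):
        x = 0.0
        for _ in range(n_steps):
            # Each generated step feeds back in, so its error is never corrected.
            x += rng.gauss(0.0, step_std)
        total += x * x
    return total / n_trials

rng = random.Random(0)
drift_10 = mean_squared_drift(10, 0.1, 2000, rng)
drift_100 = mean_squared_drift(100, 0.1, 2000, rng)
```

Under this independence assumption the expected squared drift grows linearly with the number of steps (roughly `n_steps * step_std**2`), so a tenfold longer rollout drifts about ten times further in squared terms. This is the kind of compounding that a stabilizing sampling scheme has to counteract.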
ProAR is a dual-network autoregressive system trained on the ATLAS protein molecular-dynamics dataset. At each step it predicts the next frame as a multivariate Gaussian conditioned on prior frames, while an anti-drifting sampling procedure suppresses the compounding errors that would otherwise pull autoregressive trajectories away from physically realistic conformations. The authors report quantitative gains over existing approaches, including a 7.5% reduction in reconstruction RMSE and a 25.8% improvement in conformation accuracy for long trajectories. These figures suggest that the probabilistic, anti-drifting formulation improves long-horizon dynamics and not only short-window fidelity.
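For readers unfamiliar with the metric, the sketch below shows one common way to compute a coordinate RMSE between a predicted and a reference structure, and how a relative reduction is calculated. It is an assumption that this matches the paper's exact "reconstruction RMSE" definition. The function performs no structural superposition, and every number in it is a placeholder chosen to make the arithmetic easy to follow, not a figure from the paper.

```python
import math

def coordinate_rmse(coords_a, coords_b):
    """RMSE between two equally sized lists of (x, y, z) points.

    No alignment or superposition is performed; that is a simplification.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))

def relative_reduction(baseline, new):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - new) / baseline

# Toy two-atom structures: every atom is displaced by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmse = coordinate_rmse(reference, predicted)

# Placeholder error values, invented purely to illustrate the percentage arithmetic.
toy_reduction = relative_reduction(2.0, 1.85)
```

With these placeholders, an error falling from 2.0 to 1.85 corresponds to a reduction of about 7.5%, which is how a relative figure of that size should be read: as a fraction of the baseline error, not an absolute change in distance units.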
ProAR is intended for computational structural biologists studying protein conformational dynamics—exploring metastable states, transition pathways, and flexibility that a single static structure cannot capture. By emulating MD trajectories generatively, it can serve as a fast surrogate for expensive simulations when sampling ensembles, screening conformational variability across many proteins, or generating starting points for downstream analysis. Its flexible-length output is particularly relevant when the timescale of interest is not known in advance.
ProAR contributes to the growing effort to replace or accelerate physics-based MD with learned generative models, and its autoregressive, probabilistic framing is a distinctive alternative to the diffusion- and flow-based ensemble generators that dominate the area. The explicit focus on anti-drifting stability addresses a well-known weakness of sequential trajectory models. Because ProAR is a recent preprint without a confirmed public code or weights release, its reported improvements await independent reproduction. Because it is trained on the ATLAS dataset, generalization to proteins and dynamical regimes outside that distribution also remains to be demonstrated.