DPLM (Dynamics-aware Protein Language Model)

Protein language model aligning ESM sequence embeddings with molecular dynamics trajectories for zero-shot mutation effect and stability prediction.

Released: May 2026

DPLM (Dynamics-aware Protein Language Model) is a protein language model from the Qing Shao lab at the University of Kentucky that injects information about protein conformational dynamics into sequence representations. Most protein language models, including the ESM family, are trained purely on static sequence data and never observe how a protein actually moves. DPLM addresses this gap by aligning ESM-derived sequence embeddings with embeddings computed from molecular-dynamics (MD) simulation trajectories, so that the resulting representations encode not just evolutionary sequence statistics but also dynamical behavior.

The alignment is learned through contrastive training: sequence embeddings from an ESM backbone are pulled toward trajectory embeddings of the same protein and pushed away from those of other proteins. To encode the MD trajectories, the authors repurpose a pretrained video model, treating a simulation trajectory as a sequence of frames much like a video. After this contrastive alignment stage, the backbone is frozen and the fixed checkpoint is reused without further fine-tuning: zero-shot for mutation-effect prediction, and with small supervised heads for stability and intrinsic-disorder prediction.
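The pull-together, push-apart behavior described above can be illustrated with a symmetric InfoNCE loss of the kind used in CLIP-style training. This is a minimal sketch, not the authors' implementation: the exact loss, the temperature value, and the function names below are assumptions, and the random arrays merely stand in for already-projected sequence and trajectory embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(seq_emb, traj_emb, temperature=0.07):
    """Hypothetical contrastive loss: row i of seq_emb and row i of traj_emb
    belong to the same protein; every other pairing in the batch is a negative."""
    s = l2_normalize(seq_emb)
    t = l2_normalize(traj_emb)
    logits = s @ t.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(s))                   # matched pairs sit on the diagonal
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # sequence -> trajectory
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # trajectory -> sequence
    return 0.5 * (loss_s2t + loss_t2s)

# Synthetic stand-ins (assumption): 8 proteins, 32-dimensional shared space.
rng = np.random.default_rng(0)
traj = rng.normal(size=(8, 32))
aligned_seq = traj + 0.05 * rng.normal(size=(8, 32))  # nearly matches its trajectory
random_seq = rng.normal(size=(8, 32))                 # unrelated to any trajectory

loss_aligned = symmetric_info_nce(aligned_seq, traj)
loss_random = symmetric_info_nce(random_seq, traj)
```

In this toy setup the loss is lower when each sequence embedding sits close to its own trajectory embedding than when the two sets are unrelated, which is the property a contrastive alignment stage optimizes for.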

Note on naming: this DPLM is distinct from ByteDance's unrelated "Diffusion Protein Language Model," which shares the same acronym. The two models are different in both architecture and objective; here DPLM refers specifically to the dynamics-aware contrastive model described by Jiang et al. (2026). It builds on the lab's earlier S-PLM work, extending the theme of enriching protein language model representations with additional structural or physical signal.

Key Features

Dynamics-aware representations: Sequence embeddings are aligned with molecular-dynamics trajectory embeddings, so the model captures conformational and dynamical signal absent from static sequence-only language models.
Contrastive alignment: Training uses a contrastive objective to bind ESM sequence embeddings to MD trajectory embeddings for the same protein while separating mismatched pairs.
Video model for MD encoding: A pretrained video model is used to encode MD trajectories as frame sequences, an unusual cross-domain choice that treats a simulation like a video clip.
Zero-shot mutation-effect prediction: The frozen checkpoint is applied zero-shot across deep mutational scanning (DMS) datasets and outperforms ESM baselines on these benchmarks.
Lightweight downstream heads: Small task-specific heads on top of the frozen backbone support protein stability and intrinsic-disorder prediction without retraining the language model.

Technical Details

DPLM is built on an ESM transformer backbone. During alignment, ESM sequence embeddings and MD trajectory embeddings (produced by a pretrained video model) are projected into a shared space and trained with a contrastive loss; the specific upstream video model used to encode trajectories is not stated in the preprint. After contrastive training the backbone is held fixed. Mutation-effect prediction is evaluated zero-shot across multiple deep mutational scanning datasets, where the dynamics-aligned embeddings improve over ESM baselines. For stability and intrinsic-disorder prediction, lightweight supervised heads are trained on top of the frozen representations. As of the preprint, no public code repository or model-weights link has been located, and the license for any eventual code or weights release is undetermined; the preprint itself is posted on bioRxiv under a CC BY-NC-ND license.
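The "lightweight head on a frozen backbone" pattern can be sketched as follows. This is an illustration under stated assumptions, not the preprint's architecture: the head is taken to be a closed-form ridge regression, the embeddings and stability labels are synthetic, and `fit_ridge_head` and `predict` are hypothetical names.

```python
import numpy as np

def add_bias(embeddings):
    """Append a constant column so the head can learn an intercept."""
    return np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])

def fit_ridge_head(embeddings, targets, l2=1.0):
    """Fit a linear head on fixed embeddings; the backbone is never updated."""
    X = add_bias(embeddings)
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0                      # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + reg, X.T @ targets)

def predict(weights, embeddings):
    """Apply the fitted linear head to new embeddings."""
    return add_bias(embeddings) @ weights

# Synthetic stand-ins (assumption): 200 proteins with 16-dimensional frozen embeddings
# and a scalar stability label that depends linearly on them plus noise.
rng = np.random.default_rng(0)
frozen_emb = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
stability = frozen_emb @ true_w + 0.1 * rng.normal(size=200)

head = fit_ridge_head(frozen_emb[:150], stability[:150])   # train on 150 proteins
pred = predict(head, frozen_emb[150:])                     # evaluate on the other 50
r = np.corrcoef(pred, stability[150:])[0, 1]               # held-out Pearson correlation
```

Because only the small head is fitted, each new task costs a single linear solve rather than a pass of gradient updates through the language model.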

Applications

DPLM targets researchers in protein engineering, variant interpretation, and computational biophysics who need predictions that reflect protein flexibility rather than sequence alone. Its zero-shot mutation-effect scoring is directly useful for prioritizing variants in deep mutational scanning studies and for assessing the functional impact of point mutations, while the stability and intrinsic-disorder heads support protein design and the characterization of disordered regions. Because the backbone is frozen and reused, the approach is attractive for groups that want dynamics-informed embeddings without running new simulations or retraining a large language model for each task.

Impact

DPLM contributes to a growing line of work that augments protein language models with information beyond raw sequence, here specifically molecular-dynamics trajectories rather than experimental structures. By demonstrating that contrastively aligning sequence and trajectory embeddings can improve zero-shot mutation-effect prediction over ESM baselines, it offers evidence that dynamical signal is a useful and complementary learning target. As a recent preprint without released code or weights and with an unspecified upstream video encoder, its broader adoption and reproducibility remain to be established, but it points toward dynamics-aware representation learning as a promising direction for the field.

Citation

Jiang, Y., et al. (2026) DPLM: Dynamics-aware Protein Language Model via contrastive learning between sequence and molecular dynamics simulation trajectory. bioRxiv.

DOI: 10.64898/2026.04.29.721692

Citations

Total citations: 0

Influential: 0

References: 45

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Score: 10 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper
