SeqDance / ESMDance

Protein language models trained on biophysical dynamics from MD simulations and normal-mode analysis; ESMDance builds on ESM2 for variant effects.

Released: October 2024

Parameters: 35.2 Million

SeqDance and ESMDance are paired protein language models that learn from protein biophysical dynamics rather than only from evolutionary sequence patterns. Developed by Chao Hou, Haiqing Zhao, and Yufeng Shen at Columbia University, the work was posted to bioRxiv in October 2024 and published in PNAS as "Protein language models trained on biophysical dynamics inform mutation effects." Most protein language models, with ESM2 as the canonical example, are trained to reconstruct masked residues from sequence context, which captures evolutionary signal but not the conformational motion that underlies protein function. SeqDance and ESMDance are instead supervised on dynamic properties.

The training signal is derived from molecular dynamics (MD) trajectories and normal-mode analysis (NMA) of tens of thousands of proteins, yielding per-residue and pairwise descriptors of fluctuation and co-movement. The two models differ in initialization. SeqDance is trained from scratch on these dynamics, learning to represent motion using no prior evolutionary or structural information. ESMDance instead builds on the frozen ESM2-35M backbone and is trained on the same dynamics, fusing evolutionary representation with biophysical signal.
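To make "per-residue and pairwise descriptors of fluctuation and co-movement" concrete, here is a minimal sketch of two common descriptors computed from a trajectory: root-mean-square fluctuation (RMSF) per residue and a normalized displacement correlation between residue pairs. The source does not list the exact descriptors used for training, so the function names, the choice of these two quantities, and the toy trajectory are all assumptions made for illustration. The sketch also assumes frames are already superposed on a common reference.

```python
import numpy as np

def residue_fluctuation(traj):
    """Per-residue RMSF for a trajectory of shape (frames, residues, 3)."""
    disp = traj - traj.mean(axis=0)                 # displacement from mean position
    return np.sqrt((disp ** 2).sum(axis=2).mean(axis=0))

def co_movement(traj):
    """Normalized displacement correlation, shape (residues, residues)."""
    disp = traj - traj.mean(axis=0)
    # Frame-averaged dot product of displacement vectors for every residue pair.
    cov = np.einsum("fik,fjk->ij", disp, disp) / traj.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

# Toy trajectory (hypothetical data): residues 0 and 1 move together,
# residue 2 moves in the opposite direction.
rng = np.random.default_rng(0)
shift = rng.normal(size=(200, 1, 3))
base = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
traj = base + np.concatenate([shift, shift, -shift], axis=1)

rmsf = residue_fluctuation(traj)   # one value per residue
corr = co_movement(traj)           # +1 for residues moving together, -1 for opposite motion
```

In this toy case the correlation between residues 0 and 1 is +1 and between residues 0 and 2 is -1, which is the kind of pairwise signal a dynamics-supervised model could be asked to predict from sequence alone.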

This dynamics-aware framing is complementary to ensemble samplers and structure predictors: where AlphaFold-style models predict a static fold and methods like AlphaFlow or BioEmu sample conformational ensembles, SeqDance and ESMDance embed dynamic behavior directly into sequence representations that can be queried zero-shot for downstream tasks.

Key Features

Dynamics-supervised pretraining: Both models are trained on biophysical properties extracted from MD trajectories and normal-mode analysis rather than masked-language modeling alone, capturing residue fluctuation and co-movement.
Two complementary models: SeqDance is trained from scratch on dynamics; ESMDance is trained on top of a frozen ESM2-35M backbone, combining evolutionary and dynamic information.
Zero-shot mutation effects: ESMDance substantially outperforms ESM2 in zero-shot mutation-effect prediction for designed and viral proteins, which lack the deep evolutionary alignments that conventional PLMs rely on.
Conformational property prediction: SeqDance captures dynamic interaction patterns in unseen proteins and predicts global properties such as the radius of gyration for both intrinsically disordered regions and ordered proteins.
Open code and weights: Code is released on GitHub under GPL-3.0, with trained weights on Zenodo and HuggingFace model cards for both SeqDance and ESMDance.
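For readers unfamiliar with the radius of gyration mentioned in the features above, the following minimal sketch computes it from a set of coordinates as the root-mean-square distance of the points from their centroid. It assumes equal masses (for example, one point per Cα atom); the function name and the two-point example are illustrative and not taken from the SeqDance code.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (equal masses assumed)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    sq_sum = sum(
        sum((c[k] - center[k]) ** 2 for k in range(3))
        for c in coords
    )
    return math.sqrt(sq_sum / n)

# Two points 2 units apart: each lies 1 unit from the centroid.
chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(chain)
```

Compact, ordered proteins give small values relative to chain length, while extended disordered regions give larger ones, which is why the quantity is a useful global summary of conformation.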

Technical Details

Both models use a Transformer encoder architecture identical to ESM2-35M: 12 layers, 20 attention heads per layer, an embedding dimension of 480, and roughly 35 million parameters. Training data are dynamic biophysical descriptors derived from MD trajectories and normal-mode analyses; together, the MD and NMA sets span on the order of 64,000 to 65,000 proteins. SeqDance initializes parameters randomly and learns dynamics directly; ESMDance keeps the ESM2-35M weights frozen and learns to map their representations to dynamic properties. ESMDance's gains are most pronounced on designed and viral proteins, where ESM2's evolutionary signal is weak, indicating that the dynamics objective adds information beyond conservation.
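As a back-of-envelope check on the architecture figures above, the sketch below counts the parameters in the encoder stack alone. It assumes standard Transformer layers with biased Q/K/V/output projections, a feed-forward width of four times the embedding dimension, and two layer norms per layer; these layer details are assumptions and are not stated in the source. Token embeddings and any output heads are deliberately left out, so the result is expected to fall somewhat below the quoted total.

```python
def encoder_layer_params(d_model, ffn_mult=4):
    """Parameter count for one Transformer encoder layer (assumed standard layout)."""
    attn = 4 * (d_model * d_model + d_model)        # Q, K, V, output projections with biases
    d_ffn = ffn_mult * d_model
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    norms = 2 * (2 * d_model)                       # two layer norms, scale and shift each
    return attn + ffn + norms

LAYERS, HEADS, D_MODEL = 12, 20, 480                # figures quoted in the text

head_dim = D_MODEL // HEADS                         # dimension handled by each attention head
stack_params = LAYERS * encoder_layer_params(D_MODEL)

print(f"head dimension: {head_dim}")
print(f"encoder stack parameters: {stack_params:,}")
```

Under these assumptions each head works in a 24-dimensional subspace and the 12-layer stack holds about 33.3 million parameters, consistent with the "roughly 35 million" description once embeddings and task-specific heads are added.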

Applications

The models are useful for variant effect prediction and for analyzing protein flexibility when evolutionary information is limited, a setting common in protein design and in the study of fast-evolving viral proteins. ESMDance offers a drop-in, zero-shot scorer for mutation effects, while SeqDance provides representations and predictions of conformational properties (such as radius of gyration and dynamic contacts) for both ordered and disordered proteins, supporting researchers studying protein motion and stability.
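The text above does not spell out how a zero-shot mutation score is derived from a dynamics model. One plausible scheme, shown here purely as an assumption, is to predict per-residue dynamic features for the wild-type and mutant sequences and take the mean absolute difference as the score. Everything in this sketch is hypothetical: `apply_mutation`, `zero_shot_score`, and especially `toy_predictor`, which is a stand-in for a real model and carries no biophysical meaning.

```python
import re

def apply_mutation(seq, mutation):
    """Apply a substitution like 'A2G' (wild-type residue, 1-based position, mutant residue)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wt, pos, mt = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wt:
        raise ValueError(f"{mutation!r} does not match the sequence")
    return seq[: pos - 1] + mt + seq[pos:]

def zero_shot_score(seq, mutation, predict_dynamics):
    """Mean absolute change in predicted per-residue features (assumed scoring rule)."""
    wt_feat = predict_dynamics(seq)
    mt_feat = predict_dynamics(apply_mutation(seq, mutation))
    return sum(abs(a - b) for a, b in zip(wt_feat, mt_feat)) / len(seq)

def toy_predictor(seq):
    # Placeholder for a trained model: returns one made-up number per residue.
    return [1.0 if aa == "G" else 0.5 for aa in seq]

score = zero_shot_score("MAKL", "A2G", toy_predictor)
```

Passing the predictor in as a callable keeps the scoring logic independent of any particular model, so a real dynamics predictor could be substituted for `toy_predictor` without changing the rest of the sketch.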

Impact

SeqDance and ESMDance show that supervising protein language models on biophysical dynamics, rather than evolution alone, yields representations that transfer to mutation-effect prediction, especially for designed and viral proteins that defeat conservation-based methods. Peer-reviewed publication in PNAS, together with openly released code and weights, makes the approach reproducible and positions dynamics-aware pretraining as a complement to both static structure predictors and conformational-ensemble samplers. The models are modest in scale (35M parameters), so a natural future direction is scaling the dynamics objective to larger backbones and broader protein families.

Citations

Protein language models trained on biophysical dynamics inform mutation effects

Hou, C., et al. (2026) Protein language models trained on biophysical dynamics inform mutation effects. Proceedings of the National Academy of Sciences.

DOI: 10.1073/pnas.2530466123

Learning Biophysical Dynamics with Protein Language Models

Preprint

Hou, C., et al. (2024) Learning Biophysical Dynamics with Protein Language Models. bioRxiv.

DOI: 10.1101/2024.10.11.617911

Recent citations

Papers that recently cited this model.

Learning physical interactions to compose biological large language models
Joseph D Clark, Tanner J. Dean, D. Shukla
Communications Chemistry · Jan 2026 · 0 citations
MotifAE Reveals Functional Motifs from Protein Language Model: Unsupervised Discovery and Interpretability Analysis
Chao Hou, Di Liu, Yufeng Shen
bioRxiv · Nov 2025 · 1 citation
ESMDynamic: Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences
D. Kleiman, Jiangyan Feng, Z. Xue, et al.
bioRxiv · Aug 2025 · 2 citations

Top citations

The most-cited papers that cite this model.

Understanding Language Model Scaling on Protein Fitness Prediction
Chao Hou, Di Liu, Aziz Zafar, et al.
bioRxiv · Jul 2025 · 7 citations
Machine Learning-Driven Enzyme Mining: Opportunities, Challenges, and Future Perspectives
Felix Moorhoff, Yanzi Zhang, Sizhe Qiu, et al.
ACS Catalysis · Jul 2025 · 6 citations
AF2χ: Predicting protein side-chain rotamer distributions with AlphaFold2
M. Cagiada, F. E. Thomasen, Sergey Ovchinnikov, et al.
bioRxiv · Apr 2025 · 6 citations
RocketSHP: Ultra-fast Proteome-scale Prediction of Protein Dynamics
Samuel Sledzieski, Sonya M. Hanson
bioRxiv · Jun 2025 · 3 citations
ESMDynamic: Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences
D. Kleiman, Jiangyan Feng, Z. Xue, et al.
bioRxiv · Aug 2025 · 2 citations

Citations

Total Citations: 8

Influential: 0

References: 92

GitHub

Stars: 60

Forks: 6

Open Issues: 3

Contributors: 2

Last Push: 5mo ago

Language: Jupyter Notebook

License: GPL-3.0

HuggingFace

Downloads: 79

Likes: 0

Last Modified: 5mo ago

Fields of citing research

Biology: 100%
Computer Science: 89%
Medicine: 67%
Chemistry: 11%
Materials Science: 11%
Physics: 11%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Overall: 84 (Open)

Usability (can I run it?): 90

Reproducibility (can I retrain it?): 85

Model Openness Framework

Class II

Open Tooling

Resources

GitHub Repository · Research Paper · HuggingFace Model · HuggingFace Model · Dataset

