bio.rodeo

SeqDance / ESMDance

Columbia University

Protein language models trained on biophysical dynamics from MD simulations and normal-mode analysis; ESMDance builds on a frozen ESM2-35M backbone for strong zero-shot mutation-effect prediction.

Released: October 2024
Parameters: 35.2 Million

SeqDance and ESMDance are paired protein language models that learn from protein biophysical dynamics rather than only from evolutionary sequence patterns. Developed by Chao Hou, Haiqing Zhao, and Yufeng Shen at Columbia University, the work was posted to bioRxiv in October 2024 and published in PNAS as "Protein language models trained on biophysical dynamics inform mutation effects." Most protein language models, with ESM2 as the canonical example, are trained to reconstruct masked residues from sequence context. That objective captures evolutionary signal but not the conformational motion that underlies protein function. SeqDance and ESMDance are instead supervised on dynamic properties.

The training signal is derived from molecular dynamics (MD) trajectories and normal-mode analysis (NMA) of tens of thousands of proteins, yielding per-residue and pairwise descriptors of fluctuation and co-movement. The two models differ in initialization. SeqDance is trained from scratch on these dynamics, learning to represent motion using no prior evolutionary or structural information. ESMDance instead builds on the frozen ESM2-35M backbone and is trained on the same dynamics, fusing evolutionary representation with biophysical signal.
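To make "per-residue and pairwise descriptors of fluctuation and co-movement" concrete, here is a minimal sketch of two descriptors commonly computed from an MD trajectory: root-mean-square fluctuation (RMSF) per residue and a normalized displacement correlation per residue pair. The page does not list the exact features the authors extracted, so these two are illustrative assumptions, and the toy trajectory and all function names are hypothetical.

```python
import math

def mean_positions(traj):
    """Average position of each residue over all frames.

    traj[t][i] is the (x, y, z) position of residue i in frame t.
    """
    n_frames = len(traj)
    n_res = len(traj[0])
    return [
        tuple(sum(traj[t][i][k] for t in range(n_frames)) / n_frames for k in range(3))
        for i in range(n_res)
    ]

def displacements(traj):
    """Per-frame displacement of each residue from its mean position."""
    mean = mean_positions(traj)
    return [
        [tuple(frame[i][k] - mean[i][k] for k in range(3)) for i in range(len(frame))]
        for frame in traj
    ]

def rmsf(traj):
    """Root-mean-square fluctuation per residue (a per-residue descriptor)."""
    disp = displacements(traj)
    n_frames = len(disp)
    n_res = len(disp[0])
    return [
        math.sqrt(sum(sum(c * c for c in disp[t][i]) for t in range(n_frames)) / n_frames)
        for i in range(n_res)
    ]

def cross_correlation(traj):
    """Normalized displacement correlation for every residue pair.

    +1 means two residues move together, -1 means they move in opposition.
    Pairs involving a motionless residue are reported as 0.0.
    """
    disp = displacements(traj)
    n_frames = len(disp)
    n_res = len(disp[0])

    def dot_mean(i, j):
        return sum(
            sum(a * b for a, b in zip(disp[t][i], disp[t][j])) for t in range(n_frames)
        ) / n_frames

    var = [dot_mean(i, i) for i in range(n_res)]
    corr = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            denom = math.sqrt(var[i] * var[j])
            corr[i][j] = dot_mean(i, j) / denom if denom > 0 else 0.0
    return corr

# Hypothetical toy trajectory: 4 frames, 3 residues moving along x only.
# Residue 1 follows residue 0; residue 2 moves the opposite way.
toy_traj = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_traj)
toy_corr = cross_correlation(toy_traj)
```

In this toy case residues 0 and 1 are perfectly correlated and residues 0 and 2 are perfectly anti-correlated. Real trajectories would normally be aligned to a reference structure first, a step this sketch omits.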

This dynamics-aware framing is complementary to ensemble samplers and structure predictors: where AlphaFold-style models predict a static fold and methods like AlphaFlow or BioEmu sample conformational ensembles, SeqDance and ESMDance embed dynamic behavior directly into sequence representations that can be queried zero-shot for downstream tasks.

#Key Features

  • Dynamics-supervised pretraining: Both models are trained on biophysical properties extracted from MD trajectories and normal-mode analysis rather than masked-language modeling alone, capturing residue fluctuation and co-movement.
  • Two complementary models: SeqDance is trained from scratch on dynamics; ESMDance is trained on top of a frozen ESM2-35M backbone, combining evolutionary and dynamic information.
  • Zero-shot mutation effects: ESMDance substantially outperforms ESM2 in zero-shot mutation-effect prediction for designed and viral proteins, which lack the deep evolutionary alignments that conventional PLMs rely on.
  • Conformational property prediction: SeqDance captures dynamic interaction patterns in unseen proteins and predicts global properties such as the radius of gyration for both intrinsically disordered regions and ordered proteins.
  • Open code and weights: Code is released on GitHub under GPL-3.0, with trained weights on Zenodo and HuggingFace model cards for both SeqDance and ESMDance.
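For readers unfamiliar with zero-shot mutation scoring, the sketch below shows the log-odds convention commonly used with masked language models such as ESM2: mask the mutated position, then compare the model's probability for the mutant residue with that for the wild-type residue. This page does not describe how ESMDance derives its own scores, so treat this as an illustration of the baseline style of scorer, not of the authors' method. The probability table, sequence, and function names are hypothetical.

```python
import math

def log_odds_score(position_probs, wild_type, mutant):
    """Log-odds of mutant vs wild-type residue at one masked position.

    position_probs maps amino-acid letters to model probabilities at the
    mutated position. Negative scores mean the model prefers the wild type.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])

def score_variants(sequence, variants, probs_by_position):
    """Score substitutions written as (position, wild_type, mutant), 0-indexed."""
    scores = {}
    for pos, wt, mut in variants:
        if sequence[pos] != wt:
            raise ValueError(f"expected {wt} at position {pos}, found {sequence[pos]}")
        # Label variants with the usual 1-indexed notation, e.g. K2R.
        scores[f"{wt}{pos + 1}{mut}"] = log_odds_score(probs_by_position[pos], wt, mut)
    return scores

# Hypothetical probabilities standing in for a real model's masked predictions.
toy_sequence = "MKV"
toy_probs = {1: {"K": 0.60, "R": 0.30, "P": 0.01}}
toy_scores = score_variants(toy_sequence, [(1, "K", "R"), (1, "K", "P")], toy_probs)
```

No labeled mutation data is used at any point, which is what makes the scorer zero-shot. The conservative K-to-R substitution scores closer to zero than the K-to-P substitution in this toy table.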

#Technical Details

Both models use a Transformer encoder architecture identical to ESM2-35M: 12 layers, 20 attention heads per layer, an embedding dimension of 480, and roughly 35 million parameters. Training data are dynamic biophysical descriptors derived from MD trajectories and normal-mode analyses of tens of thousands of proteins; together, the MD and NMA sets span on the order of 64,000 to 65,000 proteins. SeqDance initializes its parameters randomly and learns dynamics directly; ESMDance keeps the ESM2-35M weights frozen and learns to map their representations to dynamic properties. ESMDance's gains are most pronounced on designed and viral proteins, where ESM2's evolutionary signal is weak, which suggests that the dynamics objective adds information beyond conservation.
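As a sanity check on the stated size, the following back-of-envelope sketch counts the weights of a standard Transformer encoder with the quoted dimensions. Several details are assumptions not given on this page: a feed-forward width of four times the embedding dimension, bias terms on every linear layer, a 33-token vocabulary, no learned positional embeddings, and no output heads. The function name is hypothetical.

```python
def encoder_param_count(layers, d_model, heads, vocab_size=33, ffn_mult=4):
    """Rough parameter count for a standard Transformer encoder.

    Assumes (not confirmed by the source): FFN width = ffn_mult * d_model,
    biases on all linear layers, two LayerNorms per layer plus a final one,
    no learned positional embeddings, and no task-specific output heads.
    """
    if d_model % heads != 0:
        raise ValueError("d_model must be divisible by the number of heads")

    # Query, key, value, and output projections, each d_model x d_model + bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two feed-forward projections: d_model -> d_ffn and d_ffn -> d_model.
    d_ffn = ffn_mult * d_model
    feed_forward = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)

    per_layer = attention + feed_forward + layer_norms
    embeddings = vocab_size * d_model
    final_norm = 2 * d_model
    return {
        "head_dim": d_model // heads,
        "per_layer": per_layer,
        "total": layers * per_layer + embeddings + final_norm,
    }

# Dimensions quoted in the text: 12 layers, 20 heads, embedding dimension 480.
counts = encoder_param_count(layers=12, d_model=480, heads=20)
```

Under these assumptions the encoder alone comes to roughly 33 million parameters, in the same range as the quoted figure; components the sketch leaves out, such as output heads, would add to that.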

#Applications

The models are useful for variant effect prediction and for analyzing protein flexibility when evolutionary information is limited—settings common in protein design and in studying fast-evolving viral proteins. ESMDance offers a drop-in, zero-shot scorer for mutation effects, while SeqDance provides representations and predictions of conformational properties (such as radius of gyration and dynamic contacts) for both ordered and disordered proteins, supporting researchers studying protein motion and stability.
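Radius of gyration, one of the global properties mentioned above, is the root-mean-square distance of a structure's atoms from their centroid. The sketch below computes it from coordinates so the target quantity is unambiguous; it assumes equal weights for all points (no mass weighting), and the coordinates and function name are hypothetical. It says nothing about how SeqDance predicts the value from sequence.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of (x, y, z) points.

    Assumes every point carries equal weight; a mass-weighted variant would
    weight both the centroid and the squared distances by atomic mass.
    """
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords) / n
    return math.sqrt(mean_sq)

# Hypothetical toy structure: four points at unit distance from the origin.
toy_coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
toy_rg = radius_of_gyration(toy_coords)
```

A compact, ordered protein gives a smaller value than an extended disordered chain of the same length, which is why the quantity is a useful summary for both classes.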

#Impact

SeqDance and ESMDance show that supervising protein language models on biophysical dynamics, rather than evolution alone, yields representations that transfer to mutation-effect prediction, especially for designed and viral proteins that defeat conservation-based methods. Peer-reviewed publication in PNAS, together with openly released code and weights, makes the approach reproducible and positions dynamics-aware pretraining as a complement to both static structure predictors and conformational-ensemble samplers. The models are modest in scale (35M parameters), so a natural future direction is scaling the dynamics objective to larger backbones and broader protein families.

GitHub

Stars: 60
Forks: 7
Open Issues: 3
Contributors: 2
Last Push: 4mo ago
Language: Jupyter Notebook
License: GPL-3.0

HuggingFace

Downloads: 15
Likes: 0
Last Modified: 4mo ago

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 84 (Open)
Usability (can I run it?): 90
Reproducibility (can I retrain it?): 85
Model Openness Framework: Class II (Open Tooling)

Tags

variant_effect_prediction, representation_learning, transformer, language_model, self_supervised, zero_shot, protein_dynamics

Resources

GitHub Repository, Research Paper, HuggingFace Model, HuggingFace Model, Dataset