bio.rodeo

© 2026 bio.rodeo. All rights reserved.
Protein

BioEmu-1

Microsoft

Generative deep learning model from Microsoft Research that emulates protein equilibrium ensembles at 100,000x the speed of molecular dynamics simulation.

Released: 2024

Overview

BioEmu-1 is a biomolecular emulator developed by Microsoft Research that addresses a longstanding challenge in structural biology: proteins are not static entities but dynamic ensembles of conformations, and capturing that equilibrium distribution is computationally intractable at scale using conventional simulation methods. Rather than predicting a single lowest-energy structure, BioEmu-1 generates diverse conformational ensembles that faithfully represent the equilibrium distribution of a protein in solution, including cryptic pockets, partially unfolded states, and large-scale domain rearrangements.

The model achieves this by training on three complementary data sources: static structures from the AlphaFold Database, over 200 milliseconds of cumulative molecular dynamics (MD) simulation data spanning thousands of proteins, and experimental protein folding stability measurements. This multi-source training strategy allows BioEmu-1 to learn both the geometric diversity of protein conformational space and the thermodynamic weighting of states within it. Released as a preprint in December 2024, BioEmu-1 represents a significant step toward replacing or augmenting traditional MD simulations for routine research tasks.

Key Features

  • Ultra-fast ensemble sampling: Generates over 1,000 statistically independent protein conformations per hour on a single GPU, approximately 100,000 times faster than conventional molecular dynamics simulations.
  • Thermodynamic accuracy: Predicts relative free energies with approximately 1 kcal/mol accuracy compared to millisecond-scale MD simulations and experimental folding stability data.
  • Functional motion capture: Samples diverse conformational changes including cryptic pocket formation, local unfolding events, and large-scale domain rearrangements that are invisible to static structure prediction.
  • Multi-source training: Integrates AlphaFold Database structures, 200+ milliseconds of MD trajectories (reweighted using Markov State Models for proper equilibrium distributions), and experimental stability measurements.
  • Mechanistic interpretability: Provides per-conformation structural data that can reveal causes of mutant destabilization, allosteric pathways, and binding site dynamics.
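To make the thermodynamic accuracy figure concrete, the sketch below converts state populations from a sampled ensemble into a folding free energy using the standard relation ΔG = -RT ln(p_folded / p_unfolded). It also shows how large a population-ratio error a 1 kcal/mol deviation corresponds to. The sample counts are hypothetical, and classifying each conformation as folded or unfolded is assumed to have been done elsewhere.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def folding_free_energy(n_folded, n_unfolded, temperature=298.15):
    """Return dG_fold = -RT ln(p_folded / p_unfolded) in kcal/mol.

    The counts are numbers of sampled conformations assigned to each state;
    the ratio of counts estimates the ratio of equilibrium populations.
    """
    if n_folded <= 0 or n_unfolded <= 0:
        raise ValueError("both states must be sampled at least once")
    return -R_KCAL * temperature * math.log(n_folded / n_unfolded)


def population_ratio_factor(dg_error, temperature=298.15):
    """Factor by which a population ratio changes for a free-energy error."""
    return math.exp(dg_error / (R_KCAL * temperature))


# Hypothetical ensemble: 950 folded and 50 unfolded out of 1,000 samples.
dg = folding_free_energy(950, 50)
factor = population_ratio_factor(1.0)
print(f"dG_fold = {dg:.2f} kcal/mol")
print(f"1 kcal/mol changes the population ratio by a factor of {factor:.1f}")
```

At room temperature a 1 kcal/mol error corresponds to roughly a fivefold error in the folded-to-unfolded ratio, which is why this accuracy level is a meaningful target for ensemble models.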

Technical Details

BioEmu-1 uses a diffusion-based generative architecture derived from DiG (Distributional Graphormer) and is trained with denoising score matching to learn the equilibrium distribution of protein conformations. Sequence information conditions the generative process, and structure generation proceeds through 30 to 50 denoising steps to produce high-quality conformational samples.
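To give a feel for what a fixed budget of denoising steps means, here is a toy one-dimensional sampler. It is not BioEmu-1's sampler: it assumes a variance-exploding noise schedule, a Gaussian target whose score is known analytically (a trained network would supply the score in practice), and plain Euler integration of the probability-flow ODE. All function names and parameter values are illustrative.

```python
import random
import statistics


def gaussian_score(x, sigma, mu, s):
    """Score d/dx log p_sigma(x) for a target N(mu, s^2) blurred by noise sigma."""
    return -(x - mu) / (s * s + sigma * sigma)


def noise_schedule(sigma_max, sigma_min, n_steps):
    """Geometric sequence of n_steps + 1 noise levels from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / n_steps)
    return [sigma_max * ratio ** i for i in range(n_steps + 1)]


def sample_toy(mu, s, n_steps=40, sigma_max=50.0, sigma_min=0.01, rng=random):
    """Draw one sample by integrating dx/dsigma = -sigma * score with Euler steps."""
    sigmas = noise_schedule(sigma_max, sigma_min, n_steps)
    x = rng.gauss(0.0, sigma_max)  # start from (almost) pure noise
    for sig, sig_next in zip(sigmas[:-1], sigmas[1:]):
        drift = -sig * gaussian_score(x, sig, mu, s)
        x += (sig_next - sig) * drift  # one denoising step
    return x


rng = random.Random(0)
samples = [sample_toy(mu=2.0, s=0.5, rng=rng) for _ in range(2000)]
sample_mean = statistics.mean(samples)
sample_std = statistics.pstdev(samples)
print(f"mean = {sample_mean:.2f}, std = {sample_std:.2f}")
```

With 40 steps the samples land close to the target mean of 2.0 and standard deviation of 0.5. The same trade-off applies at scale: fewer steps are cheaper but accumulate more discretization error.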

Training followed a multi-stage protocol. In the pretraining phase, the model was trained with denoising score matching on flexible protein structures from the AlphaFold Database. Fine-tuning then proceeded in two parallel tracks: additional denoising score matching on MD trajectories, and Property Prediction Fine-Tuning (PPFT) to align predicted ensemble thermodynamics with experimental folding free energies. MD trajectories were reweighted using Markov State Models to ensure the sampled conformations reflect true equilibrium populations rather than kinetic artifacts. The resulting model runs efficiently on single-GPU hardware, making ensemble generation accessible without high-performance computing infrastructure.
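The Markov State Model reweighting mentioned above can be sketched as follows. Given a row-stochastic transition matrix between discrete conformational states, the stationary distribution gives each state's equilibrium population, and each MD frame is then weighted so that the frames of a state sum to that population. The two-state matrix and the frame assignments are hypothetical. Real MSM pipelines estimate the matrix from trajectory transition counts, which is not shown here.

```python
from collections import Counter


def stationary_distribution(T, n_iter=10_000, tol=1e-12):
    """Power iteration for pi = pi T, where T is a row-stochastic matrix."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [v / total for v in new]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def frame_weights(state_of_frame, pi):
    """Weight each frame by pi[state] / (number of frames in that state)."""
    counts = Counter(state_of_frame)
    weights = [pi[s] / counts[s] for s in state_of_frame]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-state model: state 0 is stable, state 1 is short-lived.
T = [[0.9, 0.1],
     [0.3, 0.7]]
pi = stationary_distribution(T)

# Hypothetical frames from short trajectories that oversample state 1.
frames = [0, 1, 1, 1]
weights = frame_weights(frames, pi)
print([round(p, 3) for p in pi])
print([round(w, 3) for w in weights])
```

Here the raw frames would suggest that state 1 holds 75% of the population, while the reweighted frames recover the equilibrium value of 25%. This is the kind of kinetic sampling bias the reweighting is meant to remove.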

Applications

BioEmu-1 is particularly valuable in drug discovery contexts where transient or cryptic binding pockets must be identified. Such pockets appear only in specific conformational states and can be missed by single-structure prediction tools like AlphaFold 2. Medicinal chemists can screen protein-ligand interactions across conformational ensembles to estimate affinity variation, and clustering the sampled conformations can expose allosteric sites. In protein engineering, BioEmu-1 supports rational design by predicting how mutations shift the stability landscape and the sampling of functional states, which helps prioritize variants before expensive wet-lab characterization. Structural biologists and computational researchers can use the model to generate hypotheses about folding mechanisms, study domain motion in multi-domain proteins, and benchmark or supplement traditional MD force fields.
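As a minimal illustration of conformational clustering, the sketch below groups conformations by distance-matrix RMSD (dRMSD), which compares internal pairwise distances and so needs no structural superposition, and assigns clusters with a simple greedy leader algorithm. The three-atom "conformations" and the 1.0 Å cutoff are hypothetical. Production analyses would typically use a dedicated trajectory library and a more robust clustering method.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_distances(coords):
    """All inter-atom distances of one conformation, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(conf_a, conf_b):
    """Root-mean-square difference between the two internal distance lists."""
    da, db = pairwise_distances(conf_a), pairwise_distances(conf_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def leader_cluster(conformations, cutoff):
    """Assign each conformation to the first leader within cutoff, else start a new cluster."""
    leaders, labels = [], []
    for idx, conf in enumerate(conformations):
        for cluster_id, leader_idx in enumerate(leaders):
            if drmsd(conf, conformations[leader_idx]) <= cutoff:
                labels.append(cluster_id)
                break
        else:
            leaders.append(idx)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical three-atom conformations (coordinates in angstroms).
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
bent_perturbed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.9, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

ensemble = [bent, extended, bent_perturbed, extended]
labels = leader_cluster(ensemble, cutoff=1.0)
populations = Counter(labels)
print(labels, dict(populations))
```

Cluster populations obtained this way are what feed into questions such as how often a pocket-open state is visited. Note that dRMSD cannot distinguish mirror-image structures, one reason superposition-based RMSD is often preferred in practice.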

Impact

BioEmu-1 is one of the first generative models to combine statistical accuracy with the throughput required for proteome-scale conformational analysis. Its 100,000-fold speedup relative to MD simulation could substantially lower the barrier to studying protein dynamics for laboratories without access to dedicated simulation resources. The model's ability to predict relative free energies to within approximately 1 kcal/mol, a long-standing benchmark challenge, suggests that deep learning can now stand in for certain categories of physics-based simulation rather than merely complement it. As a preprint from December 2024, BioEmu-1 has not yet undergone formal peer review, and its performance on proteins outside the training distribution (e.g., membrane proteins, intrinsically disordered regions, or very large complexes) remains to be thoroughly characterized. The release of model weights and code on GitHub facilitates community evaluation and downstream development.

Citation

Scalable emulation of protein equilibrium ensembles with generative deep learning

Preprint

Lewis, S., Hempel, T., Jiménez-Luna, J., Gastegger, M., Xie, Y., Foong, A. Y. K., et al. (2024). Scalable emulation of protein equilibrium ensembles with generative deep learning. bioRxiv.

DOI: 10.1101/2024.12.05.626885

Metrics

GitHub

Stars: 794
Forks: 133
Open Issues: 2
Contributors: 11
Last Push: 9d ago
Language: Python
License: MIT

Citations

Total Citations: 243
Influential: 28
References: 119

Tags

conformational ensemble, molecular dynamics, protein dynamics, structure prediction, foundation model, generative

Resources

GitHub Repository
Research Paper