ESMFold2

Structure-prediction and design engine that turns ESMC sequence representations into all-atom 3D structures of proteins and biomolecular complexes.

Released: May 2026

ESMFold2 is the structure-prediction and design engine of Biohub's "world model of protein biology," released on May 27, 2026. The original ESMFold (Lin et al., 2023, developed at Meta AI) showed that an evolutionary-scale protein language model could fold single sequences without multiple sequence alignments. ESMFold2 extends that lineage to atomically resolved 3D structures of full biomolecular complexes: proteins together with DNA, RNA, small molecules, and modified residues. It is released alongside ESMC, the language model whose sequence representations it consumes, and the ESM Atlas of predicted structures.

The model is built and maintained by Biohub, the unified entity formed from CZI Science, CZ Biohub, and the acquired EvolutionaryScale team. Rather than learning structure directly from raw sequence, ESMFold2 translates the evolutionary patterns already encoded in ESMC's embeddings into all-atom coordinates. This lets it inherit the evolutionary diversity ESMC was trained on while devoting its own capacity to geometry. Its central architectural idea is a looped transformer that applies the same blocks repeatedly, so the compute spent on a target can be scaled at inference time instead of being fixed by network depth.
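To make the looping idea concrete, here is a minimal plain-Python sketch of weight tying. It is not ESMFold2's trunk: `SharedBlock`, `looped_forward`, and the per-residue residual MLP are hypothetical stand-ins, and a real folding trunk would also mix information across residues through attention and pair updates. What the sketch does show is the property described above: the parameter count is fixed by a single block, while the number of passes is a runtime argument.

```python
import math
import random
from typing import List

Vector = List[float]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Gaussian initialization scaled by 1/sqrt(fan_in)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(vec: Vector, mat: List[Vector]) -> Vector:
    """Row vector (d_in) times matrix (d_in x d_out) -> vector (d_out)."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def layer_norm(vec: Vector, eps: float = 1e-5) -> Vector:
    mu = sum(vec) / len(vec)
    var = sum((v - mu) ** 2 for v in vec) / len(vec)
    return [(v - mu) / math.sqrt(var + eps) for v in vec]


class SharedBlock:
    """Hypothetical residual block; the SAME weights are reused on every loop."""

    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(dim, hidden, rng)
        self.w_out = rand_matrix(hidden, dim, rng)

    def num_params(self) -> int:
        return sum(len(row) for row in self.w_in) + sum(len(row) for row in self.w_out)

    def __call__(self, residues: List[Vector]) -> List[Vector]:
        out = []
        for x in residues:  # per-residue update only; a real trunk would also mix residues
            hidden = [max(0.0, z) for z in matvec(layer_norm(x), self.w_in)]  # ReLU
            delta = matvec(hidden, self.w_out)
            out.append([a + b for a, b in zip(x, delta)])  # residual connection
        return out


def looped_forward(block: SharedBlock, residues: List[Vector], n_loops: int) -> List[Vector]:
    """Apply the shared block n_loops times; depth is a runtime choice, not a fixed constant."""
    for _ in range(n_loops):
        residues = block(residues)
    return residues


if __name__ == "__main__":
    block = SharedBlock(dim=8, hidden=16)
    rng = random.Random(1)
    x0 = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 residues, 8 features
    quick = looped_forward(block, x0, n_loops=2)
    thorough = looped_forward(block, x0, n_loops=8)  # 4x the compute, same weights
    print(block.num_params())  # prints 256
```

Both calls use the same 256 weights; only the number of passes, and therefore the compute, changes. That separation of parameter count from effective depth is the trade-off a looped architecture exposes.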

Beyond folding existing molecules, ESMFold2 is positioned as a design engine: it was used to generate novel protein binders that were then validated experimentally against disease-relevant targets, with the binder search completing in days rather than the months or years typical of conventional campaigns.

Key Features

Looped transformer architecture: ESMFold2 repeatedly applies a shared block of weights, allowing inference-time compute to be scaled up for hard targets and reportedly reducing the overfitting seen in fixed-depth folding networks.
Built on ESMC representations: Rather than reading raw sequence, it consumes ESMC's learned embeddings, inheriting evolutionary signal spanning roughly 2.8 billion sequences across the tree of life.
Full biomolecular complexes: It predicts all-atom structures for proteins together with DNA, RNA, small molecules, and modified amino acids, emitting pLDDT, pAE, pTM, and ipTM confidence metrics.
State-of-the-art complex accuracy: On the FoldBench benchmark it meets or exceeds AlphaFold 3 on protein–protein and antibody–antigen complexes, predicting the true antibody–antigen binding pose more often than AlphaFold 3.
Experimentally validated design: Designed binders against five cancer and immunology targets reached hit rates of 36–88% for compact mini-binders and 15–29% for antibody-derived formats.
Open MIT license: Weights and code are freely available for commercial and non-commercial use through the Biohub platform and HuggingFace.
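The list above names pTM and ipTM without defining them. The sketch below assumes ESMFold2 follows the AlphaFold-style definitions (AlphaFold 2 for pTM, AlphaFold-Multimer for ipTM), which the source does not confirm. Each residue pair carries a probability distribution over aligned-error bins; each bin center e is scored with the TM-score kernel 1 / (1 + (e/d0)^2); expected scores are averaged over aligned residues, and the best alignment frame is kept. ipTM repeats the calculation over inter-chain pairs only. The function names and input layout are hypothetical.

```python
from typing import Optional, Sequence


def tm_d0(num_res: int) -> float:
    """TM-score distance scale d0(N) = 1.24 * (N - 15)^(1/3) - 1.8, with N clipped to >= 19."""
    n = max(num_res, 19)
    return 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8


def predicted_tm(
    probs: Sequence[Sequence[Sequence[float]]],
    bin_centers: Sequence[float],
    chain_ids: Optional[Sequence[str]] = None,
) -> float:
    """probs[i][j][b]: probability that residue j's error, aligned on residue i, lies in bin b.

    chain_ids=None gives pTM over all pairs; chain labels give ipTM over inter-chain pairs.
    """
    n = len(probs)
    d0 = tm_d0(n)  # uses the full residue count for both pTM and ipTM
    tm_per_bin = [1.0 / (1.0 + (c / d0) ** 2) for c in bin_centers]
    best = 0.0
    for i in range(n):  # i = residue whose frame is used for alignment
        total, count = 0.0, 0
        for j in range(n):
            if chain_ids is not None and chain_ids[i] == chain_ids[j]:
                continue  # ipTM: ignore pairs within the same chain
            total += sum(p * t for p, t in zip(probs[i][j], tm_per_bin))  # expected TM term
            count += 1
        if count:
            best = max(best, total / count)
    return best


if __name__ == "__main__":
    bins = [0.0, 5.0, 30.0]  # toy bin centers in angstroms
    chains = ["A"] * 10 + ["B"] * 10
    # Toy prediction: confident within each chain, very uncertain between chains.
    probs = [[[1.0, 0.0, 0.0] if chains[i] == chains[j] else [0.0, 0.0, 1.0]
              for j in range(20)] for i in range(20)]
    print(round(predicted_tm(probs, bins), 3), round(predicted_tm(probs, bins, chains), 3))  # prints 0.5 0.0
```

The toy case illustrates why both scores are reported: two well-folded chains in an uncertain arrangement still earn a middling pTM, while ipTM isolates the interface and drops toward zero. In practice `probs` and `bin_centers` would come from the model's PAE output; both are assumptions here.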

Technical Details

ESMFold2 is a transformer that operates on per-residue and pairwise representations derived from ESMC. Its looped trunk iterates a shared set of blocks, so additional recycling passes can be traded for accuracy at inference time. It ships in two variants: the full ESMFold2 model, which can optionally be conditioned on multiple sequence alignments, and ESMFold2-Fast, an inference-optimized single-sequence model. Both use a training-data cutoff of September 2021 and were trained on experimental structures from the Protein Data Bank together with predicted structures from the AlphaFold Database. Evaluation is reported on FoldBench, where ESMFold2 matches or surpasses AlphaFold 3 on antibody–antigen and general protein–protein complexes; comparisons in the accompanying preprint also place it favorably against Chai-1 (Chai Discovery) and Boltz-1 (MIT). Wet-lab validation across five disease targets (the receptor tyrosine kinases EGFR and PDGFRβ, the immune checkpoints PD-L1 and CTLA-4, and the signaling regulator CD45) produced experimentally confirmed binders, with the design loop running in days.
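A training-data cutoff is typically enforced by structure release date, and the same split determines which structures could serve as unseen evaluation targets. The minimal sketch below illustrates that split; the `Entry` record, the example IDs, and reading "September 2021" as September 30, 2021 are assumptions, since the source gives only the month and does not describe the filtering procedure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple

# Assumption: "September 2021" interpreted as the last day of that month.
TRAINING_CUTOFF = date(2021, 9, 30)


@dataclass(frozen=True)
class Entry:
    """Hypothetical structure record: an identifier and its public release date."""
    entry_id: str
    release_date: date


def split_by_cutoff(
    entries: Iterable[Entry], cutoff: date = TRAINING_CUTOFF
) -> Tuple[List[Entry], List[Entry]]:
    """Return (trainable, held_out): entries released on or before vs. after the cutoff."""
    trainable: List[Entry] = []
    held_out: List[Entry] = []
    for entry in entries:
        (trainable if entry.release_date <= cutoff else held_out).append(entry)
    return trainable, held_out


if __name__ == "__main__":
    demo = [
        Entry("entry_a", date(2019, 5, 1)),
        Entry("entry_b", date(2021, 9, 30)),
        Entry("entry_c", date(2021, 10, 1)),
    ]
    train, held = split_by_cutoff(demo)
    print([e.entry_id for e in train], [e.entry_id for e in held])  # prints ['entry_a', 'entry_b'] ['entry_c']
```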

Applications

ESMFold2 serves structural biologists and protein engineers who need accurate complex structures and de novo binders without depending on slow MSA construction or experimental scaffolds. Its complex-prediction accuracy makes it useful for antibody–antigen modeling, protein–protein interaction studies, and structure-based interpretation of biomolecular assemblies that include nucleic acids or small-molecule ligands. As a design engine it supports therapeutic discovery in oncology and immunology, where its demonstrated ability to produce validated mini-binders and antibody-format binders against checkpoint and receptor targets compresses early discovery timelines. The inference-time compute scaling lets users invest more computation in difficult or high-value targets.
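One way to spend more computation on hard targets is to keep looping until successive outputs stop changing, up to a fixed budget. The source does not describe ESMFold2's stopping rule, so `run_until_converged`, its tolerance, and the toy contraction used in place of a real trunk pass are all hypothetical; the sketch only illustrates how a looped model lets the number of passes vary per target.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def run_until_converged(
    step: Callable[[Vector], Vector],
    state: Vector,
    tol: float = 1e-3,
    max_loops: int = 32,
) -> Tuple[Vector, int]:
    """Repeat `step` until the L2 change between passes drops below `tol` or `max_loops` is hit.

    Returns the final state and the number of passes actually used.
    """
    for n in range(1, max_loops + 1):
        new_state = step(state)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_state, state)))
        state = new_state
        if change < tol:
            return state, n
    return state, max_loops


def toy_pass(x: Vector) -> Vector:
    """Hypothetical stand-in for one trunk pass: a contraction toward 2.0."""
    return [0.5 * v + 1.0 for v in x]


if __name__ == "__main__":
    _, easy_loops = run_until_converged(toy_pass, [0.0])
    _, hard_loops = run_until_converged(toy_pass, [100.0])
    _, capped = run_until_converged(toy_pass, [100.0], max_loops=5)
    print(easy_loops, hard_loops, capped)  # prints 11 17 5
```

With `tol=1e-3`, the toy "easy" start settles after 11 passes and the "hard" start after 17, while `max_loops` caps the cost of targets that never settle.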

Impact

ESMFold2 forms the structure-and-design pillar of Biohub's first integrated world model of protein biology, extending the original alignment-free ESMFold concept to full complexes and to generative binder design with experimental confirmation. Its open MIT release lowers the barrier to high-accuracy complex prediction and binder design, an area where comparable frontier capability has often been restricted to closed commercial models. Reported parity with or advantage over AlphaFold 3 on antibody–antigen complexes is notable given how challenging that class has been for prior folding systems. The work is documented in a preprint, "Language Modeling Materializes a World Model of Protein Biology" (Chan Zuckerberg Biohub), posted to bioRxiv on June 4, 2026 (DOI 10.64898/2026.06.03.729735) under a CC-BY license. The headline benchmark and wet-lab numbers are preprint results and should be read as such pending peer review and independent replication.

Citation

Language Modeling Materializes a World Model of Protein Biology

Candido, S., et al. (2026) Language Modeling Materializes a World Model of Protein Biology. bioRxiv.

DOI: 10.64898/2026.06.03.729735

Recent citations

Papers that recently cited this model.

TheBioCollection: Unified Pre-Training Scale LLM Corpus for Biology
Hyunjin Seo, Hyeon Hwang, Gyubok Lee, et al.
Jul 2026

AbICL: In-Context Learning for Antigen-Specific Antibody Affinity Ranking
Zhiyuan Chen, Jing Hu, Junzhe Wang, et al.
Jul 2026

Benchmarking AlphaFold and related deep learning approaches for modeling antibody and TCR antigen recognition
Rui Yin, S. Saravanakumar, Shu Yuan Shi, et al.
bioRxiv · Jul 2026

Top citations

The most-cited papers that cite this model.

TheBioCollection: Unified Pre-Training Scale LLM Corpus for Biology
Hyunjin Seo, Hyeon Hwang, Gyubok Lee, et al.
Jul 2026

Systematic functional annotation of thousands of BAHD acyltransferases in plant genomes using Protein Language Model and phylogenomic tools
Nathaniel S. S. Smith, Xinyu Yuan, Chesney Melissinos, et al.
bioRxiv · Jun 2026

AbICL: In-Context Learning for Antigen-Specific Antibody Affinity Ranking
Zhiyuan Chen, Jing Hu, Junzhe Wang, et al.
Jul 2026

Benchmarking AlphaFold and related deep learning approaches for modeling antibody and TCR antigen recognition
Rui Yin, S. Saravanakumar, Shu Yuan Shi, et al.
bioRxiv · Jul 2026

Folding, Reasoning, and Scaling with Open-source Drug Discovery Engine
Aureka AI OpenDDE project
Jul 2026 · Influential

Citations

Total citations: 8
Influential: 2
References: 0

GitHub

Stars: 2.9K
Forks: 365
Open issues: 83
Contributors: 22
Last push: 3d ago
Language: Jupyter Notebook

HuggingFace

Downloads: 275.9K
Likes: 47
Last modified: 1mo ago

Fields of citing research

Share of papers citing this model, by field:

Biology: 100%
Computer Science: 100%
Medicine: 50%
Chemistry: 13%
Environmental Science: 13%

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 61 (Partial)
Usability (can I run it?): 93
Reproducibility (can I retrain it?): 22

Model Openness Framework: Unclassified
Restrictive license on core components

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model
