Protein foundation models

Protein Models

Protein sequence and structure prediction

217 models in this category

What protein foundation models do

Protein foundation models learn the evolutionary and physicochemical grammar encoded in amino acid sequences, enabling tasks from atomic-accuracy structure prediction to de novo sequence generation. They span a wide design space: sequence-only language models like ESM learn representations from hundreds of millions of natural proteins, while structure-informed models like AlphaFold and Boltz couple sequence to three-dimensional coordinates. Together they underpin modern drug discovery, enzyme engineering, and vaccine design.

Common applications and real-world use cases

The most widely adopted protein foundation models have become standard infrastructure in structural biology labs. AlphaFold predictions now complement the experimentally determined structures in the Protein Data Bank. ESM embeddings drive zero-shot mutational effect predictions that correlate with deep mutational scanning experiments. RFdiffusion and ProteinMPNN are used together to design binders and enzymes from scratch. Benchmarks such as ProteinGym (fitness prediction) and CASP (structure accuracy) provide shared ground truth for comparing models, while design tasks still rely largely on experimental validation.
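
Agreement between zero-shot predictions and deep mutational scanning measurements is usually summarized with a rank correlation; ProteinGym, for example, reports Spearman correlation. The following is a minimal sketch of that metric using only the Python standard library. The variant scores and measurements are hypothetical values chosen for illustration, not results from any model or assay.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: model scores and assay fitness for five variants.
predicted = [-4.1, -0.3, -2.2, 0.5, -1.0]
measured = [0.10, 0.85, 0.40, 0.95, 0.90]
rho = spearman(predicted, measured)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based metric is a natural fit here because model scores and assay readouts are on different scales, so only the ordering of variants is compared.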

Notable Models

Top-rated protein models from our evaluations

AlphaFold 2

Google DeepMind

Released July 15, 2021

Deep learning system that predicts 3D protein structures from amino acid sequences with atomic accuracy. Ranked first at CASP14 with a median GDT_TS of 92.4.

Protein
Openness: 61

ESM-2 & ESMFold

Meta AI

Released July 20, 2022

Meta AI's family of protein language models scaled to 15B parameters, paired with ESMFold for fast, alignment-free atomic-level structure prediction.

Protein
Openness: 83

Boltz-1

MIT

Released November 14, 2024

Open-source deep learning model for biomolecular structure prediction achieving AlphaFold3-level accuracy, trained entirely on publicly available data.

Protein
Openness: 97

AlphaFold 3

Google DeepMind

Released May 8, 2024

Unified diffusion-based model predicting structures of protein complexes with nucleic acids, small molecules, ions, and modified residues with atomic accuracy.

Protein
Openness: 28

ESMC

Biohub

Released May 27, 2026

Biohub's 2026 protein language model trained on ~2.8 billion sequences, forming the representation core of its world model of protein biology.

Protein
Openness: 55

Aiki-XP

Aikium

Released April 23, 2026

Leakage-controlled multimodal model predicting within-species relative protein expression across 385 bacterial species, with transfer to unseen phyla.

Protein
Openness: 96

Frequently asked questions

What is a protein foundation model?

A protein foundation model is a large neural network pretrained on vast corpora of amino acid sequences, structures, or both, learning representations that transfer to downstream tasks like structure prediction, function annotation, and sequence design. Unlike task-specific models, they generalize broadly across protein families and applications. Well-known examples include ESM, AlphaFold, and Boltz.

How do protein language models differ from structure predictors?

Protein language models like ESM-2 are trained purely on sequence data and produce residue-level embeddings useful for fitness prediction and zero-shot variant scoring. Structure predictors like AlphaFold and Boltz additionally model the mapping from sequence to three-dimensional coordinates. Many modern pipelines combine both: embeddings for representation, structure prediction for geometry.
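
One common heuristic for zero-shot variant scoring is a log-odds ratio: mask the mutated position, read the model's predicted amino acid probabilities there, and compare the mutant residue to the wild type. The sketch below illustrates only the arithmetic. The `toy_probs` table is a hypothetical stand-in for real model output, and `parse_mutation` and `log_odds_score` are illustrative names, not part of any library.

```python
import math


def parse_mutation(mutation):
    """Split a string such as 'A2V' into (wild type, 0-based position, mutant)."""
    return mutation[0], int(mutation[1:-1]) - 1, mutation[-1]


def log_odds_score(sequence, mutation, position_probs):
    """Score a substitution as log p(mutant) - log p(wild type) at its position.

    position_probs[i] maps amino acid letters to the probabilities a model
    would assign at position i (assumed here to come from masking position i).
    Negative scores mean the model prefers the wild-type residue.
    """
    wild_type, position, mutant = parse_mutation(mutation)
    if sequence[position] != wild_type:
        raise ValueError(f"{mutation} does not match the sequence at position {position + 1}")
    probs = position_probs[position]
    return math.log(probs[mutant]) - math.log(probs[wild_type])


# Hypothetical three-residue protein and made-up per-position probabilities.
sequence = "MAK"
toy_probs = [
    {"M": 0.90, "L": 0.10},
    {"A": 0.60, "V": 0.30, "G": 0.10},
    {"K": 0.50, "R": 0.50},
]

score_a2v = log_odds_score(sequence, "A2V", toy_probs)
score_k3r = log_odds_score(sequence, "K3R", toy_probs)
print(f"A2V: {score_a2v:.3f}  K3R: {score_k3r:.3f}")
```

In practice the probability table would be replaced by a forward pass of a protein language model over the masked sequence; no structure or alignment is needed, which is what makes the scoring zero-shot.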

Are protein foundation models open source?

It varies considerably. ESM-2 and ESMFold code and weights were openly released by Meta AI under the MIT license, and RFdiffusion code is available from the Baker Lab. AlphaFold 2 parameters are publicly available under CC BY 4.0, which permits commercial use with attribution, whereas AlphaFold 3 weights are available only on request and under terms limited to non-commercial use. bio.rodeo's openness scores break down each model's licensing, weight access, and data transparency individually.

What benchmarks are used to evaluate protein foundation models?

Common benchmarks include CASP for structure prediction accuracy, ProteinGym for mutational fitness prediction, and FLIP for sequence-function fitness landscapes. For design tasks, wet-lab validation remains the gold standard, though computational proxies are widely reported, such as the predicted confidence scores pTM and pLDDT and recovery of the designed structure when the sequence is refolded with ESMFold.
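
As an illustration of a structure accuracy metric, GDT_TS (the score reported for CASP) averages the percentage of residues whose C-alpha atoms fall within 1, 2, 4, and 8 Å of the reference structure. The sketch below is a simplification: it assumes the per-residue distances come from a single fixed superposition, whereas the full metric searches over superpositions to maximize the count at each cutoff. The distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances: per-residue C-alpha deviations in angstroms between a predicted
    and a reference structure, assumed to be already superposed.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues within each cutoff, then the mean across cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical deviations for a four-residue toy example.
toy_distances = [0.5, 1.5, 3.0, 9.0]
toy_score = gdt_ts(toy_distances)
print(f"GDT_TS = {toy_score:.2f}")
```

Averaging over several cutoffs rewards predictions that are roughly right everywhere as well as those that are precisely right in part, which a single RMSD value does not distinguish.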