Question 1

What is a protein foundation model?

Accepted Answer

A protein foundation model is a large neural network pretrained on large corpora of amino acid sequences, structures, or both. It learns representations that transfer to downstream tasks such as structure prediction, function annotation, and sequence design. Unlike a task-specific model, it is meant to generalize across protein families and applications. Well-known examples include ESM, AlphaFold, and Boltz.

Question 2

How do protein language models differ from structure predictors?

Accepted Answer

Protein language models such as ESM-2 are trained on sequence data alone and produce residue-level embeddings useful for fitness prediction and zero-shot variant scoring. Structure predictors such as AlphaFold and Boltz additionally model the mapping from sequence to three-dimensional coordinates. Many modern pipelines combine both: embeddings for representation, structure prediction for geometry.
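As a rough illustration of zero-shot variant scoring, a common approach is to compare the log-probability a language model assigns to the mutant residue with the log-probability it assigns to the wild-type residue at the same position. The sketch below assumes the per-position probabilities have already been obtained from a model; `TOY_PROBS` and `score_variant` are hypothetical names, and the numbers are made up for the example rather than taken from any real model.

```python
import math

# Hypothetical per-position amino acid probabilities, standing in for the
# output of a masked protein language model. A real model returns a
# distribution over all 20 amino acids; only a few entries are shown here.
TOY_PROBS = {
    0: {"M": 0.90, "L": 0.05, "V": 0.05},
    1: {"K": 0.60, "R": 0.30, "E": 0.10},
    2: {"T": 0.50, "S": 0.40, "P": 0.10},
}


def score_variant(wild_type, position, mutant_aa, probs):
    """Return log p(mutant) - log p(wild type) at a 0-based position.

    Scores near zero mean the model finds the substitution about as
    plausible as the wild type; strongly negative scores suggest the
    substitution is disfavored.
    """
    wt_aa = wild_type[position]
    dist = probs[position]
    return math.log(dist[mutant_aa]) - math.log(dist[wt_aa])


wild_type = "MKT"
conservative = score_variant(wild_type, 1, "R", TOY_PROBS)  # K -> R
disruptive = score_variant(wild_type, 2, "P", TOY_PROBS)    # T -> P
print(f"K2R: {conservative:.3f}, T3P: {disruptive:.3f}")
```

In this toy table the K-to-R substitution scores higher (less negative) than T-to-P, which is the kind of ranking such a score is meant to produce. With a real model, the probabilities would come from masking each position and reading off the predicted distribution.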

Question 3

Are protein foundation models open source?

Accepted Answer

It varies considerably. ESM-2 and ESMFold weights were openly released by Meta, with newer ESM models coming from EvolutionaryScale, and RFdiffusion code is available from the Baker Lab. AlphaFold 2 weights are publicly available under a CC BY 4.0 license, which permits commercial use with attribution; AlphaFold 3 weights are available only on request, under terms that restrict them to non-commercial use. bio.rodeo's openness scores break down each model's licensing, weight access, and data transparency individually.

Question 4

What benchmarks are used to evaluate protein foundation models?

Accepted Answer

Common benchmarks include CASP for structure prediction accuracy, ProteinGym for mutational fitness prediction, and FLIP for sequence-function fitness landscapes. For design tasks, wet-lab validation remains the gold standard, though computational proxies like pTM, pLDDT, and ESMFold structure recovery are widely reported.
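Mutational fitness benchmarks such as ProteinGym typically report a rank correlation between model scores and experimentally measured fitness, with Spearman correlation being the usual choice. The following is a minimal pure-Python sketch of that metric, not the benchmark's own evaluation code; the function names and the five data points are hypothetical and chosen only to make the example self-contained.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: model scores and measured fitness for five variants.
model_scores = [-0.7, -1.6, -0.2, -3.1, -0.9]
measured_fitness = [0.80, 0.35, 0.95, 0.10, 0.30]
rho = spearman(model_scores, measured_fitness)
print(f"Spearman rho = {rho:.2f}")
```

Because only the ordering matters, the metric is insensitive to the scale of the model's scores, which is convenient when comparing log-likelihood ratios against assay readouts in arbitrary units. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of a hand-written version.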

Protein Models

What protein foundation models do

Common applications and real-world use cases

Notable Models

AlphaFold 2

ESMC

Boltz-1

AlphaFold 3

ESM-2 & ESMFold

Dayhoff Atlas
