ESM-1b (Evolutionary Scale Modeling) is a protein language model developed by Facebook AI Research (FAIR) that was among the first to demonstrate that large-scale unsupervised learning on protein sequences alone is sufficient to encode biologically meaningful structural and functional information. Published in PNAS in April 2021, the work by Rives et al. showed that training a transformer encoder on 250 million diverse protein sequences causes biological properties, from local secondary structure to remote evolutionary relationships, to emerge as learnable features in the model's representations, without any explicit structural supervision.
The central insight of ESM-1b is that the statistical regularities encoded in evolutionary sequence data are rich enough to reconstruct biology at multiple scales. When the model is trained to predict masked amino acid residues from sequence context, it implicitly learns the co-evolutionary constraints that underpin protein folding. Linear probes applied to the resulting representations can recover secondary structure, contact maps, and remote homology with performance competitive with dedicated computational methods that use explicit multiple sequence alignments (MSAs).
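To make the probing setup concrete, below is a minimal sketch of a linear probe trained on frozen per-residue ESM-1b embeddings. The Hugging Face checkpoint name, the three-class labels, and the toy data are illustrative assumptions; the paper's actual probing protocol and datasets differ in detail.

```python
# A minimal linear-probe sketch over frozen ESM-1b per-residue embeddings.
# Assumptions: the Hugging Face checkpoint name, the 3-class labels, and the
# toy sequence are illustrative, not the paper's exact probing protocol.
import torch
from transformers import AutoTokenizer, EsmModel

model_name = "facebook/esm1b_t33_650M_UR50S"   # assumed Hub checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
backbone = EsmModel.from_pretrained(model_name)
backbone.eval()                                 # the language model stays frozen

probe = torch.nn.Linear(backbone.config.hidden_size, 3)   # 3-state (Q3) probe

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"             # toy protein
labels = torch.randint(0, 3, (len(sequence),))             # toy per-residue labels

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():                                      # no gradients into the backbone
    hidden = backbone(**inputs).last_hidden_state[0]       # (tokens, 1280)

# Drop the CLS token at position 0 and the EOS token at the end so that
# embeddings line up one-to-one with residues.
residue_embeddings = hidden[1 : len(sequence) + 1]

logits = probe(residue_embeddings)                         # (residues, 3)
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()                                            # updates only the probe's parameters
print(f"probe loss on the toy example: {loss.item():.3f}")
```

Because the forward pass runs under `torch.no_grad()`, gradients flow only into the probe, which is what keeps the evaluation a test of the frozen representations rather than of fine-tuning.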
ESM-1b established the proof of concept for the modern generation of protein language models. It belongs to the broader ESM family from Meta AI, which subsequently grew to include ESM-1v (zero-shot variant effect prediction), ESM-MSA-1b (MSA Transformer), ESM-2 (up to 15B parameters), and ESMFold (structure prediction from a single sequence), making it one of the most consequential foundational works in computational protein science.
ESM-1b is a 33-layer transformer encoder with a hidden dimension of 1,280, 20 attention heads per layer, and a feed-forward intermediate dimension of 5,120, totaling approximately 650 million parameters. It was trained using a masked language modeling (MLM) objective, identical in spirit to BERT, in which 15% of input amino acid tokens are masked and the model is trained to predict the original residue from the remaining sequence context. The training corpus was UR50/S, a high-diversity dataset built by clustering, at roughly 50% sequence identity, an underlying corpus of about 250 million protein sequences (86 billion amino acids) sampled from across the tree of life.
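As a sketch of what the objective looks like in practice, the snippet below applies BERT-style masking to an amino-acid token sequence and computes the masked-token cross-entropy. The 80/10/10 corruption split follows Devlin et al., and the tiny encoder merely stands in for the 33-layer transformer, so treat the details as assumptions rather than the exact ESM-1b recipe.

```python
# Sketch of the masked language modeling objective over amino-acid tokens.
# The 80/10/10 corruption split mirrors BERT (Devlin et al.); the tiny encoder
# below only stands in for the 33-layer transformer to keep the example runnable.
import torch

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
vocab = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MASK_ID = len(vocab)                     # extra <mask> token appended to the vocabulary
VOCAB_SIZE = len(vocab) + 1

def mask_tokens(token_ids: torch.Tensor, mask_prob: float = 0.15):
    """Return (corrupted_ids, labels); labels are -100 wherever no prediction is required."""
    selected = torch.rand(token_ids.shape) < mask_prob      # choose ~15% of positions
    if not selected.any():                                   # guarantee at least one target
        selected[torch.randint(0, token_ids.numel(), (1,))] = True
    labels = token_ids.clone()
    labels[~selected] = -100                                 # ignored by the loss

    corrupted = token_ids.clone()
    r = torch.rand(token_ids.shape)
    corrupted[selected & (r < 0.8)] = MASK_ID                # 80% of targets -> <mask>
    replace_random = selected & (r >= 0.8) & (r < 0.9)       # 10% -> random residue
    corrupted[replace_random] = torch.randint(0, len(vocab), token_ids.shape)[replace_random]
    # remaining 10% of targets keep the original residue
    return corrupted, labels

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
token_ids = torch.tensor([vocab[aa] for aa in sequence])
corrupted, labels = mask_tokens(token_ids)

# Any encoder mapping token ids to per-position logits could sit here.
embed = torch.nn.Embedding(VOCAB_SIZE, 64)
head = torch.nn.Linear(64, VOCAB_SIZE)
logits = head(embed(corrupted))
loss = torch.nn.functional.cross_entropy(logits, labels, ignore_index=-100)
print(f"masked-LM loss on the toy example: {loss.item():.3f}")
```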
Benchmark evaluation showed ESM-1b achieving secondary structure Q3 accuracy of 71.6%, matching the performance of HMM-profile-based methods (71.2%) and exceeding published RaptorX results (70.6%) on the same CB513 benchmark, all from a linear probe with no fine-tuning of the backbone. Contact prediction from attention maps reached a precision at L/5 (P@L/5, where L is the protein's length) of approximately 0.61 on the ProteinNet test set. Remote homology detection evaluated on the SCOP dataset confirmed that the model's representations encode fold-level similarity without explicit training on structural labels.
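For readers unfamiliar with the contact metric, the following is a minimal sketch of how precision at L/5 is commonly computed: score all long-range residue pairs, keep the L/5 highest-scoring ones, and measure the fraction that are true contacts. The sequence-separation threshold of 24 residues is a common convention assumed here, not a detail taken from the paper.

```python
# Sketch of the P@L/5 contact metric: rank residue pairs by predicted score,
# keep the top L/5 long-range pairs, and report the fraction of true contacts.
# The sequence-separation cutoff of 24 residues is a common convention.
import numpy as np

def precision_at_l5(scores: np.ndarray, contacts: np.ndarray, min_sep: int = 24) -> float:
    L = scores.shape[0]
    i, j = np.triu_indices(L, k=min_sep)          # candidate pairs with j - i >= min_sep
    order = np.argsort(scores[i, j])[::-1]        # highest predicted scores first
    top = order[: max(1, L // 5)]                 # keep the top L/5 pairs
    return float(contacts[i[top], j[top]].mean())

# Toy example with random scores and a random symmetric contact map.
rng = np.random.default_rng(0)
L = 120
scores = rng.random((L, L))
contacts = np.triu(rng.random((L, L)) < 0.05, k=1)
contacts = contacts | contacts.T                  # symmetrize the ground truth
print(f"P@L/5 on random toy data: {precision_at_l5(scores, contacts):.3f}")
```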
ESM-1b representations serve as general-purpose protein encoders for a wide range of downstream tasks. Researchers use the frozen embeddings as input features to lightweight prediction heads for secondary structure, solubility, stability, subcellular localization, and post-translational modification classification. In drug discovery and protein engineering workflows, ESM-1b embeddings provide a fast baseline for sequence-based screening of large libraries. The model's attention maps have been used for unsupervised contact prediction, aiding structure modeling of proteins lacking close homologs. ESM-1v, which shares ESM-1b's architecture, demonstrated that log-likelihood scoring from the language model is directly predictive of the effect of point mutations, enabling zero-shot variant effect prediction for clinical and engineering applications. The model is widely accessible through the Hugging Face Transformers library, lowering the barrier for wet-lab biologists to apply it without deep ML expertise.
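As an illustration of the zero-shot scoring idea, the sketch below masks a single position and compares the model's log-probability of the mutant residue against the wild type, in the style of masked-marginal scoring. The checkpoint name is the ESM-1b entry on the Hugging Face Hub; the helper function and toy sequence are illustrative assumptions.

```python
# Sketch of zero-shot variant scoring in the masked-marginal style: mask the
# mutated position and compare the model's log-probabilities of the mutant and
# wild-type residues. Checkpoint name, helper, and toy sequence are illustrative.
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

model_name = "facebook/esm1b_t33_650M_UR50S"      # ESM-1b checkpoint on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EsmForMaskedLM.from_pretrained(model_name)
model.eval()

def mutation_score(sequence: str, index: int, wild_type: str, mutant: str) -> float:
    """log P(mutant) - log P(wild type) at a masked position (0-indexed into the sequence)."""
    assert sequence[index] == wild_type, "wild-type residue does not match the sequence"
    inputs = tokenizer(sequence, return_tensors="pt")
    token_index = index + 1                        # shift by one for the CLS token
    inputs["input_ids"][0, token_index] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(**inputs).logits[0, token_index]
    log_probs = torch.log_softmax(logits, dim=-1)
    wt_id = tokenizer.convert_tokens_to_ids(wild_type)
    mut_id = tokenizer.convert_tokens_to_ids(mutant)
    return (log_probs[mut_id] - log_probs[wt_id]).item()

# Score the substitution A4P: alanine at residue 4 (1-indexed), i.e. index 3.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(f"A4P score: {mutation_score(seq, 3, 'A', 'P'):.3f}")
```

Substitutions scored below zero are ones the model considers less likely than the wild-type residue, which is the signal that ESM-1v-style zero-shot variant effect prediction builds on.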
ESM-1b was a defining paper in establishing protein language models as a serious paradigm in computational biology. It catalyzed a wave of subsequent work, including ProtTrans, ESM-2, and ProGen2, and directly influenced how the field thinks about representation learning from sequence data alone. The ESM GitHub repository has accumulated thousands of stars, and the model family has been cited extensively in the structural biology and protein engineering literature. A notable limitation of ESM-1b is its learned absolute positional embeddings, which cap inputs at 1,024 tokens and prevent generalization to sequences longer than those seen during training; ESM-2 addressed this with rotary position embeddings. Despite being superseded in raw performance by later ESM generations and competing models, ESM-1b remains a widely used baseline and a pedagogically important example of emergent representation learning in protein science.