Contrastive antibody language model that predicts antibody-antigen binding specificity directly from amino acid sequence using a dual-encoder, cross-attentive architecture.
CALM-1.0 (Contrastive Antibody Language Model) is a machine-learning framework for predicting antibody–antigen binding specificity directly from amino acid sequences. Determining which antibody binds which antigen is a central problem in immunology and therapeutic antibody discovery, and traditional approaches rely on experimental screening or structural modeling. CALM-1.0 instead frames antibody–antigen recognition as a sequence-to-sequence "molecular translation" problem, learning to align cognate pairs in a shared representation space.
Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in February 2026, CALM-1.0 couples a dual-encoder design (separate encoders for antibody and antigen sequences) with a cross-attentive decoder. Contrastive learning pulls true binding pairs together and pushes non-binders apart in the embedding space. This lets the model score and retrieve likely partners in either direction (antibody-to-antigen or antigen-to-antibody).
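The contrastive idea can be sketched as a symmetric InfoNCE (CLIP-style) loss. This is a minimal illustration under stated assumptions, not the preprint's implementation. It assumes each encoder has already produced one fixed-length embedding per sequence, that row i of the antibody batch binds row i of the antigen batch, and that every other row in the batch acts as a non-binder. The function names and the temperature value are hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(ab_emb: np.ndarray, ag_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss for a batch of cognate pairs.

    Assumption: row i of ab_emb binds row i of ag_emb; all other rows
    in the batch serve as in-batch negatives (non-binders).
    """
    ab = l2_normalize(ab_emb)
    ag = l2_normalize(ag_emb)
    logits = ab @ ag.T / temperature   # (n, n) scaled similarity matrix
    idx = np.arange(logits.shape[0])   # cognate pairs sit on the diagonal
    # Antibody -> antigen: each row should put its mass on its own antigen.
    loss_ab_to_ag = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Antigen -> antibody: each column should put its mass on its own antibody.
    loss_ag_to_ab = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_ab_to_ag + loss_ag_to_ab) / 2)
```

Averaging the row-wise and column-wise terms trains both retrieval directions at once, which matches the bidirectional use described above. Minimizing the loss raises the diagonal (cognate) similarities relative to the off-diagonal (non-binder) ones.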
Note that CALM-1.0 is distinct from the similarly named codon language model "CaLM"; here the acronym refers to a contrastive model of antibody–antigen specificity rather than codon-level representation learning.
CALM-1.0 combines its dual encoders with a cross-attentive decoder and is trained with a contrastive objective that aligns cognate antibody–antigen pairs in a shared embedding space. The authors report training on 4,138 curated antibody–antigen pairs assembled from structural databases. They evaluate the model by retrieval on held-out sequences and report a mean top-1 retrieval rate of roughly 7%, demonstrating retrieval in both directions (antibody-to-antigen and antigen-to-antibody). The relatively small training set reflects the limited availability of paired antibody–antigen specificity data, which the contrastive formulation is designed to use efficiently.
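A top-1 retrieval rate can be computed as in the short sketch below. It assumes, hypothetically, that held-out antibodies and antigens have already been embedded, that candidates are scored by cosine similarity, and that each antibody's cognate antigen sits at the same row index. The preprint's exact scoring function and candidate pools may differ. A query counts as a hit only if its true partner is the single highest-scoring candidate, and each direction is computed separately.

```python
import numpy as np

def top1_retrieval_rates(ab_emb: np.ndarray,
                         ag_emb: np.ndarray) -> tuple[float, float]:
    """Hypothetical top-1 retrieval rates for n held-out cognate pairs.

    Assumption: row i of ab_emb binds row i of ag_emb; every other row
    in the opposite set acts as a decoy candidate.
    """
    ab = ab_emb / np.linalg.norm(ab_emb, axis=1, keepdims=True)
    ag = ag_emb / np.linalg.norm(ag_emb, axis=1, keepdims=True)
    sim = ab @ ag.T                    # cosine similarity, shape (n, n)
    truth = np.arange(sim.shape[0])    # index of the true partner
    # Antibody queries the antigen pool: best column per row.
    ab_to_ag = float(np.mean(sim.argmax(axis=1) == truth))
    # Antigen queries the antibody pool: best row per column.
    ag_to_ab = float(np.mean(sim.argmax(axis=0) == truth))
    return ab_to_ag, ag_to_ab
```

Averaging the two returned values gives a single mean top-1 figure. Because a random scorer picks the cognate partner about 1/n of the time among n candidates, a top-1 rate is easiest to interpret alongside the size of the candidate pool it was measured on.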
CALM-1.0 is aimed at computational immunology and therapeutic antibody discovery, where identifying or prioritizing antibody–antigen pairs from sequence can accelerate candidate selection. By scoring and retrieving likely binding partners, it can help triage antibody repertoires against targets of interest, support epitope and partner hypothesis generation, and feed into broader antibody design pipelines, particularly in settings where structural data are unavailable.
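Repertoire triage reduces to ranking: score every antibody in a repertoire against one target antigen and keep the best few. The sketch below assumes, hypothetically, precomputed embeddings and cosine scoring. The names triage_repertoire and k are illustrative, not part of any published CALM-1.0 interface. If the model instead scores pairs with its cross-attentive decoder, only the scoring line would change; the ranking logic stays the same.

```python
import numpy as np

def triage_repertoire(target_emb: np.ndarray, repertoire_emb: np.ndarray,
                      k: int = 5) -> list[int]:
    """Hypothetical helper: indices of the k antibodies scoring highest
    against one antigen embedding, best first."""
    if k <= 0:
        return []
    t = target_emb / np.linalg.norm(target_emb)
    rep = repertoire_emb / np.linalg.norm(repertoire_emb, axis=1, keepdims=True)
    scores = rep @ t                          # cosine score per antibody
    k = min(k, scores.shape[0])
    # Partial selection: the k highest scores land in the first k slots (unordered).
    top = np.argpartition(-scores, k - 1)[:k]
    # Fully sort only the shortlist, highest score first.
    return top[np.argsort(-scores[top])].tolist()
```

np.argpartition selects the shortlist in linear time, so only the k survivors need a full sort. This matters when a repertoire holds many thousands of sequences.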
CALM-1.0 contributes to the growing body of sequence-based antibody models by casting antibody–antigen specificity as a contrastive molecular-translation task with bidirectional retrieval. Its main current limitation is the scale of available paired training data. With a few thousand curated pairs and a modest top-1 retrieval rate, the model is best read as an early proof of concept rather than a production tool. As a recent preprint, its results await peer review and independent benchmarking, and the framework would likely benefit substantially from larger curated specificity datasets.