An unsupervised transformer language model for predicting TCR-epitope binding that generalizes to unseen epitopes by learning from incomplete immunological data.
TULIP (Transformer-based Unsupervised Language model for Interacting Peptides and T-cell receptors) is a model for predicting binding between T-cell receptors (TCRs) and their cognate peptide epitopes, developed by Barthélemy Meynard-Piganeau, Christoph Feinauer, Martin Weigt, Aleksandra Walczak, and Thierry Mora at the Laboratoire de Physique de l'École Normale Supérieure and the Institut de Biologie Paris-Seine. Introduced as a preprint in July 2023 and subsequently published in PNAS in 2024, TULIP addresses one of the most practically important and technically challenging problems in immunology: predicting which T-cell receptors will bind a given peptide-MHC complex.
Accurate TCR-epitope binding prediction is central to understanding adaptive immune responses, designing vaccines, and developing T-cell-based therapies including cancer immunotherapy. However, the field has been hampered by two compounding problems. First, training data is scarce relative to the vast diversity of both TCR and epitope sequence space. Second, prior supervised learning approaches require negative examples — pairs of TCRs and epitopes that do not bind — and the way these negatives are sampled introduces systematic biases that inflate apparent accuracy and cause models to fail when evaluated on epitopes not seen during training.
TULIP circumvents both limitations with an unsupervised approach based on masked language modeling. Instead of learning a binary binding classifier, TULIP learns a joint probability distribution over TCR and epitope sequences using transformer attention. At inference time, the probability the model assigns to a TCR sequence given an epitope serves as a proxy for binding likelihood, so explicit negative examples are never required. This framing is more robust to distribution shift across epitopes and lets the model generalize to epitopes absent from the training data, a critical requirement for real-world deployment.
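To make the scoring idea concrete, here is a minimal sketch of conditional-likelihood scoring under simplifying assumptions: a toy single-encoder model (an illustrative simplification, not TULIP's actual architecture) scores a TCR against an epitope by masking each TCR residue in turn and summing the log-probabilities of the true residues, i.e. a pseudo-log-likelihood. The names (ToyPairEncoder, binding_score), tokenization, and special tokens are all assumptions for illustration.

```python
# Hedged sketch: score a TCR against an epitope with a masked-LM
# pseudo-log-likelihood. Toy model and tokenizer, not the TULIP code.
import torch
import torch.nn as nn

AA = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {a: i for i, a in enumerate(AA)}
MASK, SEP = len(AA), len(AA) + 1            # special tokens (assumed)
VOCAB_SIZE = len(AA) + 2

class ToyPairEncoder(nn.Module):
    """Minimal encoder over epitope+separator+TCR tokens (assumption)."""
    def __init__(self, d=64, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, d)
        layer = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(d, VOCAB_SIZE)

    def forward(self, tokens):               # (B, L) -> (B, L, V)
        return self.head(self.enc(self.emb(tokens)))

def encode(epitope, tcr):
    ids = [VOCAB[a] for a in epitope] + [SEP] + [VOCAB[a] for a in tcr]
    return torch.tensor(ids).unsqueeze(0)

@torch.no_grad()
def binding_score(model, epitope, tcr):
    """Pseudo-log-likelihood of the TCR given the epitope: mask each
    TCR position in turn and sum log p(true residue | rest)."""
    tokens = encode(epitope, tcr)
    start = len(epitope) + 1                  # first TCR position
    total = 0.0
    for pos in range(start, tokens.size(1)):
        masked = tokens.clone()
        true_id = masked[0, pos].item()
        masked[0, pos] = MASK
        logp = model(masked).log_softmax(-1)
        total += logp[0, pos, true_id].item()
    return total                              # higher = more likely pair

model = ToyPairEncoder().eval()               # untrained, for illustration
print(binding_score(model, "GILGFVFTL", "CASSIRSSYEQYF"))
```

With a trained model, ranking candidate TCRs for one epitope reduces to sorting them by this score; no negative examples enter the computation at any point.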
TULIP is implemented as a transformer that jointly processes TCR and epitope sequence tokens. The model is trained on TCR-epitope pairs drawn from immunological databases including VDJdb, using a masked language modeling objective applied jointly to the TCR and epitope sequences: residue positions in both sequences are randomly masked, and the model is trained to recover the original amino acids from the unmasked context, which forces it to learn the statistical dependencies between interacting sequences. Pretraining on the full set of positive pairs establishes general TCR-epitope co-occurrence statistics; the model can then be fine-tuned for a specific epitope using the positive examples available for that epitope.
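A hedged sketch of one training step under that joint masking objective follows. The 15% masking rate, the token ids, and the tiny stand-in model are assumptions for illustration; the released TULIP code will differ in these details.

```python
# Hedged sketch of joint masked-LM training: mask random residues in the
# concatenated epitope+TCR tokens and train to recover them.
import torch
import torch.nn as nn

VOCAB_SIZE, MASK_ID, PAD_ID = 22, 20, 21     # 20 aa + [MASK] + [PAD] (assumed)

def masked_lm_step(model, tokens, optimizer, mask_rate=0.15):
    """One step: mask ~15% of non-pad positions, predict the originals."""
    targets = tokens.clone()
    maskable = tokens != PAD_ID
    mask = (torch.rand_like(tokens, dtype=torch.float) < mask_rate) & maskable
    targets[~mask] = -100                     # loss ignores unmasked positions
    corrupted = tokens.clone()
    corrupted[mask] = MASK_ID
    logits = model(corrupted)                 # (B, L, V)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1), ignore_index=-100
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# toy stand-in model and random batch, just to show the step runs
model = nn.Sequential(nn.Embedding(VOCAB_SIZE, 32), nn.Linear(32, VOCAB_SIZE))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randint(0, 20, (8, 25))
print(masked_lm_step(model, batch, opt))
```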
In benchmarks against supervised methods including NetTCR-2.0, ERGO, and ERGO-II, TULIP shows robust performance on held-out epitopes not seen during training, the scenario most relevant to practical deployment. These results indicate that the unsupervised formulation is less susceptible to the negative sampling bias identified in supervised models, a bias that can artificially inflate AUC metrics when test-set epitopes overlap with training epitopes. The GitHub repository provides code for ranking TCR candidates for a given target epitope via the predict.py interface.
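The evaluation point is easy to operationalize: hold out whole epitopes, so that no test epitope ever appears in training, rather than randomly splitting pairs. The sketch below shows such a split on toy data; the fields and split ratio are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch of an unseen-epitope split: whole epitopes are held out,
# so train and test epitope sets are disjoint.
import random

pairs = [                                     # (tcr_cdr3b, epitope) toy data
    ("CASSIRSSYEQYF", "GILGFVFTL"),
    ("CASSLAPGATNEKLFF", "GILGFVFTL"),
    ("CASSPGQGNYGYTF", "NLVPMVATV"),
    ("CASSLGQAYEQYF", "NLVPMVATV"),
    ("CASSQEPGLAGGRPEQYF", "ELAGIGILTV"),
]

def split_by_epitope(pairs, test_frac=0.2, seed=0):
    epitopes = sorted({ep for _, ep in pairs})
    random.Random(seed).shuffle(epitopes)
    n_test = max(1, int(test_frac * len(epitopes)))
    test_eps = set(epitopes[:n_test])
    train = [p for p in pairs if p[1] not in test_eps]
    test = [p for p in pairs if p[1] in test_eps]
    return train, test

train, test = split_by_epitope(pairs)
assert not ({ep for _, ep in train} & {ep for _, ep in test})
print(len(train), "train pairs;", len(test), "test pairs on unseen epitopes")
```

A random split over pairs would instead leak every epitope into both sets, which is exactly the setting in which negative-sampling artifacts can inflate apparent AUC.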
TULIP is applicable wherever researchers need to rank or filter T-cell receptors by their predicted likelihood of binding a target peptide-MHC complex. In neoantigen vaccine design, it can help prioritize candidate epitopes arising from patient tumor mutations by scoring which TCRs are likely to recognize them. In adoptive cell therapy research, TULIP supports identification of TCRs with a desired specificity from large immune repertoire sequencing datasets. In basic immunology, the model's learned joint distribution offers insight into the sequence grammar of TCR-epitope recognition, complementing structural analyses of pMHC-TCR complexes. The model also applies to evaluating cross-reactivity risks: predicting whether a therapeutic TCR or CAR-T construct might engage off-target peptides.
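One way such a cross-reactivity screen could be organized is sketched below: score a single therapeutic TCR against a peptide panel with any conditional scorer (for instance a binding_score function like the one sketched earlier) and flag peptides that score unusually high relative to the panel. The z-score rule, panel, and threshold are illustrative assumptions, not a validated protocol.

```python
# Hedged sketch: flag potential off-target peptides for one TCR by
# z-scoring panel-wide binding scores. Demo scores are made up.
import statistics

def cross_reactivity_report(score_fn, tcr, peptide_panel, z_cut=2.0):
    """Return panel peptides whose score for this TCR is unusually high."""
    scores = {p: score_fn(p, tcr) for p in peptide_panel}
    mu = statistics.mean(scores.values())
    sd = statistics.stdev(scores.values())
    return [p for p, s in scores.items() if sd > 0 and (s - mu) / sd > z_cut]

demo = {"GILGFVFTL": -42.0, "NLVPMVATV": -35.1, "ELAGIGILTV": -41.7}
print(cross_reactivity_report(lambda ep, _t: demo[ep], "CASSIRSSYEQYF",
                              list(demo), z_cut=1.0))
```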
TULIP's publication in PNAS established unsupervised language modeling as a viable and principled alternative to supervised TCR-epitope prediction, and provided the field with a concrete demonstration of the negative sampling bias problem in existing benchmarks. The model highlighted that generalization to unseen epitopes, not accuracy on previously observed epitopes, is the appropriate standard for evaluating TCR-epitope predictors in real-world settings. TULIP's approach influenced subsequent work on immunological sequence modeling and spurred broader discussion about evaluation protocols in the TCR-epitope prediction community. A recognized limitation, acknowledged in independent benchmarking studies, is that all current TCR-epitope models, TULIP included, struggle to substantially exceed random performance in the most stringent unseen-epitope evaluations. This reflects the fundamental difficulty of the problem and the need for richer structural and biophysical training signals.