A conditional denoising diffusion model that generates antigen-specific TCR CDR3β sequences conditioned on peptide-MHC targets and germline V-genes.
TCRDiff is a conditional denoising diffusion model for designing antigen-specific T-cell receptors (TCRs), developed at Monash University and released as a bioRxiv preprint in June 2026. It tackles one of the central inverse problems in adaptive immunity. Given a disease-relevant target, namely a peptide presented on a major histocompatibility complex (peptide-MHC, or pMHC), it generates the receptor sequences most likely to recognize it. The binding interface is dominated by the hypervariable complementarity-determining region 3 of the TCR beta chain (CDR3β), so TCRDiff generates CDR3β sequences while conditioning on the target pMHC and the germline V-gene context.
Most prior work in this space has either predicted whether a given TCR-epitope pair binds (discriminative models such as TULIP) or generated candidate receptors with autoregressive language models. TCRDiff instead frames receptor design as iterative denoising: starting from noise, it progressively refines a CDR3β sequence under guidance from the target pMHC and the germline-encoded V-gene, producing diverse candidate repertoires for a specified antigen. This generative framing is well suited to the combinatorial breadth of TCR sequence space, where many distinct receptors can recognize the same epitope.
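The denoising loop described above can be illustrated with a toy masked (absorbing-state) discrete diffusion sampler, which is one common way diffusion language models generate sequences. Whether TCRDiff uses this exact noise process is an assumption here. `toy_denoiser` is a hypothetical stand-in for the trained network, and conditioning is collapsed to the raw peptide string, whereas the model described here conditions on learned pMHC and V-gene representations. The toy pins the conserved CDR3β anchors by hand, which a real model would learn.

```python
import random
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # absorbing "noise" token; every position starts masked

# A denoiser maps (partially masked sequence, conditioning) to one
# amino-acid -> probability dict per position.
Denoiser = Callable[[List[str], str], List[Dict[str, float]]]


def toy_denoiser(seq: List[str], condition: str) -> List[Dict[str, float]]:
    """Hypothetical stand-in for a trained, target-conditioned network."""
    n = len(seq)
    # Mildly favour residues that occur in the conditioning peptide.
    weights = {aa: (3.0 if aa in condition else 1.0) for aa in AMINO_ACIDS}
    total = sum(weights.values())
    interior = {aa: w / total for aa, w in weights.items()}
    out = []
    for i in range(n):
        if i == 0:
            out.append({"C": 1.0})  # conserved leading Cys of CDR3β
        elif i == n - 1:
            out.append({"F": 1.0})  # conserved trailing Phe from the J gene
        else:
            out.append(dict(interior))
    return out


def sample_masked_diffusion(denoiser: Denoiser, condition: str, length: int,
                            steps: int, rng: random.Random) -> str:
    """Unmask a fully masked sequence over `steps` denoising iterations."""
    seq = [MASK] * length
    masked = list(range(length))
    for step in range(steps):
        if not masked:
            break
        probs = denoiser(seq, condition)
        # Reveal ceil(remaining / steps_left) positions so all are filled by the last step.
        n_reveal = -(-len(masked) // (steps - step))
        # Commit the positions the denoiser is most confident about first.
        masked.sort(key=lambda i: max(probs[i].values()), reverse=True)
        for i in masked[:n_reveal]:
            residues, p = zip(*probs[i].items())
            seq[i] = rng.choices(residues, weights=p, k=1)[0]
        masked = masked[n_reveal:]
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_masked_diffusion(toy_denoiser, "GILGFVFTL", length=13, steps=5, rng=rng))
```

Revealing the most confident positions first is one common unmasking schedule (random order is another); `steps` trades output quality against the number of denoiser calls.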
The model is pretrained on large unpaired T-cell repertoire databases and then specialized on curated TCR-pMHC recognition datasets, and the authors report experimental validation of designed receptors in vitro. Pretrained checkpoints and the underlying data are released on Zenodo, with code under a GPL-3.0 license.
TCRDiff is built from several interoperating components: a peptide language model and a peptide-MHC binding predictor that encode the target, a TCR diffusion language model that performs generative denoising over CDR3β sequence space, and TCR-pMHC binding predictors used for scoring. These modules are combined into a conditional TCR diffusion model in which the target representation guides each denoising step. Pretraining draws on large T-cell repertoire datasets to learn general receptor sequence statistics, while specialization uses TCR-pMHC recognition datasets that pair receptors with their cognate antigens. At inference, a user supplies a peptide sequence, an MHC allele, germline V-gene choices (TRAV and TRBV), the organism, and sampling parameters; the model emits candidate CDR3β sequences, which can then be filtered by predicted binding. The released Zenodo record bundles roughly 536 MB of training and reference data alongside the model checkpoints, and the implementation is written in Python.
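The inference contract described above (peptide, MHC allele, TRAV/TRBV, organism and sampling parameters in; CDR3β candidates out, then filtered by predicted binding) can be sketched as a small request object plus a post-processing filter. Everything here is hypothetical: `DesignRequest`, its field names, `filter_candidates` and `toy_score` are not TCRDiff's API, and the score function is a placeholder for whichever TCR-pMHC binding predictor is used. The format check assumes the common convention of writing CDR3β with its conserved leading Cys and trailing Phe. The usage example borrows the well-studied influenza epitope GILGFVFTL on HLA-A*02:01 purely as illustrative input.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
# CDR3β written with the conserved leading Cys and trailing J-region Phe.
CDR3B_PATTERN = re.compile(r"C[ACDEFGHIKLMNPQRSTVWY]+F")


@dataclass(frozen=True)
class DesignRequest:
    """Hypothetical bundle of the inference inputs; field names are illustrative."""
    peptide: str
    mhc_allele: str
    trav: str
    trbv: str
    organism: str = "human"
    num_samples: int = 100    # assumed sampling parameters, not TCRDiff's own names
    temperature: float = 1.0

    def __post_init__(self) -> None:
        if not self.peptide or not set(self.peptide) <= STANDARD_AA:
            raise ValueError(f"peptide must use the 20 standard amino acids: {self.peptide!r}")
        if self.num_samples < 1 or self.temperature <= 0:
            raise ValueError("num_samples and temperature must be positive")


def filter_candidates(candidates: Iterable[str],
                      score_fn: Callable[[str], float],
                      threshold: float) -> List[Tuple[str, float]]:
    """Drop malformed or duplicate CDR3β strings, score the rest, keep those >= threshold, best first."""
    seen = set()
    kept = []
    for cdr3 in candidates:
        if cdr3 in seen or not CDR3B_PATTERN.fullmatch(cdr3):
            continue
        seen.add(cdr3)
        score = score_fn(cdr3)
        if score >= threshold:
            kept.append((cdr3, score))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    request = DesignRequest("GILGFVFTL", "HLA-A*02:01", trav="TRAV27", trbv="TRBV19")
    sampled = ["CASSIRSSYEQYF", "CASSIRSSYEQYF", "ASSIRSSYEQY", "CASSLGQAYEQYF"]
    toy_score = lambda s: 1.0 if "RS" in s else 0.2  # placeholder for a binding predictor
    print(request.peptide, filter_candidates(sampled, toy_score, threshold=0.5))
```

Validating inputs when the request is constructed surfaces typos in the peptide before any sampling cost is paid; the filter then only spends predictor calls on unique, well-formed candidates.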
TCRDiff is aimed at researchers developing T-cell-based therapeutics and probing antigen-specific immunity. By generating candidate receptors for a chosen pMHC target, it can seed the design of engineered TCRs for adoptive cell therapies and cancer immunotherapy, expand panels of antigen-specific receptors for vaccine and immune-monitoring studies, and supply diverse hypotheses for experimental screening. Because conditioning includes germline V-genes and MHC context, immunologists can tailor designs to particular receptor scaffolds and HLA backgrounds, and the integrated binding predictors let teams prioritize a tractable shortlist of candidates for wet-lab validation.
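One way to turn scored candidates into a tractable wet-lab shortlist is a greedy, diversity-aware selection: take candidates in order of predicted binding score and skip any that fall within a chosen edit distance of one already picked. This is a generic heuristic offered as an assumption, not a procedure described for TCRDiff; `diverse_shortlist`, `k` and `min_distance` are hypothetical names, and the scores stand for whatever a TCR-pMHC binding predictor returns.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def diverse_shortlist(scored: Sequence[Tuple[str, float]], k: int,
                      min_distance: int) -> List[Tuple[str, float]]:
    """Greedily pick up to k top-scoring sequences that are pairwise >= min_distance edits apart."""
    if k <= 0:
        return []
    chosen: List[Tuple[str, float]] = []
    for seq, score in sorted(scored, key=lambda pair: pair[1], reverse=True):
        if all(levenshtein(seq, other) >= min_distance for other, _ in chosen):
            chosen.append((seq, score))
            if len(chosen) == k:
                break
    return chosen


if __name__ == "__main__":
    scored = [("CASSIRSSYEQYF", 0.90), ("CASSIRSSYEQFF", 0.85),
              ("CASSLGQAYEQYF", 0.70), ("CSARDGTGNGYTF", 0.60)]
    for seq, score in diverse_shortlist(scored, k=3, min_distance=3):
        print(seq, score)
```

Raising `min_distance` trades predicted affinity for sequence diversity; with `min_distance=1` the function reduces to a plain top-k over unique sequences.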
TCRDiff extends the diffusion-model paradigm that reshaped protein and antibody design into the distinctive setting of antigen-specific TCR engineering, where target-conditioned generation paired with binding prediction addresses a long-standing bottleneck in immunotherapy discovery. The release of pretrained checkpoints and training data on Zenodo under an open license lowers the barrier for other groups to build on the approach. As a recent preprint, its broader benchmarking against autoregressive and discriminative TCR methods and independent reproduction remain to be established, and — as with all in silico receptor design — generated candidates require experimental confirmation of binding and function before therapeutic use.
Zhang, Y., et al. (2026) Generative design of antigen-specific T-cell receptor sequences with a conditional diffusion model. openRxiv.
DOI: 10.64898/2026.06.10.730756