
TCRDiff

Monash University

A conditional denoising diffusion model that generates antigen-specific TCR CDR3β sequences conditioned on peptide-MHC targets and germline V-genes.

Released: June 2026

TCRDiff is a conditional denoising diffusion model for designing antigen-specific T-cell receptors (TCRs), developed at Monash University and released as a bioRxiv preprint in June 2026. It tackles a central inverse problem in adaptive immunity: given a disease-relevant target, namely a peptide presented on a major histocompatibility complex (peptide-MHC, or pMHC), generate the receptor sequences most likely to recognize it. Because the binding interface is dominated by the hypervariable complementarity-determining region 3 of the TCR beta chain (CDR3β), TCRDiff focuses its generative effort there while conditioning on the pMHC target and the germline V-genes.

Most prior work in this space has either predicted whether a given TCR-epitope pair binds (discriminative models such as TULIP) or generated candidate receptors with autoregressive language models. TCRDiff instead frames receptor design as iterative denoising: starting from noise, it progressively refines a CDR3β sequence under guidance from the target pMHC and the germline-encoded V-gene, producing diverse candidate repertoires for a specified antigen. This generative framing is well suited to the combinatorial breadth of TCR sequence space, where many distinct receptors can recognize the same epitope.
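To make the contrast with left-to-right generation concrete, here is a toy refinement loop in Python. It is only a sketch of the general idea of starting from noise and filling in a sequence over several rounds. The source does not describe TCRDiff's actual noise process or network, so `toy_denoiser`, the `condition` dictionary and the step schedule are all hypothetical stand-ins, not the authors' method.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # marks a position that is still "noise"


def toy_denoiser(seq, condition, rng):
    """Hypothetical stand-in for a learned network.

    Proposes a residue for every masked position. A real model would use
    pMHC and V-gene features; this stub only uses `condition` to pin the
    two end residues and draws everything else uniformly at random.
    """
    proposal = []
    for i, ch in enumerate(seq):
        if ch != MASK:
            proposal.append(ch)                  # keep already-revealed residues
        elif i == 0:
            proposal.append(condition["first"])
        elif i == len(seq) - 1:
            proposal.append(condition["last"])
        else:
            proposal.append(rng.choice(AMINO_ACIDS))
    return proposal


def sample_cdr3(length, condition, steps=4, seed=0):
    """Start fully masked and reveal positions in random order over `steps` rounds."""
    rng = random.Random(seed)
    seq = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)                # positions are filled in arbitrary order
    per_step = -(-length // steps)    # ceiling division
    for step in range(steps):
        proposal = toy_denoiser(seq, condition, rng)
        for pos in order[step * per_step:(step + 1) * per_step]:
            seq[pos] = proposal[pos]
    return "".join(seq)


# Assumed toy condition: a CDR3β junction that starts with C and ends with F.
example = sample_cdr3(14, {"first": "C", "last": "F"}, steps=4, seed=1)
```

Because every round sees the whole partially revealed sequence, each position can in principle depend on residues on both sides, which is the property that distinguishes this style of sampling from autoregressive decoding.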

The model is pretrained on large unpaired T-cell repertoire databases and then specialized on curated TCR-pMHC recognition datasets, and the authors report experimental validation of designed receptors in vitro. Pretrained checkpoints and the underlying data are released on Zenodo, with code under a GPL-3.0 license.

Key Features

  • Conditional diffusion over CDR3β: TCRDiff casts receptor design as a denoising diffusion process, sampling antigen-specific CDR3β sequences from noise rather than left-to-right, which encourages diversity across the generated repertoire.
  • Target- and germline-aware conditioning: Generation is conditioned jointly on the peptide-MHC target and the germline V-gene (TRAV/TRBV), so designs respect both the antigen and the biophysical constraints of the receptor scaffold.
  • Integrated binding scoring: The system pairs the generative model with TCR-pMHC binding predictors, allowing generated candidates to be ranked and filtered by predicted recognition before downstream testing.
  • Pretrained then specialized: A repertoire-scale pretraining stage is followed by fine-tuning on TCR-pMHC recognition data, transferring general receptor sequence statistics into antigen-specific design.
  • Experimental validation: Designed receptors were assessed in vitro, and the authors report improved consistency with natural binding TCRs and structurally plausible TCR-pMHC complexes.
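One simple way to quantify the repertoire diversity mentioned above is the mean pairwise edit distance between generated sequences. The sketch below is an illustrative metric only; the source does not say which diversity measure the authors use, and the function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def repertoire_diversity(seqs):
    """Mean pairwise edit distance over unique sequences.

    Each distance is normalised by the longer of the two sequences, so the
    result lies between 0 (all identical) and 1 (nothing shared).
    """
    unique = sorted(set(seqs))
    if len(unique) < 2:
        return 0.0
    dists = [edit_distance(a, b) / max(len(a), len(b))
             for a, b in combinations(unique, 2)]
    return sum(dists) / len(dists)
```

Duplicates are collapsed before averaging, so the score reflects how different the distinct designs are rather than how often the sampler repeats itself; the duplicate rate is worth reporting separately.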

Technical Details

TCRDiff is built from several interoperating components: a peptide language model and a peptide-MHC binding predictor that encode the target, a TCR diffusion language model that performs the generative denoising over CDR3β sequence space, and TCR-pMHC binding predictors used for scoring. These modules are combined into a conditional TCR diffusion model in which the target representation guides each denoising step. Pretraining draws on large T-cell repertoire datasets for general receptor sequence statistics, while specialization uses TCR-pMHC recognition datasets that pair receptors with their cognate antigens. At inference, a user supplies a peptide sequence, an MHC allele, V-gene choices (TRAV/TRBV) and organism, along with sampling parameters; the model emits candidate CDR3β sequences that can then be filtered by predicted binding. The released Zenodo record bundles roughly 536 MB of training and reference data alongside model checkpoints, and the implementation is distributed in Python.
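The inference inputs and the score-then-filter step described above can be sketched as follows. This is not the TCRDiff API: the `DesignRequest` field names, `rank_candidates`, the thresholds and the toy scorer are all assumptions made for illustration, and the example peptide, allele and V-gene values are placeholders chosen only to show the expected shape of the inputs.

```python
from dataclasses import dataclass

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class DesignRequest:
    """Inputs listed for inference; the field names are hypothetical."""
    peptide: str
    mhc_allele: str
    trav: str
    trbv: str
    organism: str = "human"
    num_samples: int = 100


def rank_candidates(candidates, score_fn, min_score=0.5, top_k=10):
    """Deduplicate, drop non-standard sequences, score, filter and sort."""
    seen = set()
    scored = []
    for seq in candidates:
        # Skip repeats, empty strings and anything outside the 20 standard residues.
        if seq in seen or not seq or not set(seq) <= AMINO_ACIDS:
            continue
        seen.add(seq)
        score = score_fn(seq)
        if score >= min_score:
            scored.append((seq, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


request = DesignRequest("GILGFVFTL", "HLA-A*02:01", "TRAV27", "TRBV19")


def toy_score(seq, request=request):
    # Placeholder only: a real predictor would score seq against the
    # request's peptide-MHC target instead of counting serines.
    return seq.count("S") / len(seq)


shortlist = rank_candidates(
    ["CASSIRSSYEQYF", "CASSIRSSYEQYF", "CAXSF", "CAWGF"],
    toy_score,
    min_score=0.2,
)
```

Keeping the scorer as a plain callable means any binding predictor can be swapped in without touching the filtering logic, and the validity check catches malformed samples before they reach a more expensive model.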

Applications

TCRDiff is aimed at researchers developing T-cell-based therapeutics and probing antigen-specific immunity. By generating candidate receptors for a chosen pMHC target, it can seed the design of engineered TCRs for adoptive cell therapies and cancer immunotherapy, expand panels of antigen-specific receptors for vaccine and immune-monitoring studies, and supply diverse hypotheses for experimental screening. Because conditioning includes germline V-genes and MHC context, immunologists can tailor designs to particular receptor scaffolds and HLA backgrounds, and the integrated binding predictors let teams prioritize a tractable shortlist of candidates for wet-lab validation.

Impact

TCRDiff extends the diffusion-model paradigm that reshaped protein and antibody design into the distinctive setting of antigen-specific TCR engineering, where target-conditioned generation paired with binding prediction addresses a long-standing bottleneck in immunotherapy discovery. The release of pretrained checkpoints and training data on Zenodo, with code under GPL-3.0, lowers the barrier for other groups to build on the approach. As a recent preprint, it has not yet been broadly benchmarked against autoregressive and discriminative TCR methods, nor independently reproduced. As with all in silico receptor design, generated candidates require experimental confirmation of binding and function before therapeutic use.

Citation

Zhang, Y., et al. (2026) Generative design of antigen-specific T-cell receptor sequences with a conditional diffusion model. openRxiv.

DOI: 10.64898/2026.06.10.730756

Citations

Total Citations: 0
Influential: 0
References: 0

GitHub

Stars: 5
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 15d ago
Language: Python
License: GPL-3.0

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 75 (Open)
Usability (can I run it?): 81
Reproducibility (can I retrain it?): 74
Model Openness Framework: Class III (Open Model)

Tags

protein_design, de_novo_design, drug_discovery, diffusion, transformer, generative, foundation_model, antibody

Resources

GitHub Repository · Research Paper · Dataset