Sequence-based discrete-diffusion framework that designs peptide binders with specified agonist or antagonist behavior against GPCR targets.
TD3B (Transition-Directed Discrete Diffusion for Allosteric Binder Generation) is a sequence-based generative framework for designing peptide binders that exert a specified functional effect — agonism or antagonism — on a target protein. Most generative binder design methods optimize for binding affinity alone, treating any tight binder as a success. TD3B instead conditions generation on the direction of the allosteric response, so that the resulting peptides are not merely binders but functional modulators that push the target toward an active or inactive conformational state. The work was developed by the Chatterjee Lab (Programmable Biology Group) at Duke University, spanning Biomedical Engineering and Computer Science, and was presented as an ICML 2026 Spotlight.
The model targets G protein-coupled receptors (GPCRs), a therapeutically central class of membrane proteins whose signaling is governed by ligand-induced conformational shifts. Designing peptides that selectively bias a GPCR toward activation or inhibition is a long-standing challenge because affinity and functional direction are decoupled: a high-affinity binder can be silent, agonistic, or antagonistic. TD3B addresses this by coupling a discrete-diffusion sequence generator with a learned signal that scores the intended directional effect.
Architecturally, TD3B fine-tunes a pretrained masked diffusion language model (MDLM) backbone using an amortized training scheme that combines a Direction Oracle with a binding-affinity gate. The result is a single fixed checkpoint that runs inference against new protein targets without re-training, while optional per-target fine-tuning remains available for users who want to specialize the model further.
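The pairing of a Direction Oracle with a binding-affinity gate can be pictured as a gated score: the directional score only counts when the candidate also clears an affinity threshold. The sketch below is illustrative only. The `Scores` container, the `gated_reward` function, the hard 0/1 gate, and the threshold value are all assumptions made for this example; the released materials do not state how TD3B actually combines the two signals.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    # Hypothetical oracle output in [0, 1]: confidence that the peptide
    # pushes the target in the requested direction (agonist or antagonist).
    direction: float
    # Hypothetical predicted binding affinity on an arbitrary scale,
    # where higher means tighter binding.
    affinity: float


def gated_reward(scores: Scores, affinity_threshold: float = 0.5) -> float:
    """Return the direction score only if the affinity gate is open.

    This is an assumed hard gate: weak binders receive zero reward no
    matter how strongly the oracle favors them.
    """
    gate = 1.0 if scores.affinity >= affinity_threshold else 0.0
    return gate * scores.direction


if __name__ == "__main__":
    weak_binder = Scores(direction=0.9, affinity=0.2)
    strong_binder = Scores(direction=0.9, affinity=0.8)
    print(gated_reward(weak_binder))    # gate closed
    print(gated_reward(strong_binder))  # gate open
```

A hard gate is the simplest choice for illustration; a smooth gate (for example, a sigmoid of the affinity) would be an equally plausible design and would give a reward that varies continuously near the threshold.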
The fixed checkpoint (td3b.ckpt) runs inference on previously unseen protein targets without re-training; multi-target fine-tuning is optional. The released files (td3b.ckpt, the pretrained.ckpt MDLM backbone, and the direction_oracle.pt scorer) are distributed alongside an inference.py script and a Colab demo.
TD3B builds on a masked diffusion language model (MDLM) backbone, a discrete-diffusion architecture that generates sequences by iteratively unmasking tokens under a learned reverse process. The pretrained backbone is provided as pretrained.ckpt; the specific training corpus used to pretrain this MDLM is not stated in the released materials. Fine-tuning is performed via amortized training that introduces transition-directed guidance: a Direction Oracle (direction_oracle.pt) predicts whether a candidate peptide drives the target toward the desired active or inactive state, and a binding-affinity gate filters for sequences that also bind. Together, the two signals shape the diffusion trajectory and produce the fixed td3b.ckpt checkpoint used for inference.
The released package includes an inference.py script and a Colab notebook for generating binders against user-supplied targets. Model weights are distributed on HuggingFace under a CC BY-NC-ND 4.0 license (non-commercial, no-derivatives).
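To make the idea of generating a sequence by iterative unmasking concrete, here is a toy sketch of the sampling loop. It is not the TD3B or MDLM implementation: `toy_denoiser` is a hypothetical stand-in that proposes residues uniformly at random where a trained network would predict them, and the unmasking schedule (an equal share of positions per step) is an assumption chosen for simplicity.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
MASK = "_"                            # placeholder for a masked position


def toy_denoiser(tokens, rng):
    """Stand-in for a learned reverse process.

    Proposes one residue for every masked position. A real model would
    predict these from the partially unmasked sequence and the target.
    """
    return {i: rng.choice(AMINO_ACIDS) for i, t in enumerate(tokens) if t == MASK}


def sample_peptide(length=12, steps=4, seed=0):
    """Start fully masked and reveal a share of positions at each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = toy_denoiser(tokens, rng)
        # Spread the remaining masked positions evenly over the remaining
        # steps (ceiling division), so the last step reveals everything.
        remaining_steps = steps - step
        n_unmask = -(-len(masked) // remaining_steps)
        for i in rng.sample(masked, n_unmask):
            tokens[i] = proposals[i]
    return "".join(tokens)


if __name__ == "__main__":
    print(sample_peptide(length=12, steps=4, seed=0))
```

In a guided variant, the proposals at each step would be re-weighted or filtered by a score such as the Direction Oracle output, which is where directional conditioning would enter the loop.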
TD3B is aimed at researchers designing functional peptide modulators of GPCRs, where the goal is not only to bind a receptor but to elicit a defined pharmacological direction — activation or inhibition. Because the fixed checkpoint generalizes to new targets without re-training, it can be applied to a range of receptors as a first-pass in silico design tool, with optional fine-tuning for targets that warrant specialization. The bundled inference script and Colab demo lower the barrier for computational biologists and protein engineers to generate candidate sequences for downstream synthesis and experimental validation.
By explicitly conditioning generation on the direction of an allosteric response, TD3B reframes peptide binder design around functional outcome rather than affinity alone, a distinction that matters for therapeutic GPCR modulation. Its selection as an ICML 2026 Spotlight signals interest from the machine learning community in directional, function-aware generative design. Adoption is early and the broader functional generalization of designs remains to be established experimentally. Two limitations should be weighed: the non-commercial, no-derivatives license (CC BY-NC-ND 4.0) restricts commercial use and modification of the weights, and the training corpus of the underlying MDLM backbone is not disclosed, which limits full reproducibility and provenance assessment.