Sequence-only conditional latent diffusion model that generates target-specific peptide binders, cascaded with an affinity classifier through joint optimization.
High-PepBinder is a generative model for designing target-specific peptide binders directly from sequence, developed by researchers at Macao Polytechnic University and released as a bioRxiv preprint in January 2026. Therapeutic peptides occupy a valuable middle ground between small molecules and antibodies: they can engage protein surfaces with high specificity while remaining comparatively small and synthesizable. Designing peptides that bind a chosen target with high affinity remains difficult, however, and most computational approaches depend on 3D structures of the target or of the candidate complex.
High-PepBinder is sequence-only: it generates candidate peptide sequences for a given protein target without requiring structural input. It is built as a conditional latent diffusion model with a dual-encoder design, pairing a protein language model (pLM) with a diffusion process that operates in a learned latent space. Crucially, the generator is cascaded with an affinity classifier and the two components are trained through joint optimization, so the diffusion process is steered toward sequences predicted to bind the target tightly rather than merely toward plausible peptides.
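The idea of training the generator and the affinity classifier together can be illustrated with a minimal sketch. The exact objective used by High-PepBinder is not specified here, so everything below is an assumption: the weighted-sum form, the binary cross-entropy term, the `weight` parameter, and the toy `toy_denoiser` and `toy_classifier` stand-ins are hypothetical and chosen only so the example runs with NumPy.

```python
import numpy as np

def joint_loss(z0, cond, label, denoiser, classifier, alpha_bar, rng, weight=1.0):
    """Hypothetical joint objective: diffusion MSE + weight * affinity BCE."""
    # Forward-diffuse the clean latent to one noise level (standard DDPM form).
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps

    # Generator term: how well the denoiser recovers the injected noise.
    eps_hat = denoiser(z_t, cond)
    diffusion = float(np.mean((eps_hat - eps) ** 2))

    # Classifier term: score the clean latent implied by the denoiser's output,
    # so that in an autodiff framework this loss would also reach the generator.
    z0_hat = (z_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)
    p = float(np.clip(classifier(z0_hat, cond), 1e-7, 1.0 - 1e-7))
    affinity = float(-(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))

    return diffusion + weight * affinity, diffusion, affinity


# Toy stand-ins (assumptions); real components would be trained networks.
def toy_denoiser(z_t, cond):
    return np.zeros_like(z_t)                      # always predicts "no noise"

def toy_classifier(z, cond):
    return 1.0 / (1.0 + np.exp(-float(z @ cond)))  # sigmoid of a dot product

rng = np.random.default_rng(0)
z0 = rng.standard_normal(8)     # stand-in peptide latent
cond = rng.standard_normal(8)   # stand-in target-protein embedding
total, diffusion, affinity = joint_loss(
    z0, cond, 1.0, toy_denoiser, toy_classifier, 0.5, rng, weight=0.1
)
```

In this toy form, a larger `weight` puts more pressure on the generator to produce latents the classifier scores as binders, at the cost of pure denoising accuracy.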
To support training, the authors assembled PepPBA, a large peptide–protein binding-affinity dataset. The model is evaluated computationally on several therapeutically relevant targets. It belongs to the active field of generative peptide design, where it offers a sequence-based counterpart to structure-conditioned diffusion and flow-matching methods.
High-PepBinder couples a protein language model with a conditional latent diffusion module in a dual-encoder architecture. Rather than diffusing over raw sequences, the model performs denoising diffusion in a learned latent space and decodes to peptide sequences, with the target protein supplied as the conditioning signal. A separate affinity classifier is cascaded onto the generator, and the generative and predictive components are optimized together so that the sampling trajectory is guided toward high-affinity binders. Training relies on PepPBA, a large peptide–protein binding-affinity dataset compiled by the authors. The model is assessed computationally on therapeutically important targets including KEAP1, XIAP, and EGFR. Two caveats should be noted: the validation is entirely in silico, with no reported wet-lab confirmation of binding, and the public availability of both the PepPBA dataset and model code/weights is unconfirmed as of the preprint, which currently limits independent reproduction.
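The denoise-then-decode flow described above can be sketched with a generic DDPM-style sampler. This is not the authors' implementation: the noise schedule, latent shape, nearest-codebook decoder, and the `toy_denoiser` and `toy_decode` functions are hypothetical placeholders, and a random vector stands in for the protein language model's embedding of the target.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def sample_peptide(cond, denoiser, decode, length, dim, betas, rng):
    """Generic DDPM ancestral sampling in latent space, followed by decoding."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    z = rng.standard_normal((length, dim))      # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = denoiser(z, cond, t)          # target-conditioned noise estimate
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise    # no noise is added at the last step
    return decode(z)


rng = np.random.default_rng(0)
dim, length = 16, 12
codebook = rng.standard_normal((len(AMINO_ACIDS), dim))  # hypothetical residue embeddings

def toy_denoiser(z, cond, t):
    return z - cond                             # placeholder, not a trained network

def toy_decode(z):
    # Map each latent position to the residue with the highest dot-product score.
    idx = np.argmax(z @ codebook.T, axis=1)
    return "".join(AMINO_ACIDS[i] for i in idx)

target_embedding = rng.standard_normal(dim)     # stand-in for a pLM embedding
betas = np.linspace(1e-4, 0.02, 50)
peptide = sample_peptide(target_embedding, toy_denoiser, toy_decode, length, dim, betas, rng)
```

With untrained placeholders the output is an arbitrary 12-residue string; the sketch only shows where the conditioning signal enters and where the latent is turned back into a sequence.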
High-PepBinder is aimed at researchers in peptide therapeutics and chemical biology who need candidate binders for specific protein targets but may lack reliable structural models of those targets. Potential uses include generating starting peptides for inhibitors of protein–protein interactions (such as the KEAP1–NRF2 axis or XIAP), producing target-directed peptide libraries for downstream experimental screening, and prioritizing sequences by predicted affinity before synthesis. Because it works from sequence alone, it is applicable to targets that are difficult to crystallize or model structurally, broadening the range of proteins amenable to computational peptide design.
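Prioritizing candidates by predicted affinity before synthesis amounts to deduplicating, scoring, and keeping the top few. The sketch below assumes a scoring function is available; `prioritize`, the example sequences, and the scores in `toy_scores` are made-up placeholders, not model outputs.

```python
def prioritize(candidates, score_fn, top_k):
    """Return the top_k unique candidates, highest predicted affinity first."""
    unique = list(dict.fromkeys(candidates))   # drop duplicates, keep first-seen order
    return sorted(unique, key=score_fn, reverse=True)[:top_k]


# Placeholder scores standing in for an affinity predictor's output.
toy_scores = {"AAAA": 0.1, "CCCC": 0.9, "DDDD": 0.5}
shortlist = prioritize(["AAAA", "CCCC", "AAAA", "DDDD"], toy_scores.get, top_k=2)
```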
High-PepBinder adds to the rapidly growing toolkit of generative models for peptide and protein-binder design, and its sequence-only, affinity-guided formulation offers a structure-independent alternative to the structure-conditioned diffusion and flow-matching approaches that dominate recent work. The accompanying PepPBA dataset, if released, could itself be a useful resource for the field, given how scarce labeled peptide-affinity data are. The most important limitations are that the model has so far been validated only computationally, with no wet-lab confirmation of the designed binders, and that the work is an unreviewed preprint without confirmed public code, weights, or dataset. Its real-world design success rate therefore remains to be demonstrated.