High-PepBinder

Sequence-only latent diffusion model that designs target-specific peptide binders, cascaded with an affinity classifier through joint optimization.

Released: January 2026

High-PepBinder is a generative model for designing target-specific peptide binders directly from sequence, developed by researchers at Macao Polytechnic University and released as a bioRxiv preprint in January 2026. Therapeutic peptides occupy a valuable middle ground between small molecules and antibodies: they can engage protein surfaces with high specificity while remaining comparatively small and synthesizable. Designing peptides that bind a chosen target with high affinity remains difficult, however, and most computational approaches depend on 3D structures of the target or candidate complex.

High-PepBinder is sequence-only: it generates candidate peptide sequences for a given protein target without requiring structural input. It is built as a conditional latent diffusion model with a dual-encoder design, pairing a protein language model (pLM) with a diffusion process that operates in a learned latent space. Crucially, the generator is cascaded with an affinity classifier and the two components are trained through joint optimization, so the diffusion process is steered toward sequences predicted to bind the target tightly rather than merely toward plausible peptides.
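To make the idea of diffusing in a latent space concrete, the following is a minimal sketch of the standard DDPM-style forward (noising) process applied to a latent vector. It is not the authors' implementation: the linear noise schedule, the 1000-step horizon, the four-dimensional latent, and all function names are assumptions chosen for illustration, since the preprint's exact configuration is not described here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per step (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(alpha_bar_t) z_0, (1 - alpha_bar_t) I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [signal * z + sigma * e for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
z0 = [0.5, -1.2, 0.3, 0.9]  # hypothetical stand-in for an encoded peptide latent
zt, eps = noise_latent(z0, t=999, alpha_bars=alpha_bars, rng=rng)
```

At the final step almost none of the original latent survives, which is why generation can start from pure Gaussian noise; a trained denoiser, conditioned on the target protein, would then reverse this process step by step before decoding to a peptide sequence.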

To support training, the authors assembled PepPBA, a large peptide–protein binding-affinity dataset. The model is evaluated computationally against several therapeutically relevant targets, positioning it within the active field of sequence-based generative peptide design alongside structure-conditioned diffusion and flow-matching methods.

Key Features

Sequence-only binder generation: Designs target-specific peptide binders from sequence alone, removing the dependence on target or complex 3D structures that constrains many peptide-design pipelines.
Conditional latent diffusion with dual encoders: Combines a protein language model encoder with a diffusion model operating in a latent space, conditioning generation on the target to produce specific rather than generic peptides.
Affinity-guided joint optimization: Cascades the generator with an affinity classifier and trains them jointly, biasing sampled peptides toward high predicted binding affinity rather than relying on post-hoc filtering alone.
Purpose-built training data (PepPBA): Introduces a large peptide–protein binding-affinity dataset assembled to train and evaluate the model, addressing the scarcity of labeled affinity data for peptides.

Technical Details

High-PepBinder couples a protein language model with a conditional latent diffusion module in a dual-encoder architecture. Rather than diffusing over raw sequences, the model performs denoising diffusion in a learned latent space and decodes to peptide sequences, with the target protein supplied as the conditioning signal. A separate affinity classifier is cascaded onto the generator, and the generative and predictive components are optimized together so that the sampling trajectory is guided toward high-affinity binders. Training relies on PepPBA, a large peptide–protein binding-affinity dataset compiled by the authors. The model is assessed computationally on therapeutically important targets including KEAP1, XIAP, and EGFR. Two caveats should be noted: the validation is entirely in silico, with no reported wet-lab confirmation of binding, and the public availability of both the PepPBA dataset and model code/weights is unconfirmed as of the preprint, which currently limits independent reproduction.
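The joint optimization described in this section can be illustrated with a toy objective. The sketch below assumes, purely for illustration, that the combined loss is a weighted sum of a denoising mean-squared error and a binary cross-entropy from the affinity classifier; the preprint's actual loss terms, weighting, and label definition are not stated here, and every name (`joint_loss`, `affinity_weight`, and so on) is hypothetical.

```python
import math


def denoising_mse(predicted_noise, true_noise):
    """Mean squared error between predicted and true noise (diffusion term)."""
    return sum((p - t) ** 2 for p, t in zip(predicted_noise, true_noise)) / len(true_noise)


def binary_cross_entropy(prob_binder, label, eps=1e-12):
    """Cross-entropy for a binary high-affinity label (assumed classifier term)."""
    p = min(max(prob_binder, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(predicted_noise, true_noise, prob_binder, label, affinity_weight=1.0):
    """Assumed joint objective: diffusion loss plus weighted affinity loss."""
    diffusion_term = denoising_mse(predicted_noise, true_noise)
    affinity_term = binary_cross_entropy(prob_binder, label)
    return diffusion_term + affinity_weight * affinity_term


# Toy values: a slightly wrong noise prediction and an undecided classifier.
loss = joint_loss(
    predicted_noise=[0.1, -0.2],
    true_noise=[0.0, 0.0],
    prob_binder=0.5,
    label=1,
    affinity_weight=0.5,
)
```

Because both terms contribute to a single loss, gradients from the affinity term would reach the generator in a real differentiable implementation, which is what distinguishes joint training from filtering samples after the fact.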

Applications

High-PepBinder is aimed at researchers in peptide therapeutics and chemical biology who need candidate binders for specific protein targets but may lack reliable structural models of those targets. Potential uses include generating starting peptides for inhibitors of protein–protein interactions (such as the KEAP1–NRF2 axis or XIAP), producing target-directed peptide libraries for downstream experimental screening, and prioritizing sequences by predicted affinity before synthesis. Because it works from sequence alone, it is applicable to targets that are difficult to crystallize or model structurally, broadening the range of proteins amenable to computational peptide design.
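Prioritizing sequences by predicted affinity before synthesis amounts to scoring, deduplicating, and shortlisting candidates. The sketch below shows that workflow under stated assumptions: `rank_candidates` and `toy_score` are hypothetical names, and the scorer is a deliberately trivial placeholder (fraction of aromatic residues), not an affinity predictor and not part of High-PepBinder.

```python
def rank_candidates(peptides, score_fn, top_k=3):
    """Deduplicate peptides, score each one, and return the top_k by score."""
    unique = list(dict.fromkeys(peptides))  # drops repeats, keeps first-seen order
    scored = [(score_fn(p), p) for p in unique]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(peptide, score) for score, peptide in scored[:top_k]]


def toy_score(peptide):
    """Placeholder score: fraction of aromatic residues (F, W, Y)."""
    return sum(peptide.count(aa) for aa in "FWY") / len(peptide)


candidates = ["ACDFW", "GGGGG", "WWYFA", "ACDFW", "KLMNP"]
shortlist = rank_candidates(candidates, toy_score, top_k=2)
print(shortlist)  # [('WWYFA', 0.8), ('ACDFW', 0.4)]
```

In practice `score_fn` would be replaced by whatever affinity predictor is available, and the shortlist would feed downstream experimental screening.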

Impact

High-PepBinder adds to the rapidly growing toolkit of generative models for peptide and protein-binder design, and its sequence-only, affinity-guided formulation offers a structure-independent alternative to the structure-conditioned diffusion and flow-matching approaches that dominate recent work. The accompanying PepPBA dataset, if released, could itself be a useful resource for the field given how scarce labeled peptide-affinity data is. The most important limitations are that the model has so far been validated only computationally, with no wet-lab confirmation of the designed binders, and that the work is an unreviewed preprint without confirmed public code, weights, or dataset, so its real-world design success rate remains to be demonstrated.

Citation

High-PepBinder: A pLM-Guided Latent Diffusion Framework for Affinity-Aware Target-Specific Peptide Design

Mao, Q., et al. (2026) High-PepBinder: A pLM-Guided Latent Diffusion Framework for Affinity-Aware Target-Specific Peptide Design. bioRxiv.

DOI: 10.64898/2026.01.12.698988

Citations

Total citations: 0

Influential: 0

References: 53

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 4 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 0 (not reproducible)

Model Openness Framework: Unclassified (restrictive license on core components)
