National University of Singapore
A ligand-conditioned masked discrete diffusion model that co-designs protein sequence and structure under explicit small-molecule constraints.
ProtLiD (Ligand-Conditioned Discrete Diffusion for Protein Sequence–Structure Co-Design) is a generative model that jointly produces an amino-acid sequence and a discrete structural representation for a protein, conditioned on a target small-molecule ligand. Designing proteins that bind a specified ligand requires sequence and structure to be mutually compatible while also satisfying the geometric and chemical constraints imposed by the ligand. ProtLiD addresses this by extending masked discrete diffusion, a paradigm that has worked well for sequence generation, into a ligand-aware setting that handles sequence and structure tokens together.
The model was introduced in a May 2026 arXiv preprint (arXiv:2605.27413) by Chen Wei, Fanding Xu, Minghao Sun, Zhiyuan Liu, Lin Wang, Tianrui Jia, Yihang Zhou, and Yang Zhang. The preprint does not list author affiliations; senior author Yang Zhang directs a well-known structural-biology and protein-modeling group at the National University of Singapore, so the organizational attribution here is inferred from that lab association rather than stated in the paper.
ProtLiD sits alongside ligand-aware design methods such as PocketGen and FAIR, but differs in treating sequence and discrete structure tokens within a single masked-diffusion generative process while injecting ligand chemistry and geometry through cross-attention. It targets both whole-protein design and binding-pocket co-design, where the surrounding scaffold is held fixed and the active site is generated to accommodate the ligand.
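In a masked-diffusion setting, the two design modes can be read as differing only in which positions start out masked. The sketch below illustrates that reading; it is not the authors' code. The mask symbol, the function names, the placeholder structure codes, and the assumption of one structure token per residue are all hypothetical choices made for the example.

```python
from typing import List, Optional, Sequence, Set, Tuple

MASK = "<mask>"  # hypothetical mask symbol, not taken from the preprint


def mask_positions(tokens: Sequence[str], positions: Set[int]) -> List[str]:
    """Return a copy of `tokens` with the given positions replaced by MASK."""
    out_of_range = sorted(i for i in positions if not 0 <= i < len(tokens))
    if out_of_range:
        raise IndexError(f"positions out of range: {out_of_range}")
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def initial_state(
    seq_tokens: Sequence[str],
    struct_tokens: Sequence[str],
    pocket: Optional[Set[int]] = None,
) -> Tuple[List[str], List[str]]:
    """Build the starting (sequence, structure) token tracks for generation.

    pocket=None  -> whole-protein design: every position is masked.
    pocket={...} -> pocket co-design: only pocket positions are masked,
                    scaffold tokens are carried over unchanged.
    Assumes one structure token per residue (an assumption of this sketch).
    """
    if len(seq_tokens) != len(struct_tokens):
        raise ValueError("sequence and structure tracks must have equal length")
    if pocket is None:
        pocket = set(range(len(seq_tokens)))
    return mask_positions(seq_tokens, pocket), mask_positions(struct_tokens, pocket)


if __name__ == "__main__":
    seq = list("MKTAYIAK")
    struct = [f"s{i}" for i in range(len(seq))]  # placeholder structure codes
    print(initial_state(seq, struct, pocket={2, 3, 5}))  # pocket co-design
    print(initial_state(seq, struct))                    # whole-protein design
```

Masking the same positions in both tracks keeps sequence and structure aligned, so a generative model would fill in residue identity and local structure for the pocket together while the scaffold stays untouched.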
ProtLiD is built on a 370M-parameter Transformer backbone trained on over one million ligand-protein complexes. Ligand information enters the model via geometry-aware cross-attention, and generation proceeds through masked discrete diffusion over joint sequence and structure tokens, with confidence-margin-guided ReMask decoding applied at inference.
On whole-protein design, the authors report TM-score improving from 0.672 to 0.802 and pLDDT rising from 64.55 to 73.00. On pocket co-design, ProtLiD reaches an active-site backbone RMSD of 1.97 Å (versus 3.46 Å for FAIR and 3.40 Å for PocketGen) and a ligand-aware pass rate of 59.73%, compared with 14.86% for the reported baseline. These figures come from the preprint and have not yet been independently benchmarked or peer-reviewed.
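The decoding idea named above can be illustrated with a toy loop. The preprint's exact ReMask rule is not reproduced here; the sketch below shows only a generic confidence-margin scheme. It assumes the margin is the gap between the top two predicted probabilities at a position and that the number of positions kept grows linearly with the step. The `predict` callable, the function names, and the schedule are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence

MASK = None  # placeholder for a masked position (assumption of this sketch)
Token = Optional[int]
Predictor = Callable[[List[Token]], List[List[float]]]


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest probabilities; larger means more confident."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def remask_decode(
    predict: Predictor,
    length: int,
    steps: int,
    fixed: Optional[Sequence[Token]] = None,
) -> List[Token]:
    """Iteratively fill masked positions, re-masking low-margin predictions.

    `predict` is a hypothetical stand-in for a denoising model: it maps the
    current tokens to one probability vector per position. Positions given
    in `fixed` (for example, a scaffold) are never changed.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    tokens: List[Token] = list(fixed) if fixed is not None else [MASK] * length
    if len(tokens) != length:
        raise ValueError("fixed must have the requested length")
    free = [i for i, tok in enumerate(tokens) if tok is MASK]

    for step in range(1, steps + 1):
        probs = predict(tokens)
        # Tentatively set every free position to its most likely token.
        for i in free:
            tokens[i] = max(range(len(probs[i])), key=probs[i].__getitem__)
        # Keep the highest-margin predictions and re-mask the rest.
        keep = (len(free) * step) // steps  # equals len(free) at the last step
        ranked = sorted(free, key=lambda i: margin(probs[i]), reverse=True)
        for i in ranked[keep:]:
            tokens[i] = MASK
    return tokens


def toy_predict(tokens: List[Token]) -> List[List[float]]:
    """Fixed distributions over a 3-token vocabulary, ignoring the input."""
    return [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7]]


if __name__ == "__main__":
    print(remask_decode(toy_predict, length=3, steps=3))
    print(remask_decode(toy_predict, length=3, steps=2, fixed=[1, MASK, MASK]))
```

Because every free position is re-predicted at each step, a token accepted early can still be re-masked later if its margin falls behind the others. That revisability is the property this sketch is meant to show. A real model would condition `predict` on the ligand and on the current partial design.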
ProtLiD is aimed at researchers designing proteins around a defined ligand, such as binders, sensors, and the active sites of enzymes or other functional proteins. The pocket co-design mode is particularly relevant for engineering or reshaping a binding site to accommodate a chosen small molecule while keeping a known scaffold intact, a common task in protein-engineering and drug-discovery workflows. The reported gains in ligand-aware pass rate suggest the approach may yield a higher fraction of candidate designs consistent with the intended binding chemistry, which would help in prioritizing constructs before experimental validation.
By framing ligand-conditioned protein design as joint sequence–structure masked discrete diffusion with geometry-aware conditioning, ProtLiD contributes to the fast-moving area of functional, ligand-aware protein generative models. Reported improvements over PocketGen and FAIR on pocket co-design metrics position it as a noteworthy entry, though its practical impact remains to be established: as of the May 2026 preprint the GitHub repository is a placeholder, with model weights and inference code announced for release in July–August 2026 under the Apache-2.0 license. Until then, the results stand as preprint claims awaiting independent reproduction and experimental confirmation.