bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Protein

ProtLiD

National University of Singapore

A ligand-conditioned masked discrete diffusion model that co-designs protein sequence and structure under explicit small-molecule constraints.

Released: May 2026
Parameters: 370 Million

ProtLiD (Ligand-Conditioned Discrete Diffusion for Protein Sequence–Structure Co-Design) is a generative model that jointly produces an amino-acid sequence and a discrete structural representation for a protein, conditioned on a target small-molecule ligand. Designing proteins that bind a specified ligand requires sequence and structure to be mutually compatible while also satisfying the geometric and chemical constraints imposed by the ligand. ProtLiD addresses this by extending masked discrete diffusion, a paradigm that has worked well for sequence generation, into a ligand-aware setting that handles sequence and structure tokens together.
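To make "masked discrete diffusion over sequence and structure tokens" concrete, here is a minimal sketch of the forward (corruption) process and the resulting training targets. It is not ProtLiD's code. It assumes one amino-acid token and one discrete structure token per residue, a linear schedule in which each token is masked independently with probability `t`, and a hypothetical `MASK` symbol and made-up structure codes. The denoiser, not shown, would be trained to recover only the masked positions.

```python
import random
from typing import Dict, List, Tuple

MASK = "<mask>"  # hypothetical mask symbol shared by both token streams


def forward_mask(
    seq_tokens: List[str],
    struct_tokens: List[str],
    t: float,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Corrupt aligned sequence/structure tokens at noise level t in [0, 1].

    Assumes a linear schedule: every token is independently replaced by MASK
    with probability t, and the two streams are masked independently.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(seq_tokens) != len(struct_tokens):
        raise ValueError("this sketch assumes one structure token per residue")
    noisy_seq = [MASK if rng.random() < t else tok for tok in seq_tokens]
    noisy_struct = [MASK if rng.random() < t else tok for tok in struct_tokens]
    return noisy_seq, noisy_struct


def denoising_targets(clean: List[str], noisy: List[str]) -> Dict[int, str]:
    """Positions the denoiser is trained to recover: only the masked ones."""
    return {i: c for i, (c, n) in enumerate(zip(clean, noisy)) if n == MASK}


rng = random.Random(0)
seq = list("MKTAYIAK")
struct = ["s3", "s17", "s17", "s42", "s5", "s5", "s9", "s11"]  # made-up structure codes
noisy_seq, noisy_struct = forward_mask(seq, struct, t=0.5, rng=rng)
seq_targets = denoising_targets(seq, noisy_seq)
```

At `t = 0` nothing is masked and at `t = 1` every token is masked, so sampling can start from a fully masked sequence and structure and unmask step by step.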

The model was introduced in a May 2026 arXiv preprint (arXiv:2605.27413) by Chen Wei, Fanding Xu, Minghao Sun, Zhiyuan Liu, Lin Wang, Tianrui Jia, Yihang Zhou, and Yang Zhang. The preprint does not list author affiliations; senior author Yang Zhang directs a well-known structural-biology and protein-modeling group at the National University of Singapore, so the organizational attribution here is inferred from that lab association rather than stated in the paper.

ProtLiD sits alongside ligand-aware design methods such as PocketGen and FAIR, but differs in treating sequence and discrete structure tokens within a single masked-diffusion generative process while injecting ligand chemistry and geometry through cross-attention. It targets both whole-protein design and binding-pocket co-design, where the surrounding scaffold is held fixed and the active site is generated to accommodate the ligand.
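Pocket co-design differs from whole-protein design mainly in which positions start masked. The sketch below illustrates that idea under stated assumptions: each residue is represented by a single coordinate (for example a C-alpha atom), the pocket is every residue within a distance cutoff of any ligand atom, and only those positions are masked while the scaffold tokens stay fixed as context. The function names and the 8 Å default are hypothetical; the preprint's actual pocket definition is not given here.

```python
import math
from typing import List, Sequence, Tuple

MASK = "<mask>"  # hypothetical mask symbol
Coord = Tuple[float, float, float]


def pocket_positions(
    residue_coords: Sequence[Coord],
    ligand_coords: Sequence[Coord],
    cutoff: float = 8.0,
) -> List[int]:
    """Residues with at least one ligand atom within `cutoff` angstroms.

    The distance rule and the 8 Å default are illustrative assumptions.
    """
    return [
        i
        for i, res in enumerate(residue_coords)
        if any(math.dist(res, atom) <= cutoff for atom in ligand_coords)
    ]


def mask_pocket(tokens: List[str], pocket: Sequence[int]) -> List[str]:
    """Mask pocket positions; every other token stays fixed as scaffold context."""
    pocket_set = set(pocket)
    return [MASK if i in pocket_set else tok for i, tok in enumerate(tokens)]


# Toy example: five residues along the x-axis, one ligand atom near the C-terminus.
residues = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
ligand = [(14.0, 0.0, 0.0)]
pocket = pocket_positions(residues, ligand, cutoff=3.0)
masked_seq = mask_pocket(list("MKTAY"), pocket)
```

In this toy case only the last two residues fall within 3 Å of the ligand atom, so they alone are regenerated. The same `mask_pocket` call would be applied to the structure-token list so that both streams are redesigned together at the active site.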

#Key Features

  • Ligand-conditioned co-design: Jointly generates amino-acid sequences and discrete structure tokens under explicit small-molecule constraints, rather than designing sequence and structure in separate stages.
  • Geometry-aware cross-attention: A 370M-parameter Transformer backbone incorporates both the chemical identity and the 3D geometry of the target ligand through cross-attention, grounding generation in the binding context.
  • Masked discrete diffusion: Extends the masked discrete diffusion framework, well established for sequence modeling, to the joint sequence–structure, ligand-aware setting.
  • ReMask self-correction at inference: A "maximum confidence-margin guided ReMask decoding" strategy retains high-confidence predictions while remasking and regenerating uncertain tokens during sampling.
  • Pocket and whole-protein modes: Supports both de novo whole-protein generation and binding-pocket co-design where the active site is regenerated around a fixed scaffold.
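One plausible reading of "maximum confidence-margin guided ReMask decoding" is sketched below. It is an interpretation, not the authors' algorithm: the confidence margin is taken to be the top-1 minus top-2 probability at a position, every position (including already decoded ones) is re-scored at each step, the positions with the largest margins are kept, and the rest are remasked. The linear unmasking schedule, the `predict` callable, and the toy distributions are all assumptions.

```python
import math
from typing import Callable, Dict, List

MASK = "<mask>"  # hypothetical mask symbol
Probs = Dict[str, float]  # token -> probability at one position


def confidence_margin(probs: Probs) -> float:
    """Gap between the most and second-most likely tokens at one position."""
    ranked = sorted(probs.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - runner_up


def remask_step(probs_per_pos: List[Probs], n_keep: int) -> List[str]:
    """Take argmax tokens, keep the n_keep largest-margin positions, remask the rest."""
    tokens = [max(p, key=p.get) for p in probs_per_pos]
    order = sorted(
        range(len(tokens)),
        key=lambda i: confidence_margin(probs_per_pos[i]),
        reverse=True,
    )
    keep = set(order[:n_keep])
    return [tok if i in keep else MASK for i, tok in enumerate(tokens)]


def remask_decode(
    predict: Callable[[List[str]], List[Probs]], length: int, n_steps: int
) -> List[str]:
    """Iterative decoding in which every position, kept or not, is re-scored each step.

    A token kept at one step can be remasked later if its margin falls behind
    others, which is the self-correction behavior described for ReMask.
    """
    tokens = [MASK] * length
    for step in range(1, n_steps + 1):
        probs = predict(tokens)  # a real model would also condition on the ligand
        n_keep = math.ceil(length * step / n_steps)  # linear unmasking schedule (assumed)
        tokens = remask_step(probs, n_keep)
    return tokens


# Toy "model" that ignores its input and always returns the same distributions.
toy = [{"A": 0.9, "G": 0.1}, {"L": 0.5, "V": 0.4, "I": 0.1}, {"K": 0.7, "R": 0.3}]
result = remask_decode(lambda tokens: toy, length=3, n_steps=3)
```

With these toy distributions the margins are roughly 0.8, 0.1, and 0.4, so the first step keeps only `A`, the second adds `K`, and the last step fills in the low-margin `L`. Because kept tokens are not frozen, a real model whose predictions shift as context fills in could demote an earlier choice.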

#Technical Details

ProtLiD is built on a 370M-parameter Transformer backbone trained on more than one million ligand-protein complexes. Ligand information enters the model through geometry-aware cross-attention, generation proceeds by masked discrete diffusion over joint sequence and structure tokens, and confidence-margin guided ReMask decoding is applied at inference. On whole-protein design, the authors report TM-score improving from 0.672 to 0.802 and pLDDT from 64.55 to 73.00 relative to their comparison baseline. On pocket co-design, ProtLiD reaches an active-site backbone RMSD of 1.97 Å (versus 3.46 Å for FAIR and 3.40 Å for PocketGen; lower is better) and a ligand-aware pass rate of 59.73%, compared with 14.86% for the reported baseline. These figures come from the preprint and have not yet been peer-reviewed or independently benchmarked.
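The page does not spell out how "geometry-aware" cross-attention is formulated. A common pattern, used here purely as an assumption, is to let each protein token attend over ligand atoms, with keys and values derived from atom features (chemical identity) and a distance penalty added to the attention logits (geometry). The single-head, stdlib-only sketch below omits learned projections, which would be applied upstream; `dist_scale` and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def geometry_aware_cross_attention(
    queries: Sequence[Vec],    # one vector per protein token
    keys: Sequence[Vec],       # one vector per ligand atom (chemical identity)
    values: Sequence[Vec],     # one vector per ligand atom
    query_pos: Sequence[Vec],  # 3D position attached to each protein token
    atom_pos: Sequence[Vec],   # 3D position of each ligand atom
    dist_scale: float = 1.0,   # strength of the distance penalty (assumed)
) -> List[Vec]:
    """Single-head cross-attention with a distance penalty on the logits:

        logit_ij = (q_i . k_j) / sqrt(d) - dist_scale * ||x_i - y_j||
    """
    d = len(queries[0])
    out = []
    for q, x in zip(queries, query_pos):
        logits = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
            - dist_scale * math.dist(x, y)
            for k, y in zip(keys, atom_pos)
        ]
        weights = softmax(logits)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return out


# One protein token at the origin; a near atom (1 Å) and a far atom (10 Å).
attended = geometry_aware_cross_attention(
    queries=[[0.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0], [0.0]],
    query_pos=[[0.0, 0.0, 0.0]],
    atom_pos=[[1.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
)
```

With a zero query the content term vanishes, so the output is almost entirely the nearby atom's value; setting `dist_scale=0.0` removes the geometric term and reduces this to ordinary scaled dot-product cross-attention. `math.dist` requires Python 3.8 or later.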

#Applications

ProtLiD is aimed at researchers designing proteins around a defined ligand, such as binders, sensors, and the active sites of enzymes or other functional proteins. The pocket co-design mode is particularly relevant for engineering or re-shaping a binding site to accommodate a chosen small molecule while keeping a known scaffold intact, a common task in protein and drug-discovery workflows. The reported gains in ligand-aware pass rate suggest the approach may produce a higher fraction of candidate designs consistent with the intended binding chemistry, which is valuable for prioritizing constructs before experimental validation.

#Impact

By framing ligand-conditioned protein design as joint sequence–structure masked discrete diffusion with geometry-aware conditioning, ProtLiD contributes to the fast-moving area of functional, ligand-aware protein generative models. Reported improvements over PocketGen and FAIR on pocket co-design metrics position it as a noteworthy entry, though its practical impact remains to be established: as of the May 2026 preprint the GitHub repository is a placeholder, with model weights and inference code announced for release in July–August 2026 under the Apache-2.0 license. Until then, the results stand as preprint claims awaiting independent reproduction and experimental confirmation.

Citation

Preprint

DOI: 10.48550/arXiv.2605.27413

Openness

Unclassified
Restrictive license on core components

Tags

de_novo_design, diffusion, generative, ligand_binding, multimodal, protein_design, proteomics, structure_prediction, transformer

Resources

GitHub Repository
Research Paper