ProtLiD

370M-parameter ligand-conditioned discrete diffusion model that co-designs protein sequence and structure under explicit small-molecule constraints.

Released: May 2026

Parameters: 370 Million

ProtLiD (Ligand-Conditioned Discrete Diffusion for Protein Sequence–Structure Co-Design) is a generative model that jointly produces an amino-acid sequence and a discrete structural representation for a protein, conditioned on a target small-molecule ligand. Designing proteins that bind a specified ligand requires sequence and structure to be mutually compatible while also satisfying the geometric and chemical constraints imposed by the ligand. ProtLiD addresses this by extending masked discrete diffusion, a paradigm that has worked well for sequence generation, into a ligand-aware setting that handles sequence and structure tokens together.

The model was introduced in a May 2026 arXiv preprint (arXiv:2605.27413) by Chen Wei, Fanding Xu, Minghao Sun, Zhiyuan Liu, Lin Wang, Tianrui Jia, Yihang Zhou, and Yang Zhang. The preprint does not list author affiliations; senior author Yang Zhang directs a well-known structural-biology and protein-modeling group at the National University of Singapore, so the organizational attribution here is inferred from that lab association rather than stated in the paper.

ProtLiD sits alongside ligand-aware design methods such as PocketGen and FAIR, but differs in treating sequence and discrete structure tokens within a single masked-diffusion generative process while injecting ligand chemistry and geometry through cross-attention. It targets both whole-protein design and binding-pocket co-design, where the surrounding scaffold is held fixed and the active site is generated to accommodate the ligand.
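
The difference between the two design modes can be pictured as a choice of which positions are allowed to be masked. The following is a minimal sketch, not the authors' code: the `MASK` symbol, the `mask_tokens` helper, the example sequence, the structure-token ids, and the pocket indices are all made up for illustration, and masking sequence and structure tokens independently at each position is an assumption rather than something the preprint summary states.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the preprint


def mask_tokens(seq_tokens, struct_tokens, designable, mask_rate, rng):
    """Mask sequence and structure tokens at designable positions.

    Each designable position is masked with probability mask_rate,
    independently for the sequence and structure tracks (an assumption).
    Positions outside `designable` (the fixed scaffold) are never masked.
    """
    noisy_seq, noisy_struct = list(seq_tokens), list(struct_tokens)
    for i in sorted(designable):
        if rng.random() < mask_rate:
            noisy_seq[i] = MASK
        if rng.random() < mask_rate:
            noisy_struct[i] = MASK
    return noisy_seq, noisy_struct


seq = list("MKTAYIAKQR")                     # made-up sequence
struct = [17, 4, 4, 9, 9, 23, 23, 5, 5, 12]  # made-up structure-token ids
pocket = {3, 4, 5}                           # made-up pocket residue indices

# Pocket co-design start state: only pocket positions are masked.
pocket_seq, pocket_struct = mask_tokens(seq, struct, pocket, 1.0, random.Random(0))

# Whole-protein start state: every position is designable and masked.
everything = set(range(len(seq)))
full_seq, full_struct = mask_tokens(seq, struct, everything, 1.0, random.Random(0))
```

With `mask_rate` below 1.0 the same helper produces the partially masked inputs that a masked-diffusion model would typically see during training.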

Key Features

Ligand-conditioned co-design: Jointly generates amino-acid sequences and discrete structure tokens under explicit small-molecule constraints, rather than designing sequence and structure in separate stages.
Geometry-aware cross-attention: A 370M-parameter Transformer backbone incorporates both the chemical identity and the 3D geometry of the target ligand through cross-attention, grounding generation in the binding context.
Masked discrete diffusion: Extends the masked discrete diffusion framework, well established for sequence modeling, to the joint sequence–structure, ligand-aware setting.
ReMask self-correction at inference: A "maximum confidence-margin guided ReMask decoding" strategy retains high-confidence predictions while remasking and regenerating uncertain tokens during sampling.
Pocket and whole-protein modes: Supports both de novo whole-protein generation and binding-pocket co-design where the active site is regenerated around a fixed scaffold.
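
The ReMask idea in the list above can be sketched as a single decoding step. This is an illustrative reading, not the published algorithm: it assumes the "confidence margin" is the gap between the top-1 and top-2 predicted probabilities at a position, and that a fixed number of positions (`n_keep`, a hypothetical parameter) is retained per step. The probabilities are hard-coded stand-ins for model output.

```python
MASK = None  # hypothetical placeholder for an undecided position


def confidence_margin(probs):
    """Top-1 probability minus top-2 probability at one position."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def remask_step(tokens, probs_per_pos, fixed, n_keep):
    """Commit argmax predictions, then remask all but the n_keep widest margins.

    tokens: current token ids, MASK where undecided
    probs_per_pos: per-position probability lists (stand-in for a model)
    fixed: positions that are never changed, e.g. a fixed scaffold
    """
    free = [i for i in range(len(tokens)) if i not in fixed]
    proposal = list(tokens)
    for i in free:
        p = probs_per_pos[i]
        proposal[i] = max(range(len(p)), key=p.__getitem__)  # argmax token id
    # Rank free positions from most to least confident by margin.
    ranked = sorted(free, key=lambda i: confidence_margin(probs_per_pos[i]), reverse=True)
    for i in ranked[n_keep:]:
        proposal[i] = MASK  # uncertain positions are regenerated in a later step
    return proposal


tokens = [MASK, MASK, 2, MASK]
probs = [
    [0.90, 0.05, 0.05],  # margin 0.85: kept
    [0.40, 0.35, 0.25],  # margin 0.05: remasked
    [0.10, 0.10, 0.80],  # position 2 is fixed, so this row is ignored
    [0.20, 0.70, 0.10],  # margin 0.50: kept
]
step = remask_step(tokens, probs, fixed={2}, n_keep=2)
```

Because every non-fixed position is re-ranked at each step, a token committed earlier can still be remasked later, which is one plausible way to realize the self-correction described above.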

Technical Details

ProtLiD is built on a 370M-parameter Transformer backbone trained on over one million ligand-protein complexes. Ligand information enters the model via geometry-aware cross-attention, and generation proceeds through masked discrete diffusion over joint sequence and structure tokens, with confidence-margin guided ReMask decoding applied at inference. On whole-protein design the authors report TM-score improving from 0.672 to 0.802 and pLDDT rising from 64.55 to 73.00. On pocket co-design, ProtLiD reaches an active-site backbone RMSD of 1.97 Å (versus 3.46 Å for FAIR and 3.40 Å for PocketGen) and a ligand-aware pass rate of 59.73% compared with 14.86% for the reported baseline. These figures are from the preprint and have not yet been independently benchmarked or peer-reviewed.
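
To put the reported numbers on a common scale, the short calculation below converts them to relative changes. It uses only the figures quoted above; the helper name `relative_change` is introduced here for illustration, and the derived percentages are arithmetic on the preprint's values rather than results stated by the authors.

```python
def relative_change(old, new):
    """Signed change from old to new, as a percentage of old."""
    return 100.0 * (new - old) / old


# Whole-protein design (higher is better for both metrics).
tm_gain = relative_change(0.672, 0.802)          # about +19.3%
plddt_gain = relative_change(64.55, 73.00)       # about +13.1%

# Pocket co-design, active-site backbone RMSD in angstroms (lower is better).
rmsd_vs_fair = relative_change(3.46, 1.97)       # about -43.1%
rmsd_vs_pocketgen = relative_change(3.40, 1.97)  # about -42.1%

# Ligand-aware pass rate, expressed as a ratio of the two percentages.
pass_rate_ratio = 59.73 / 14.86                  # about 4.0x

for name, value in [
    ("TM-score change (%)", tm_gain),
    ("pLDDT change (%)", plddt_gain),
    ("RMSD change vs FAIR (%)", rmsd_vs_fair),
    ("RMSD change vs PocketGen (%)", rmsd_vs_pocketgen),
    ("Pass-rate ratio", pass_rate_ratio),
]:
    print(f"{name}: {value:.2f}")
```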

Applications

ProtLiD is aimed at researchers designing proteins around a defined ligand, such as binders, sensors, and the active sites of enzymes or other functional proteins. The pocket co-design mode is particularly relevant for engineering or re-shaping a binding site to accommodate a chosen small molecule while keeping a known scaffold intact, a common task in protein and drug-discovery workflows. The reported gains in ligand-aware pass rate suggest the approach may produce a higher fraction of candidate designs consistent with the intended binding chemistry, which is valuable for prioritizing constructs before experimental validation.

Impact

By framing ligand-conditioned protein design as joint sequence–structure masked discrete diffusion with geometry-aware conditioning, ProtLiD contributes to the fast-moving area of functional, ligand-aware protein generative models. Reported improvements over PocketGen and FAIR on pocket co-design metrics position it as a noteworthy entry, though its practical impact remains to be established: as of the May 2026 preprint the GitHub repository is a placeholder, with model weights and inference code announced for release in July–August 2026 under the Apache-2.0 license. Until then, the results stand as preprint claims awaiting independent reproduction and experimental confirmation.

Citation

Ligand-Conditioned Discrete Diffusion for Protein Sequence-Structure Co-Design

Preprint

Wei, C., et al. (2026) Ligand-Conditioned Discrete Diffusion for Protein Sequence-Structure Co-Design. arXiv.

DOI: 10.48550/arXiv.2605.27413

Citations

Total Citations: 0
Influential: 0
References: 44

GitHub

Stars: 6
Forks: 0
Open Issues: 1
Contributors: 1
Last Push: 1 month ago
License: Apache-2.0

Openness

bio.rodeo openness: 5, Closed (low usability and reproducibility)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified
Restrictive license on core components
