bio.rodeo

EnzyPGM

University of Science and Technology of China / Nanjing University

Pocket-conditioned generative model that jointly designs enzyme sequences and substrate-binding pockets conditioned on functional priors and substrate structure.

Released: January 2026

EnzyPGM (Pocket-conditioned Generative Model) is a unified generative framework for substrate-specific enzyme design, released as a preprint in January 2026 by researchers at the University of Science and Technology of China and Nanjing University. Rather than treating enzyme sequence design and binding-pocket geometry as separate problems, EnzyPGM jointly generates the enzyme and its substrate-binding pocket, conditioned on functional priors (such as Enzyme Commission (EC) class) and the structure of the target substrate. The goal is to produce enzymes whose active sites are tailored to catalyze a chosen reaction. This addresses one of the hardest aspects of computational enzyme engineering: shaping the pocket so that it correctly recognizes and positions a specific substrate.

EnzyPGM sits in a fast-moving area of generative biology focused on functional protein design, alongside reaction- and substrate-conditioned methods such as EnzyGen, EnzymeFlow, and GENzyme. Its distinguishing emphasis is fine-grained modeling of the interaction between pocket residues and individual substrate atoms, which the authors argue is essential for designing enzymes with genuine substrate specificity rather than generic binding.

The framework is paired with EnzyPock, a curated dataset the authors built to train and benchmark substrate-specific enzyme design. By jointly conditioning on function and substrate geometry, EnzyPGM aims to generate catalytically plausible enzyme-pocket pairs in a single pass rather than relying on separate scaffolding and docking stages.

#Key Features

  • Joint enzyme and pocket generation: Generates the enzyme sequence and its substrate-binding pocket together, conditioned on functional priors and the target substrate, rather than designing sequence and active site independently.
  • Residue-atom Bi-scale Attention (RBA): An attention mechanism that jointly models intra-residue dependencies and fine-grained interactions between pocket residues and individual substrate atoms, capturing the geometry of substrate recognition.
  • Residue Function Fusion (RFF): A module that injects enzyme function priors (e.g., EC-class information) into residue representations so that generation is guided toward the intended catalytic function.
  • EnzyPock dataset: An author-curated benchmark of 83,062 enzyme-substrate pairs spanning 1,036 four-level EC families, providing structured supervision for substrate-specific design.
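
To make the "four-level EC families" grouping concrete, here is a minimal sketch of how enzyme-substrate records could be bucketed by EC number. The record layout (`EnzymeSubstratePair` and its fields), the use of SMILES for the substrate, and the toy sequences are all assumptions for illustration; the EnzyPock schema had not been published at the time of writing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EnzymeSubstratePair:
    """One hypothetical benchmark record; field names are illustrative."""
    enzyme_sequence: str   # amino-acid sequence, one letter per residue
    substrate_smiles: str  # substrate written as a SMILES string (assumed)
    ec_number: str         # four-level EC number, e.g. "3.1.1.3"


def ec_family(ec_number: str, levels: int = 4) -> str:
    """Return the EC number truncated to the first `levels` (1-4) levels."""
    parts = ec_number.split(".")
    if len(parts) != 4 or not 1 <= levels <= 4:
        raise ValueError(f"expected a four-level EC number, got {ec_number!r}")
    return ".".join(parts[:levels])


def group_by_family(pairs, levels=4):
    """Group records by EC family at the requested depth."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[ec_family(pair.ec_number, levels)].append(pair)
    return dict(groups)


# Toy records with made-up sequences; not taken from EnzyPock.
pairs = [
    EnzymeSubstratePair("MKTAYIAK", "CCO", "1.1.1.1"),
    EnzymeSubstratePair("MSTNPKPQ", "CC(O)C", "1.1.1.1"),
    EnzymeSubstratePair("MGSSHHHH", "CC(=O)OC", "3.1.1.3"),
]
families = group_by_family(pairs)           # full four-level families
classes = group_by_family(pairs, levels=1)  # top-level EC classes only
```

Grouping at `levels=4` corresponds to the family granularity quoted above, while `levels=1` collapses records into the top-level EC classes.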

#Technical Details

EnzyPGM is a conditional generative model that integrates a residue-level representation of the enzyme with an atom-level representation of the substrate. Its Residue-atom Bi-scale Attention (RBA) operates across these two scales simultaneously, modeling both dependencies among enzyme residues and the detailed contacts between pocket residues and substrate atoms; the Residue Function Fusion (RFF) module conditions generation on functional priors so the designed enzyme matches a target EC family. The model is trained and evaluated on EnzyPock, which comprises 83,062 enzyme-substrate pairs across 1,036 four-level EC families.
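
The two-scale idea can be illustrated with a toy sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention with no learned projections, reuses keys as values, sums the residue-level and atom-level contexts, and stands in for function fusion with a simple additive shift. All function names and feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, keys):
    """Scaled dot-product attention; keys double as values in this sketch."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys))
               for i in range(len(query))]
    return context, weights


def fuse_function(residues, function_prior):
    """Additive stand-in for function fusion: shift each residue by the prior."""
    return [[x + f for x, f in zip(res, function_prior)] for res in residues]


def bi_scale_attention(residues, atoms):
    """For each residue, sum a residue-scale and an atom-scale context."""
    updated, atom_weights = [], []
    for res in residues:
        res_ctx, _ = attend(res, residues)   # residue-to-residue scale
        atom_ctx, w = attend(res, atoms)     # residue-to-substrate-atom scale
        updated.append([r + a for r, a in zip(res_ctx, atom_ctx)])
        atom_weights.append(w)
    return updated, atom_weights


residues = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy residue features
atoms = [[2.0, 0.0], [0.0, 2.0]]                 # 2 toy substrate-atom features
function_prior = [0.1, -0.1]                     # toy EC-class embedding

fused = fuse_function(residues, function_prior)
updated, atom_weights = bi_scale_attention(fused, atoms)
```

Each row of `atom_weights` is a distribution over substrate atoms for one residue, which is the kind of residue-to-atom signal the description above attributes to RBA. A trained model would use learned projections and geometric features rather than these raw toy vectors.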

On the EnzyPock benchmark, the authors report state-of-the-art results, including an average improvement of 0.47 kcal/mol in substrate binding energy over EnzyGen, a prior substrate-guided enzyme generator. As with most preprints in this space, evaluation is computational and conducted primarily on the authors' own benchmark; experimental (wet-lab) validation is not reported. The authors state that the code and the EnzyPock dataset will be released later, but neither was publicly available at the time of writing, and no license has been specified. Both gaps limit reproducibility.
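
An "average improvement in binding energy" of this kind is typically a mean paired difference. The sketch below assumes that lower (more negative) energies indicate stronger predicted binding and that the average is taken over matched design targets; the summary above confirms neither. The function name and the energy values are illustrative and are not numbers from the paper.

```python
from statistics import mean


def mean_energy_improvement(baseline, candidate):
    """Mean of (baseline - candidate) over paired targets, in kcal/mol.

    A positive result means the candidate's energies are lower on average.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two non-empty lists of equal length")
    return mean(b - c for b, c in zip(baseline, candidate))


# Toy values only; these are not results from the paper.
baseline_energies = [-6.0, -7.5, -5.0]   # hypothetical baseline generator
candidate_energies = [-6.5, -7.5, -6.0]  # hypothetical new generator
improvement = mean_energy_improvement(baseline_energies, candidate_energies)
print(f"{improvement:.2f} kcal/mol")  # prints "0.50 kcal/mol"
```

Pairing by target matters: averaging each method separately over different target sets would mix differences in method with differences in target difficulty.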

#Applications

EnzyPGM targets computational enzyme engineering, where the goal is to obtain enzymes that catalyze a chosen reaction on a specified substrate. By co-designing the sequence and the binding pocket, it is suited to generating candidate enzymes for industrial biocatalysis, green chemistry, and metabolic-pathway engineering, where matching an enzyme's active site to a non-native or specific substrate is a central challenge. Protein engineers could use it to propose starting points for directed evolution or to explore the space of plausible active-site configurations for a target reaction, with the important caveat that generated designs require experimental validation before use.

#Impact

EnzyPGM advances the substrate-conditioned enzyme design literature by making pocket-substrate interaction modeling explicit at the residue-atom scale and by reporting measurable binding-energy gains over EnzyGen on a sizeable EC-stratified benchmark. Its accompanying EnzyPock dataset, if released as promised, could serve as a shared resource for the growing community working on generative enzyme design. The model's near-term impact is tempered by its reliance on a single author-built benchmark, the absence of wet-lab validation, and the fact that code, dataset, and weights had not yet been released. Independent reproduction and real-world catalytic validation remain the key open steps.

Openness

bio.rodeo openness: 23 (Closed) · low usability and reproducibility
  • Usability (can I run it?): 18
  • Reproducibility (can I retrain it?): 32
Model Openness Framework: Unclassified. Restrictive license on core components.

Tags

enzyme_design, protein_design, de_novo_design, transformer, graph_neural_network, generative, multimodal, proteomics

Resources

Research Paper