bio.rodeo


EnzPlacer

Iowa State University

A contrastive-learning model that predicts the first three Enzyme Commission (EC) digits for enzymes whose exact (fourth-level) function was never seen during training.

Released: February 2026

Automated enzyme function annotation typically frames the task as classification: given a protein sequence, assign one of a fixed set of Enzyme Commission (EC) numbers. This works when the enzyme's function is represented in the training data, but it forces an incorrect label onto enzymes whose true function was never seen, producing confidently wrong predictions for exactly the novel proteins biologists most want to characterize. EnzPlacer, introduced by researchers at Iowa State University in a February 2026 bioRxiv preprint titled "How Not to be Seen," reframes the problem as placement rather than forced classification.

Instead of predicting a complete four-level EC number, EnzPlacer learns an embedding space in which a query sequence can be situated within a narrowed functional neighborhood. For an enzyme whose precise fourth-level EC class is absent from training, the model still predicts the first, second, and third EC digits, locating it within the correct broad functional context even when the exact reaction remains unknown. This suits the open-world setting, in which most newly sequenced enzymes do not exactly match a characterized one.
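
To make the three-level idea concrete, the short sketch below truncates a four-field EC number to its first three fields and counts how many leading fields two EC numbers share. The helper names `ec_prefix` and `matched_levels` are illustrative assumptions, not part of the EnzPlacer codebase.

```python
def ec_prefix(ec: str, levels: int = 3) -> str:
    """Return the first `levels` fields of a four-field, dot-separated EC number."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return ".".join(parts[:levels])


def matched_levels(predicted: str, true: str) -> int:
    """Count how many leading EC fields agree; '-' (unassigned) never matches."""
    count = 0
    for p, t in zip(predicted.split("."), true.split(".")):
        if p != t or p == "-":
            break
        count += 1
    return count


# Two enzymes that share the first three levels but differ at the fourth.
print(ec_prefix("3.2.1.4"))
print(matched_levels("3.2.1.4", "3.2.1.21"))
```

Under this view, a prediction of `3.2.1` for an enzyme whose true class is `3.2.1.21` counts as correct at three levels even if `3.2.1.21` never appeared in training.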

#Key Features

  • Placement over forced classification: Locates a sequence within a known functional landscape rather than forcing an exact, possibly wrong, EC label.
  • Predicts unseen functions: Recovers the 1st, 2nd, and 3rd EC digits for enzymes whose 4th-level EC class was unseen during training.
  • Contrastive embedding space: Learns a representation in which functionally related enzymes cluster, enabling k-nearest-neighbor label transfer from a reference database.
  • Released model and data: A trained checkpoint, reference embeddings, and EC annotations are distributed via Zenodo under a GPL-3.0 license.
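
The k-nearest-neighbor label transfer mentioned in the list above can be sketched as follows. This is a minimal illustration, assuming cosine similarity, a simple majority vote, and three-level EC prefixes as labels; the function names, the toy 2-D vectors, and the choice of `k` are hypothetical and do not describe EnzPlacer's actual settings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def knn_transfer(query, references, k=3):
    """Label a query by majority vote among its k most similar references.

    `references` is a list of (embedding, label) pairs. Returns the winning
    label and the fraction of the k neighbors that voted for it.
    """
    ranked = sorted(references, key=lambda r: cosine(query, r[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    label, n_votes = votes.most_common(1)[0]
    return label, n_votes / k


# Toy reference database: 2-D stand-ins for learned embeddings.
references = [
    ([1.0, 0.0], "3.2.1"),
    ([0.9, 0.1], "3.2.1"),
    ([0.8, 0.2], "3.2.1"),
    ([0.0, 1.0], "1.1.1"),
    ([0.1, 0.9], "1.1.1"),
]

label, support = knn_transfer([0.95, 0.05], references, k=3)
print(label, support)
```

The vote fraction is one simple way to expose how unanimous the neighborhood is; whether EnzPlacer reports anything similar is not stated here.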

#Technical Details

EnzPlacer maps 1280-dimensional ESM mean embeddings of protein sequences into a learned "EnzPlacer space" via contrastive learning, then assigns EC numbers by k-nearest-neighbor label transfer against a reference database of annotated enzymes. Inputs are FASTA sequences with precomputed ESM embeddings. The contrastive objective is designed so that the geometry of the embedding space reflects EC hierarchy, which is what allows partial (three-level) predictions for proteins whose exact function is out of distribution. The repository provides the model checkpoint, reference CSV, and precomputed embeddings (via Zenodo, DOI 10.5281/zenodo.18110452) along with evaluation splits that hold out unseen experimental enzymes at varying subsample rates (100%, 50%, 30%, 10%) to quantify generalization.
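
The exact contrastive objective is not described on this page, so the following is only a generic supervised contrastive (SupCon-style) loss, shown to illustrate how a training signal can pull same-label embeddings together and push others apart. Using three-level EC prefixes as labels, the temperature value, and the function names are all assumptions made for this sketch.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (not EnzPlacer's confirmed objective).

    For each anchor, positives are the other samples with the same label.
    The loss is averaged over anchors that have at least one positive.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all non-anchor samples.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = ["3.2.1", "3.2.1", "1.1.1", "1.1.1"]  # labels agree with geometry
scrambled = ["3.2.1", "1.1.1", "3.2.1", "1.1.1"]  # labels contradict geometry

print(supcon_loss(points, clustered) < supcon_loss(points, scrambled))
```

The loss is low when neighbors in the embedding space share a label and high when they do not, which is the property that makes nearest-neighbor label transfer meaningful afterward.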

#Applications

EnzPlacer is useful for functional annotation of newly sequenced or poorly characterized proteins (for example, in metagenomic surveys, novel-organism genomes, or engineered enzyme libraries), where many sequences will not correspond to any characterized EC class. By returning a confident partial annotation instead of a forced full label, it gives biocurators and enzyme engineers a trustworthy functional bracket for prioritizing experimental characterization.

#Impact

By explicitly modeling the open-world nature of enzyme annotation, EnzPlacer addresses a known failure mode of EC-classification tools, which tend to misassign genuinely novel enzymes. Its emphasis on honest partial predictions, together with publicly released weights and reference data, makes it a practical complement to existing contrastive annotation methods. As a February 2026 preprint, its quantitative standing relative to prior tools awaits peer review and independent benchmarking.

Tags

enzyme_function_prediction, ec_number_prediction, zero_shot_prediction, transformer, contrastive_learning, representation_learning, embeddings, enzyme, proteomics