A contrastive-learning model that predicts the first three Enzyme Commission (EC) digits for enzymes whose exact (fourth-level) function was never seen during training.
Automated enzyme function annotation typically frames the task as classification: given a protein sequence, assign one of a fixed set of Enzyme Commission (EC) numbers. This works when the enzyme's function is represented in the training data, but it forces an incorrect label onto enzymes whose true function was never seen, producing confidently wrong predictions for exactly the novel proteins biologists most want to characterize. EnzPlacer, described by researchers at Iowa State University in a February 2026 bioRxiv preprint titled "How Not to be Seen," reframes the problem as placement rather than forced classification.
Instead of predicting a complete four-level EC number, EnzPlacer learns an embedding space in which a query sequence can be situated within a narrowed functional neighborhood. For an enzyme whose precise fourth-level EC class is absent from training, the model still predicts the first, second, and third EC digits, locating the enzyme within the correct broad functional context even when the exact reaction remains unknown. This makes the system robust to the open-world reality that most newly sequenced enzymes are not exact matches to characterized ones.
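A partial, three-level annotation can be made concrete with a small sketch. The helper names `ec_prefix` and `shared_depth` are hypothetical and are not taken from the EnzPlacer code; they only illustrate how two enzymes with different fourth-level classes can still share a three-level functional neighborhood.

```python
def ec_prefix(ec_number: str, level: int = 3) -> str:
    """Return the first `level` fields of a dotted, four-field EC number."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    fields = ec_number.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return ".".join(fields[:level])


def shared_depth(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC levels two annotations have in common."""
    depth = 0
    for field_a, field_b in zip(ec_a.split("."), ec_b.split(".")):
        if field_a != field_b:
            break
        depth += 1
    return depth


# Two annotations that differ only in the fourth field share one
# three-level bracket, which is the granularity of a partial prediction.
seen_in_training = "1.1.1.1"
held_out = "1.1.1.2"
print(ec_prefix(held_out))                       # 1.1.1
print(shared_depth(seen_in_training, held_out))  # 3
```

Under this framing, a prediction of `1.1.1` for the held-out enzyme would count as correct at the third level even though its full four-level class never appeared among the training labels.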
EnzPlacer maps 1280-dimensional ESM mean embeddings of protein sequences into a learned "EnzPlacer space" via contrastive learning, then assigns EC numbers by k-nearest-neighbor label transfer against a reference database of annotated enzymes. Inputs are FASTA sequences with precomputed ESM embeddings. The contrastive objective is designed so that the geometry of the embedding space reflects EC hierarchy, which is what allows partial (three-level) predictions for proteins whose exact function is out of distribution. The repository provides the model checkpoint, reference CSV, and precomputed embeddings (via Zenodo, DOI 10.5281/zenodo.18110452) along with evaluation splits that hold out unseen experimental enzymes at varying subsample rates (100%, 50%, 30%, 10%) to quantify generalization.
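The label-transfer step described above can be sketched roughly as follows. This is a minimal illustration, not EnzPlacer's actual implementation: the function name `knn_ec_transfer`, the use of cosine similarity, the value of `k`, and the majority vote over three-level prefixes are all assumptions, and the 4-dimensional toy vectors merely stand in for embeddings that have already been projected into the learned space.

```python
from collections import Counter

import numpy as np


def knn_ec_transfer(query, ref_embeddings, ref_ec, k=5, level=3):
    """Assign a partial EC label by majority vote over the k nearest references.

    Assumptions (not confirmed by the source): cosine similarity as the
    metric and an unweighted vote over EC prefixes truncated to `level`.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    refs = ref_embeddings / np.linalg.norm(ref_embeddings, axis=1, keepdims=True)
    sims = refs @ q

    # Indices of the k most similar reference enzymes.
    top = np.argsort(-sims)[:k]

    # Vote on truncated EC labels rather than on full four-level numbers.
    votes = Counter(".".join(ref_ec[i].split(".")[:level]) for i in top)
    label, count = votes.most_common(1)[0]
    return label, count / len(top)


# Toy reference set: five annotated "enzymes" in a 4-dimensional space.
ref = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
ref_ec = ["1.1.1.1", "1.1.1.2", "2.7.1.1", "2.7.1.2", "3.4.21.4"]

# A query lying between the first two references.
query = np.array([0.95, 0.05, 0.0, 0.0])
label, support = knn_ec_transfer(query, ref, ref_ec, k=2)
print(label, support)
```

The returned support fraction is one simple way to flag low-agreement neighborhoods; how EnzPlacer itself scores confidence is not stated here.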
EnzPlacer is useful for functional annotation of newly sequenced or poorly characterized proteins (for example, in metagenomic surveys, novel-organism genomes, or engineered enzyme libraries), where many sequences will not correspond to any characterized EC class. By returning a confident partial annotation instead of a forced full label, it gives biocurators and enzyme engineers a trustworthy functional bracket for prioritizing experimental characterization.
By explicitly modeling the open-world nature of enzyme annotation, EnzPlacer addresses a known failure mode of EC-classification tools, which tend to misassign genuinely novel enzymes. Its emphasis on honest partial predictions, together with publicly released weights and reference data, makes it a practical complement to existing contrastive annotation methods. As a February 2026 preprint, its quantitative standing relative to prior tools awaits peer review and independent benchmarking.