EnzPlacer

Enzyme function prediction model that uses contrastive learning to assign the first three EC digits to enzymes with functions unseen during training.

Released: February 2026

Automated enzyme function annotation is typically framed as classification: given a protein sequence, assign one of a fixed set of Enzyme Commission (EC) numbers. This works when the enzyme's function is represented in the training data, but it forces an incorrect label onto enzymes whose true function was never seen, producing confidently wrong predictions for exactly the novel proteins biologists most want to characterize. EnzPlacer, introduced by researchers at Iowa State University in a February 2026 bioRxiv preprint titled "How Not to be Seen: Predicting Unseen Enzyme Functions using Contrastive Learning," reframes the problem as placement rather than forced classification.

Instead of predicting a complete four-level EC number, EnzPlacer learns an embedding space in which a query sequence can be situated within a narrowed functional neighborhood. For an enzyme whose precise fourth-level EC class is absent from training, the model still predicts the first, second, and third EC digits, locating the enzyme within the correct broad functional context even when the exact reaction remains unknown. This makes the system robust to the open-world reality that most newly sequenced enzymes are not exact matches to characterized ones.
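To make the idea of a partial, three-level prediction concrete, the following is a minimal sketch of how a four-level EC number can be truncated and how two EC numbers can be compared level by level. The helper names `ec_prefix` and `shared_levels` are hypothetical and are not taken from the EnzPlacer repository.

```python
def ec_prefix(ec: str, levels: int = 3) -> str:
    """Return the first `levels` dot-separated fields of an EC number.

    Hypothetical helper for illustration; not part of EnzPlacer.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return ".".join(parts[:levels])


def shared_levels(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC fields two EC numbers have in common (0 to 4)."""
    count = 0
    for a, b in zip(ec_a.strip().split("."), ec_b.strip().split(".")):
        if a != b:
            break
        count += 1
    return count


# Alcohol dehydrogenase (1.1.1.1) and L-lactate dehydrogenase (1.1.1.27)
# differ only at the fourth level, so a three-level label covers both.
print(ec_prefix("1.1.1.27"))                  # 1.1.1
print(shared_levels("1.1.1.1", "1.1.1.27"))   # 3
print(shared_levels("1.1.1.1", "3.4.21.4"))   # 0
```

Under this view, a prediction of 1.1.1 for an enzyme whose true class is 1.1.1.27 counts as correct at the third level even if 1.1.1.27 never appeared in training.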

Key Features

Placement over forced classification: Locates a sequence within a known functional landscape rather than forcing an exact, possibly wrong, EC label.
Predicts unseen functions: Recovers the 1st, 2nd, and 3rd EC digits for enzymes whose 4th-level EC class was unseen during training.
Contrastive embedding space: Learns a representation in which functionally related enzymes cluster, enabling k-nearest-neighbor label transfer from a reference database.
Released model and data: A trained checkpoint, reference embeddings, and EC annotations are distributed via Zenodo under a GPL-3.0 license.
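The k-nearest-neighbor label transfer mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the function name `knn_ec_prefix`, the use of cosine similarity, the majority vote, and the toy two-dimensional embeddings are all choices made for this example.

```python
from collections import Counter

import numpy as np


def knn_ec_prefix(query, ref_embeddings, ref_ecs, k=5, levels=3):
    """Transfer a truncated EC label from the k most similar references.

    Assumptions for this sketch: cosine similarity as the metric and an
    unweighted majority vote over the truncated labels.
    Returns (label, fraction of the k neighbors that agree with it).
    """
    ref = np.asarray(ref_embeddings, dtype=float)
    q = np.asarray(query, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    sims = ref @ q
    top = np.argsort(-sims)[:k]
    votes = Counter(".".join(ref_ecs[i].split(".")[:levels]) for i in top)
    label, n = votes.most_common(1)[0]
    return label, n / len(top)


# Toy reference set: three oxidoreductases and two serine proteases.
ref_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
ref_ecs = ["1.1.1.1", "1.1.1.27", "1.1.1.37", "3.4.21.4", "3.4.21.5"]

label, agreement = knn_ec_prefix([1.0, 0.05], ref_embeddings, ref_ecs, k=3)
print(label, agreement)  # 1.1.1 1.0
```

The agreement fraction is one simple way to express how consistent the neighborhood is; the preprint may use a different confidence measure.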

Technical Details

EnzPlacer maps 1280-dimensional ESM mean embeddings of protein sequences into a learned "EnzPlacer space" via contrastive learning, then assigns EC numbers by k-nearest-neighbor label transfer against a reference database of annotated enzymes. Inputs are FASTA sequences with precomputed ESM embeddings. The contrastive objective is designed so that the geometry of the embedding space reflects EC hierarchy, which is what allows partial (three-level) predictions for proteins whose exact function is out of distribution. The repository provides the model checkpoint, reference CSV, and precomputed embeddings (via Zenodo, DOI 10.5281/zenodo.18110452) along with evaluation splits that hold out unseen experimental enzymes at varying subsample rates (100%, 50%, 30%, 10%) to quantify generalization.
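As an illustration of what a 1280-dimensional mean embedding is, the sketch below averages a per-residue representation matrix over the sequence length. Random numbers stand in for real ESM output, and the function name `mean_pool` is hypothetical; the repository's own preprocessing, including how special tokens are handled, may differ.

```python
import numpy as np


def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Average per-residue vectors into one fixed-length sequence embedding.

    per_residue has shape (sequence_length, embedding_dim).
    """
    if per_residue.ndim != 2:
        raise ValueError("expected a 2-D array of shape (length, dim)")
    return per_residue.mean(axis=0)


# Stand-in for a protein language model's output on a 120-residue sequence.
rng = np.random.default_rng(0)
per_residue = rng.normal(size=(120, 1280))

embedding = mean_pool(per_residue)
print(embedding.shape)  # (1280,)
```

Because the average removes the length dimension, sequences of any length map to vectors of the same size, which is what lets a single learned projection and a single reference database serve all queries.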

Applications

EnzPlacer is useful for functional annotation of newly sequenced or poorly characterized proteins (for example, in metagenomic surveys, novel-organism genomes, or engineered enzyme libraries), where many sequences will not correspond to any characterized EC class. By returning a confident partial annotation instead of a forced full label, it gives biocurators and enzyme engineers a trustworthy functional bracket for prioritizing experimental characterization.

Impact

By explicitly modeling the open-world nature of enzyme annotation, EnzPlacer addresses a known failure mode of EC-classification tools, which tend to misassign genuinely novel enzymes. Its emphasis on honest partial predictions, together with publicly released weights and reference data, makes it a practical complement to existing contrastive annotation methods. As a February 2026 preprint, its quantitative standing relative to prior tools awaits peer review and independent benchmarking.

Citation

How Not to be Seen: Predicting Unseen Enzyme Functions using Contrastive Learning

Ma, X., et al. (2026) How Not to be Seen: Predicting Unseen Enzyme Functions using Contrastive Learning. bioRxiv.

DOI: 10.64898/2026.02.23.707489

Citations

Total Citations: 0
Influential: 0
References: 31

GitHub

Stars: 0
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 12d ago
Language: Python

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 59 (Partial)
Usability (can I run it?): 77
Reproducibility (can I retrain it?): 49
Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository, Research Paper, Dataset
