Horizyn-1

Dual-encoder contrastive model that retrieves enzymes for query reactions by matching reaction fingerprints to protein sequence embeddings.

Released: March 2026

Horizyn-1 is a dual-encoder contrastive-learning model from Dayhoff Labs that matches enzymatic reactions to the proteins capable of catalyzing them. A large fraction of known biochemical reactions are "orphan" — they have no assigned enzyme — and conversely many sequenced proteins have unknown or only loosely assigned catalytic function. Horizyn-1 tackles this matching problem directly by learning a shared embedding space in which a reaction and its candidate enzymes sit close together, turning enzyme discovery into a retrieval task.

The model encodes reactions as chemical fingerprints and proteins as ProtT5-XL embeddings, then trains the two encoders contrastively on millions of reaction-enzyme pairs so that compatible pairs align in a 512-dimensional space. Given a query reaction, it ranks a database of proteins by predicted catalytic compatibility; the authors report greater than 75% top-100 recall. The primary published account appeared in PNAS in March 2026, following an earlier bioRxiv preprint. Unusually for this class of model, it ships with open code and a hosted inference API.
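To make the top-100 recall figure concrete, here is a minimal sketch of how a top-k recall could be computed. It assumes that "top-k recall" means the fraction of query reactions with at least one known enzyme among the k highest-ranked proteins; the paper may define the metric differently. The function name and the toy identifiers are hypothetical and are not part of the Horizyn-1 code.

```python
def top_k_recall(rankings, positives, k=100):
    """Fraction of queries with at least one known enzyme in the top k.

    rankings:  dict mapping reaction id -> list of protein ids, best first
    positives: dict mapping reaction id -> set of protein ids known to catalyze it
    (Assumed definition; the published metric may differ.)
    """
    if not rankings:
        raise ValueError("no queries to score")
    hits = 0
    for reaction_id, ranked in rankings.items():
        known = positives.get(reaction_id, set())
        # A query counts as a hit if any known enzyme is in its top k.
        if known.intersection(ranked[:k]):
            hits += 1
    return hits / len(rankings)


# Toy data with hypothetical ids (not taken from the paper).
rankings = {
    "rxn_a": ["P1", "P2", "P3", "P4"],
    "rxn_b": ["P4", "P3", "P2", "P1"],
}
positives = {"rxn_a": {"P2"}, "rxn_b": {"P1"}}

recall_at_2 = top_k_recall(rankings, positives, k=2)
recall_at_4 = top_k_recall(rankings, positives, k=4)
```

In the toy data only one of the two reactions has its known enzyme in the top 2, while both do in the top 4, so the metric rises as k grows.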

Importantly, Horizyn-1 is built for reaction-to-enzyme retrieval and screening — not de novo sequence design — which distinguishes it from generative enzyme-design methods and from related discovery tools such as DISCO.

Key Features

Reaction-to-enzyme retrieval: Given a query reaction, ranks proteins by catalytic compatibility, achieving over 75% top-100 recall against large enzyme databases.
Dual-encoder contrastive design: A reaction encoder (RDKit and DRFP fingerprints through an MLP) and a protein encoder (ProtT5-XL embeddings through an MLP) are aligned via a Maximum Likelihood Noise Contrastive Estimation objective into 512-dim embeddings.
Experimentally grounded scope: Validated for orphan reactions, enzyme promiscuity, and non-natural reactions, including lysine transamination for non-canonical amino acids.
Few-shot adaptability: Fine-tuning on fewer than 10 examples improves performance on underrepresented reaction classes.
Predictable scaling: Performance scales logarithmically with training dataset size.
Open and hosted: Released with Python/PyTorch-Lightning code on GitHub and a hosted inference API at horizyn.dayhofflabs.com.
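One way to read the "predictable scaling" claim above is as a simple two-parameter fit. The form below is an assumption for illustration; the constants a and b are hypothetical fit parameters, not values reported by the authors.

```latex
R(N) \approx a + b \log N
\quad\Longrightarrow\quad
R(2N) - R(N) \approx b \log 2
```

Here R is a retrieval metric such as top-100 recall and N is the number of training pairs. Under this form, each doubling of the training set adds roughly the same fixed increment. Because recall cannot exceed 1, such a fit can only hold over a limited range of N.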

Technical Details

Horizyn-1 uses two MLP-based encoders trained to a shared 512-dimensional, L2-normalized embedding space. Reactions are represented by combined RDKit and DRFP structural fingerprints; proteins are represented by pre-computed ProtT5-XL transformer embeddings. The encoders are aligned with a Maximum Likelihood Noise Contrastive Estimation (MLNCE) loss over millions of reaction-enzyme pairs, so that retrieval reduces to nearest-neighbor search in the joint space. The authors report greater than 75% top-100 recall and logarithmic performance scaling with dataset size, and show that few-shot fine-tuning (fewer than 10 examples) recovers accuracy on underrepresented EC classes. The released implementation (PyTorch Lightning) provides command-line querying against an inference checkpoint (~402 MB) and requires roughly 16 GB of GPU VRAM; the code is distributed under the PolyForm Noncommercial License 1.0.0.
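The contrastive alignment described above can be illustrated with a generic in-batch softmax loss (InfoNCE-style). This is a stand-in and not the authors' MLNCE objective, whose exact form is not given here. The temperature value, the function names, and the 2-dimensional toy vectors are assumptions; the real embeddings are 512-dimensional.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(reaction_embs, protein_embs, temperature=0.1):
    """Mean softmax cross-entropy where reaction i's positive is protein i.

    Every other protein in the batch acts as a negative. Generic
    InfoNCE-style sketch, not the MLNCE loss used by Horizyn-1.
    """
    reactions = [l2_normalize(r) for r in reaction_embs]
    proteins = [l2_normalize(p) for p in protein_embs]
    total = 0.0
    for i, r in enumerate(reactions):
        logits = [dot(r, p) / temperature for p in proteins]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(z - max_logit) for z in logits)
        )
        total += log_denominator - logits[i]
    return total / len(reactions)


# Toy batch: matched pairs point the same way, mismatched pairs are orthogonal.
reactions = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(reactions, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss(reactions, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each reaction sits next to its own protein and large when the pairing is swapped, which is the pressure that pulls compatible pairs together in the joint space.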

Applications

Horizyn-1 serves enzymologists, metabolic engineers, and biocatalysis researchers who need to find candidate enzymes for a reaction of interest — assigning function to orphan reactions, identifying promiscuous enzymes that may act on new substrates, and sourcing catalysts for non-natural transformations such as building non-canonical amino acids via lysine transamination. Because it is a retrieval tool rather than a generator, it fits naturally as a screening front end: rank a protein database for a target reaction, then take top hits forward to experimental testing. The hosted API and open code lower the barrier to integrating it into discovery pipelines.
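The screening workflow above (rank a database, keep the top hits) can be sketched as a brute-force nearest-neighbor search over unit-length embeddings. This is an illustration only: `rank_proteins`, the protein ids, and the 3-dimensional vectors are hypothetical, and it does not reproduce the Horizyn-1 command-line tool or hosted API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_proteins(query_emb, protein_embs, top_n=3):
    """Return (protein_id, cosine_similarity) pairs, best first.

    query_emb:    embedding of the query reaction
    protein_embs: dict mapping protein id -> embedding
    """
    query = l2_normalize(query_emb)
    scored = []
    for protein_id, emb in protein_embs.items():
        unit = l2_normalize(emb)
        # For unit vectors the dot product equals cosine similarity.
        scored.append((protein_id, sum(q * p for q, p in zip(query, unit))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy database; real embeddings would be 512-dimensional.
database = {
    "prot_A": [0.9, 0.1, 0.0],
    "prot_B": [0.0, 1.0, 0.0],
    "prot_C": [0.7, 0.7, 0.0],
}
shortlist = rank_proteins([1.0, 0.0, 0.0], database, top_n=2)
```

The shortlist keeps the two proteins most aligned with the query, which in a real pipeline would be the candidates taken forward to experimental testing. For large databases an approximate nearest-neighbor index would likely replace the linear scan.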

Impact

By framing enzyme discovery as cross-modal retrieval between reaction fingerprints and protein language model embeddings, Horizyn-1 offers a scalable, experimentally validated route to closing the gap between cataloged reactions and the enzymes that run them. Its demonstrations on orphan reactions, promiscuity, and non-natural chemistry, together with publication in PNAS and a release that includes open code and a hosted API, make it a practically usable contribution rather than a benchmark-only result. The principal limitation is scope: it retrieves and screens existing proteins and does not design new enzyme sequences, and its code license restricts commercial use without a separate agreement.

Citation

Dual-encoder contrastive learning accelerates enzyme discovery

Rocks, J., et al. (2026) Dual-encoder contrastive learning accelerates enzyme discovery. Proceedings of the National Academy of Sciences (PNAS).

DOI: 10.1073/pnas.2520070123

Recent citations

Papers that recently cited this model.

Accelerated development of 4-HPPD inhibitors using a hybrid deep learning approach with Bayesian-guided reinforcement learning
Junming Dong, Junyu Wu, Yunlong Li, et al.
Bioorganic Chemistry · Sep 2026
Citations: 0
Rethinking Benchmarks and Models for Enzyme Specificity Prediction
Elizabeth H. Mahood, N. Komorníková, Tomáš Pluskal, et al.
Jul 2026
Citations: 0
Searching for extraterrestrial life advances terrestrial sustainability
A. Howells, Catherine G. Fontana, Sabrina M. Elkassas, et al.
Nature Communications · Dec 2025
Citations: 1

Top citations

The most-cited papers that cite this model.

Searching for extraterrestrial life advances terrestrial sustainability
A. Howells, Catherine G. Fontana, Sabrina M. Elkassas, et al.
Nature Communications · Dec 2025
Citations: 1
Accelerated development of 4-HPPD inhibitors using a hybrid deep learning approach with Bayesian-guided reinforcement learning
Junming Dong, Junyu Wu, Yunlong Li, et al.
Bioorganic Chemistry · Sep 2026
Citations: 0
Rethinking Benchmarks and Models for Enzyme Specificity Prediction
Elizabeth H. Mahood, N. Komorníková, Tomáš Pluskal, et al.
Jul 2026
Citations: 0

Citations

Total Citations: 3
Influential: 0
References: 59

GitHub

Stars: 12
Forks: 1
Open Issues: 2
Contributors: 3
Last Push: 1mo ago
Language: Python

Fields of citing research

Biology: 67%
Computer Science: 67%
Chemistry: 33%
Environmental Science: 33%
Medicine: 33%
Philosophy: 33%

Share of papers citing this model.

Openness

bio.rodeo openness: 21, Closed (low usability and reproducibility)
Usability (can I run it?): 21
Reproducibility (can I retrain it?): 13

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository · Research Paper · Demo
