bio.rodeo


enzyme-SFM

ETH Zurich

A Specificity Foundation Model that predicts enzyme–substrate specificity from sequence using a physics-derived dual encoder trained with symmetric contrastive learning.

Released: June 2026

enzyme-SFM is a Specificity Foundation Model (SFM) for predicting enzyme–substrate specificity directly from sequence. Knowing which substrates an enzyme acts on is central to metabolic engineering, biocatalysis, and functional annotation, yet most enzymes lack experimentally characterized substrate profiles. enzyme-SFM frames enzyme–substrate matching as a cross-modal retrieval problem: it learns to align cognate enzyme–substrate pairs in a shared representation space, so that likely partners can be scored from enzyme sequence and substrate molecular structure alone.

Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in June 2026, enzyme-SFM is one of six models in the SFM family, all built on a single physics-derived dual-encoder architecture. It follows CALM-1, the antibody–antigen specificity model from the same group, and generalizes that contrastive molecular-recognition recipe from immune binding to enzymatic catalysis.

The model embeds enzymes and substrates with separate encoders and aligns the two embeddings using a symmetric contrastive objective, pulling true catalytic pairs together and pushing non-substrates apart. Because what it learns is shared across diverse enzyme families, enzyme-SFM can make zero-shot predictions for held-out enzymes and substrates.
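
The objective described above resembles the symmetric InfoNCE loss used by CLIP-style models. The sketch below shows one minimal way to write such a loss in plain Python. It is an assumption about the general form, not the preprint's implementation: the function names, the fixed `temperature=0.1` (enzyme-SFM learns this value), and the toy embeddings are all illustrative.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_contrastive_loss(enzyme_emb, substrate_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each input is assumed to be a cognate pair."""
    e = [l2_normalize(v) for v in enzyme_emb]
    s = [l2_normalize(v) for v in substrate_emb]
    n = len(e)
    # logits[i][j] = cosine similarity of enzyme i and substrate j, divided by temperature
    logits = [[sum(a * b for a, b in zip(e[i], s[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Enzyme -> substrate direction: the correct column for row i is column i.
    loss_e2s = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Substrate -> enzyme direction: same computation on the transposed matrix.
    transposed = [list(col) for col in zip(*logits)]
    loss_s2e = -sum(log_softmax(transposed[j])[j] for j in range(n)) / n
    return 0.5 * (loss_e2s + loss_s2e)

# Toy batch (hypothetical values): matched pairs versus deliberately swapped pairs.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each enzyme sits next to its own substrate and much larger when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.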

#Key Features

  • Sequence-to-specificity prediction: Predicts enzyme–substrate pairing from sequence and substrate representation, without requiring an experimentally measured activity profile.
  • Physics-derived dual-encoder: Encodes enzyme and substrate separately, with an architecture motivated by the physics of molecular recognition rather than a generic backbone.
  • Symmetric contrastive learning: Aligns cognate enzyme–substrate pairs in a shared embedding space, enabling retrieval in either direction (enzyme-to-substrate or substrate-to-enzyme).
  • Learned Boltzmann temperature: A learned temperature parameter calibrates the contrastive similarity scores in a thermodynamically motivated way.
  • Zero-shot cross-modal retrieval: Generalizes to unseen enzymes and substrates without task-specific fine-tuning.
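
One common way to read a "Boltzmann temperature" in a contrastive model is sketched below. It is a generic formulation, not one taken from the preprint. The symbols are assumptions introduced here: $e_i$ and $s_j$ are the enzyme and substrate embeddings, $E_{ij}$ is a pairwise "energy" defined as the negative similarity, and $\tau$ is the learned temperature.

```latex
p(s_j \mid e_i) = \frac{\exp\left(-E_{ij} / \tau\right)}{\sum_{k=1}^{N} \exp\left(-E_{ik} / \tau\right)},
\qquad
E_{ij} = -\langle e_i, s_j \rangle
```

Under this reading, a small $\tau$ concentrates probability on the highest-similarity substrate, while a large $\tau$ flattens the distribution toward uniform. The substrate-to-enzyme direction is obtained by normalizing over enzymes instead of substrates.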

#Technical Details

enzyme-SFM uses the shared SFM architecture: a physics-derived dual-encoder trained with a symmetric contrastive objective and a learned Boltzmann temperature that calibrates similarity scores. The two encoders embed enzyme and substrate independently, and the contrastive loss aligns cognate pairs while separating mismatches. The model is pretrained on public enzyme–substrate specificity data and evaluated by zero-shot cross-modal retrieval on held-out pairs. The preprint reports strong top-k retrieval performance on this task, the same style of benchmark used across the SFM family to measure how reliably a model recovers true catalytic partners.
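
Top-k retrieval is usually summarized as recall@k: the fraction of queries whose true partner appears among the k highest-scoring candidates. The sketch below illustrates that metric on toy data. The function name `recall_at_k`, the dot-product scoring, and the embedding values are assumptions for illustration, not the evaluation code or data from the preprint.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def recall_at_k(query_emb, candidate_emb, true_index, k):
    """Fraction of queries whose true partner ranks within the top k candidates."""
    hits = 0
    for q, target in zip(query_emb, true_index):
        scores = [dot(q, c) for c in candidate_emb]
        # Candidate indices sorted by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_emb)

# Toy held-out set (hypothetical values): 3 enzyme queries, 3 substrate candidates.
enzymes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
substrates = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
truth = [0, 1, 2]  # truth[i] is the index of the cognate substrate for enzyme i

top1 = recall_at_k(enzymes, substrates, truth, k=1)
top2 = recall_at_k(enzymes, substrates, truth, k=2)
```

Swapping the roles of `enzymes` and `substrates` gives the substrate-to-enzyme direction of the same metric.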

#Applications

enzyme-SFM is aimed at biocatalysis, metabolic engineering, and enzyme functional annotation, where predicting substrate scope from sequence can accelerate enzyme selection and pathway design. By scoring and retrieving likely substrates for an enzyme (or candidate enzymes for a target reaction), it can help triage large enzyme libraries, propose biosynthetic routes, and complement experimental activity screening where wet-lab characterization is costly.
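
As a rough illustration of library triage, the sketch below ranks a small enzyme library against one target substrate by cosine similarity and keeps the best matches. No public code or weights were available at release, so the embeddings, the enzyme names, and the `shortlist` helper are hypothetical. In practice the vectors would come from the model's two encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def shortlist(query_vec, library, top_n):
    """Return the top_n (name, score) pairs from library, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical enzyme embeddings keyed by placeholder names.
enzyme_library = {
    "enzyme_A": [0.9, 0.1, 0.0],
    "enzyme_B": [0.1, 0.8, 0.3],
    "enzyme_C": [0.7, 0.6, 0.1],
}
target_substrate = [1.0, 0.2, 0.0]  # hypothetical substrate embedding

candidates = shortlist(target_substrate, enzyme_library, top_n=2)
```

The same helper works in the opposite direction: pass an enzyme embedding as the query and a dictionary of substrate embeddings as the library.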

#Impact

enzyme-SFM extends the contrastive specificity-prediction paradigm established by CALM-1 from antibody–antigen recognition to enzyme–substrate recognition, demonstrating that a single physics-derived dual-encoder recipe transfers across molecular domains. As one of six SFMs released together, it contributes evidence that cross-modal contrastive learning is a general tool for biological specificity prediction. Its main current limitations are those of a recent preprint: results await peer review and independent benchmarking, and at the time of release no public code or weights repository was available, so reproduction depends on forthcoming artifact releases.

Citation

Reddy, S. T. (2026). Vibe Coding Specificity Foundation Models. bioRxiv. DOI: 10.64898/2026.06.04.730134

Citations

Total citations: 0
Influential: 0
References: 52

Openness

bio.rodeo openness: Closed · low usability and reproducibility
23 · Closed
Usability (can I run it?): 15
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified (missing required components)

Tags

binding_prediction, contrastive_learning, cross_modal_retrieval, dual_encoder, enzyme, foundation_model, proteomics, transformer, zero_shot

Resources

Research Paper