drug-SFM

Specificity foundation model predicting small-molecule drug-target binding from sequence, scored as cross-modal retrieval without docking or assays.

Released: June 2026

drug-SFM is a Specificity Foundation Model (SFM) for predicting small-molecule drug–target protein specificity directly from molecular and sequence representations. Determining which proteins a compound binds underpins drug discovery, polypharmacology analysis, and off-target toxicity assessment, yet experimental profiling is expensive and incomplete. drug-SFM frames drug–target matching as a cross-modal retrieval problem, learning to align cognate compound–protein pairs in a shared representation space so that likely interactions can be scored without docking or assay data.

Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in June 2026, drug-SFM is one of six models in the SFM family, all built on a single, physics-derived dual-encoder architecture. It is the sequel to CALM-1, the antibody–antigen specificity model from the same group, generalizing that contrastive molecular-recognition recipe from immune binding to small-molecule pharmacology. The drug–target instance is designated dtSFM, and a dedicated preprint pairs its encoder with a generative decoder to demonstrate off-target prediction, drug repurposing, and de novo molecule design from a single frozen checkpoint.

The model encodes small molecules and target proteins with separate encoders and aligns them using a symmetric contrastive objective, pulling true binding pairs together and pushing non-binders apart. This formulation lets drug-SFM transfer knowledge across chemical and protein space, so it can make zero-shot predictions for held-out compounds and targets.
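The exact loss is not spelled out on this page. The sketch below assumes a CLIP-style symmetric InfoNCE objective, a common form of "symmetric contrastive" loss, with the temperature learned as a log-parameter. Row i of each embedding matrix is taken to be a cognate drug–target pair, and the other rows in the batch act as negatives. The function names are hypothetical and do not come from the dtSFM code, which has not been released.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def symmetric_contrastive_loss(drug_emb: np.ndarray,
                               target_emb: np.ndarray,
                               log_temperature: float) -> float:
    """Hypothetical CLIP-style symmetric InfoNCE loss (assumed form, not dtSFM's code).

    drug_emb[i] and target_emb[i] are a cognate (binding) pair; all other
    rows in the batch serve as in-batch negatives.
    """
    d = l2_normalize(drug_emb)
    t = l2_normalize(target_emb)
    tau = np.exp(log_temperature)        # log-parameterisation keeps the temperature positive
    logits = d @ t.T / tau               # (N, N) temperature-scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_d2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # drug -> target direction
    loss_t2d = -log_softmax(logits, axis=0)[idx, idx].mean()  # target -> drug direction
    return float(0.5 * (loss_d2t + loss_t2d))
```

When the embeddings carry no information, for example when every row is identical, the loss equals log N, the value for uniform guessing over the batch. Correctly aligned pairs push it toward zero. Averaging the two directions is what makes the objective symmetric, so the same embeddings support both drug-to-target and target-to-drug retrieval.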

Key Features

Sequence-to-specificity prediction: Predicts drug–target binding from molecular and protein representations alone, without requiring docking or experimental affinity data.
Physics-derived dual-encoder: Encodes small molecule and target protein separately, with an architecture motivated by the physics of molecular recognition rather than a generic backbone.
Symmetric contrastive learning: Aligns cognate drug–target pairs in a shared embedding space, enabling retrieval in either direction (drug-to-target or target-to-drug).
Learned Boltzmann temperature: A learned temperature parameter calibrates the contrastive similarity scores in a thermodynamically motivated way.
Zero-shot cross-modal retrieval: Generalizes to unseen compounds and protein targets without task-specific fine-tuning.
Generative molecule decoder: A cross-attentive decoder designs novel target-conditioned candidate molecules, extending the model from scoring known compounds to de novo generation.
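The page does not give the formula behind the "learned Boltzmann temperature." One plausible reading, assumed here, treats the temperature τ as the temperature in a softmax over similarity scores, with the similarity s_ij between drug i and target j acting as a negative energy:

$$
p(t_j \mid d_i) = \frac{\exp(s_{ij}/\tau)}{\sum_{k=1}^{N} \exp(s_{ik}/\tau)}, \qquad s_{ij} = -E_{ij}
$$

This has the Boltzmann form exp(-E/k_B T)/Z, with τ in the role of k_B T. The reverse probability p(d_j | t_i) is defined the same way but normalizes over drugs. A symmetric objective over a batch of N cognate pairs then averages both directions:

$$
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log p(t_i \mid d_i) + \log p(d_i \mid t_i)\right]
$$

A small τ concentrates probability on the highest-scoring partner, and a large τ spreads it out. Learning τ lets training choose that sharpness from the data instead of fixing it by hand.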

Technical Details

drug-SFM uses the shared SFM architecture: a physics-derived dual-encoder trained with a symmetric contrastive objective and a learned Boltzmann temperature that calibrates similarity scores. The two encoders embed the small molecule and the target protein independently, and the contrastive loss aligns cognate pairs while separating mismatches. The model is pretrained on public drug–target specificity data and evaluated by zero-shot cross-modal retrieval on held-out pairs, using the top-k retrieval benchmarks applied across the SFM family to measure how reliably a model recovers true binding partners.

The dedicated dtSFM preprint reports training on 714,747 measured drug–protein interactions. In distribution, it reports a recall-at-10 (recall@10) of 95% for retrieving a drug's target and 89% for retrieving a target's drug. Its cross-attentive decoder generated novel molecules for 16 targets, and 850 of 1,200 designed candidates (71%) matched the AlphaFold 3 structural confidence of the corresponding approved drugs. For repurposing, the model ranked a 522,776-compound library against clinical targets.
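The recall@10 figures above measure how often the true partner appears among the top 10 retrieved candidates. Below is a minimal sketch of that metric in both retrieval directions. It assumes a square similarity matrix in which candidate i is the single true partner of query i. That is a simplification, because real drug–target data can have several true partners per query. The preprint's exact evaluation protocol may differ, and the function names are hypothetical.

```python
import numpy as np

def recall_at_k(scores: np.ndarray, k: int) -> float:
    """Fraction of queries whose true partner ranks within the top k.

    scores[i, j] is the similarity of query i to candidate j; candidate i is
    assumed to be the single true partner of query i. Ties are resolved in
    the true partner's favour (only strictly higher scores push it down).
    """
    true_scores = np.diag(scores)[:, None]
    # 0-based rank = number of candidates scoring strictly higher than the true partner
    ranks = (scores > true_scores).sum(axis=1)
    return float((ranks < k).mean())

def bidirectional_recall_at_k(sim: np.ndarray, k: int = 10) -> dict:
    """sim[i, j]: similarity of drug i to target j (hypothetical helper)."""
    return {
        "drug_to_target": recall_at_k(sim, k),    # rows are drug queries
        "target_to_drug": recall_at_k(sim.T, k),  # transpose: rows are target queries
    }
```

The same similarity matrix serves both directions, because transposing it turns target queries into rows. This is why a single symmetric model can report separate drug-to-target and target-to-drug recall figures.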

Applications

drug-SFM is aimed at drug discovery and chemical biology, where predicting target engagement from structure can accelerate hit identification, target deconvolution, and off-target risk assessment. By scoring and retrieving likely protein targets for a compound, or candidate ligands for a target, it can help prioritize compounds for screening, map polypharmacology profiles, and complement experimental binding assays in early-stage discovery pipelines. The dedicated dtSFM workflow also supports two further uses: drug repurposing, by ranking large compound libraries against clinical targets, and de novo generation of target-conditioned candidate molecules.
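Library-scale screening with a dual encoder usually works by embedding the library once and then scoring every query against the cached embeddings. A target query finds candidate ligands for repurposing, and a drug query finds candidate off-targets. The sketch below shows that generic pattern. No public weights or API exist yet, so the embeddings here are stand-ins and the function name is hypothetical. It assumes cosine similarity as the score.

```python
import numpy as np

def top_k_candidates(query_emb: np.ndarray, library_emb: np.ndarray, k: int):
    """Hypothetical helper: indices and cosine scores of the k closest library items.

    Because the two encoders run independently, library embeddings can be
    precomputed once and reused for every query; scoring is one
    matrix-vector product over the whole library.
    """
    q = query_emb / np.linalg.norm(query_emb)
    lib = library_emb / np.linalg.norm(library_emb, axis=1, keepdims=True)
    scores = lib @ q
    k = min(k, scores.shape[0])
    # argpartition selects the top k in linear time; only those k are then sorted
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]

# Usage with toy stand-in embeddings (not real dtSFM outputs):
target = np.array([1.0, 0.0])
library = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
idx, scores = top_k_candidates(target, library, k=2)
```

For a library of hundreds of thousands of compounds, the partial selection avoids sorting every score. Encoding the library is the expensive step, and it only has to happen once.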

Impact

drug-SFM extends the contrastive specificity-prediction paradigm established by CALM-1 from antibody–antigen recognition to small-molecule drug–target recognition, demonstrating that a single physics-derived dual-encoder recipe transfers across molecular domains. As one of six SFMs released together, it contributes evidence that cross-modal contrastive learning is a general tool for biological specificity prediction. Its main current limitations are those of a recent preprint: results await peer review and independent benchmarking, and at the time of release no public code or weights repository was available, so reproduction depends on forthcoming artifact releases.

Citations

Vibe Coding Specificity Foundation Models

Reddy, S. T. (2026) Vibe Coding Specificity Foundation Models. bioRxiv.

DOI: 10.64898/2026.06.04.730134

A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design

Reddy, S. T. (2026) A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design. bioRxiv.

DOI: 10.64898/2026.06.08.730844

Recent citations

Papers that recently cited this model.

Generative Drug Design in a Loop with dtSFM
Sai T. Reddy
bioRxiv · Jun 2026
0 · Influential

A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design
Sai T. Reddy
bioRxiv · Jun 2026
0


Citation metrics

Total citations: 1

Influential: 0

References: 46

Fields of citing research

Share of papers citing this model, by field:

Biology: 100%
Computer Science: 100%
Medicine: 100%
Chemistry: 50%

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

16 · Closed

Usability (can I run it?): 9

Reproducibility (can I retrain it?): 12

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
