mhcSFM

Peptide-MHC binding specificity model that frames presentation as cross-modal retrieval, aligning peptide and MHC encoders by contrastive learning.

Released: June 2026

mhcSFM is a Specificity Foundation Model (SFM) for predicting peptide–MHC binding specificity directly from sequence. Which peptides are presented by which major histocompatibility complex (MHC) alleles governs T-cell recognition and is central to vaccine design, immunotherapy, and neoantigen discovery. Conventional predictors are trained allele-by-allele on mass-spectrometry and binding-affinity data; mhcSFM instead frames peptide–MHC matching as a cross-modal retrieval problem, learning to align cognate peptide–MHC pairs in a shared representation space so that likely presentation events can be scored from sequence alone.

Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in June 2026, mhcSFM is one of six models in the SFM family, all built on a single, physics-derived dual-encoder architecture. It is the sequel to CALM-1, the antibody–antigen specificity model from the same group, generalizing that contrastive molecular-recognition recipe from antibody binding to MHC presentation.

The model encodes peptide and MHC sequences with separate encoders and aligns them using a symmetric contrastive objective, pulling true presentation pairs together and pushing mismatched pairs apart. Because every allele is embedded in the same space, what the model learns from well-characterized alleles can carry over to held-out peptides and MHC variants, which it scores zero-shot.
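The symmetric contrastive objective described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a standard CLIP-style InfoNCE loss in which row i of each batch is a true peptide–MHC pair and every other pairing in the batch is treated as a mismatch. The function names, the fixed `temperature` value, and the toy embeddings are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(peptide_emb, mhc_emb, temperature=0.1):
    """Assumed InfoNCE-style loss; peptide_emb[i] and mhc_emb[i] are a true pair."""
    p = [l2_normalize(v) for v in peptide_emb]
    m = [l2_normalize(v) for v in mhc_emb]
    n = len(p)
    # Cosine similarities scaled by the temperature.
    logits = [[dot(p[i], m[j]) / temperature for j in range(n)] for i in range(n)]
    # Peptide-to-MHC direction: each row should put its mass on the diagonal.
    loss_p2m = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # MHC-to-peptide direction: same idea over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_m2p = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_p2m + loss_m2p)

# Toy embeddings: the aligned batch pairs identical directions,
# the shuffled batch pairs each peptide with the wrong MHC.
peptides = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_contrastive_loss(peptides, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_contrastive_loss(peptides, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs the aligned batch yields a much smaller loss than the shuffled one, which is the behavior the objective rewards during training.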

Key Features

Sequence-to-specificity prediction: Predicts peptide–MHC binding from peptide and MHC sequence alone, without requiring allele-specific structural models.
Physics-derived dual-encoder: Encodes peptide and MHC separately, with an architecture motivated by the physics of molecular recognition rather than a generic backbone.
Symmetric contrastive learning: Aligns cognate peptide–MHC pairs in a shared embedding space, enabling retrieval in either direction (peptide-to-MHC or MHC-to-peptide).
Learned Boltzmann temperature: A learned temperature parameter calibrates the contrastive similarity scores in a thermodynamically motivated way.
Zero-shot cross-modal retrieval: Generalizes to unseen peptides and MHC alleles without task-specific fine-tuning.
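To make the role of the temperature concrete, the sketch below converts similarity scores into a probability distribution over candidates. It assumes the temperature enters as a divisor inside a softmax, as in standard contrastive learning and by analogy with Boltzmann weights; the source does not spell out the exact form, and `boltzmann_weights` and the example scores are hypothetical.

```python
import math

def boltzmann_weights(similarities, temperature):
    """Softmax over similarity / temperature (assumed form, not confirmed by the source)."""
    scaled = [s / temperature for s in similarities]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical similarities between one peptide and three candidate MHC alleles.
similarities = [0.9, 0.5, 0.1]
sharp = boltzmann_weights(similarities, temperature=0.05)  # low temperature
flat = boltzmann_weights(similarities, temperature=1.0)    # high temperature
```

A low temperature concentrates probability on the best-scoring candidate, while a high temperature spreads it out. Learning this value during training lets the model choose how confident its similarity scores should be.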

Technical Details

mhcSFM uses the shared SFM architecture: a physics-derived dual-encoder trained with a symmetric contrastive objective and a learned Boltzmann temperature that calibrates similarity scores. The two encoders embed peptide and MHC sequences independently, and the contrastive loss aligns cognate pairs while separating mismatches. The model is pretrained on public peptide–MHC specificity data and evaluated by zero-shot cross-modal retrieval on held-out pairs. The preprint reports strong top-k retrieval performance in this setting, an evaluation that mirrors the benchmarks used across the SFM family to measure how reliably a model recovers true presentation partners.
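Top-k retrieval can be measured as sketched below. The example assumes a square score matrix in which entry (i, j) is the similarity between peptide i and MHC j, and in which the true partner of peptide i is MHC i. The function name and the toy scores are hypothetical, and the preprint's exact metric may differ.

```python
def recall_at_k(score_matrix, k):
    """Fraction of queries whose true partner (the diagonal entry) ranks in the top k."""
    hits = 0
    for i, row in enumerate(score_matrix):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(score_matrix)

# Toy peptide-by-MHC similarity matrix (rows: peptides, columns: MHC alleles).
scores = [
    [0.9, 0.1, 0.0],  # true partner ranked first
    [0.8, 0.2, 0.1],  # true partner ranked second
    [0.1, 0.2, 0.7],  # true partner ranked first
]
top1 = recall_at_k(scores, k=1)
top2 = recall_at_k(scores, k=2)
```

Transposing the matrix before calling `recall_at_k` evaluates the opposite direction, MHC-to-peptide retrieval.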

Applications

mhcSFM is aimed at computational immunology, where predicting which peptides an MHC allele presents from sequence can accelerate epitope prediction, neoantigen prioritization, and vaccine design. By scoring and retrieving likely peptide–MHC pairs, it can help triage candidate epitopes across patient-specific HLA backgrounds, support T-cell target discovery, and complement mass-spectrometry immunopeptidomics where coverage is incomplete.
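A triage step of this kind might look like the following sketch. It assumes peptide and allele embeddings have already been computed by some dual-encoder; no public mhcSFM code or weights were available at release, so the embeddings here are hand-made placeholders, and `triage_epitopes`, the peptide labels, and the allele labels are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triage_epitopes(peptide_embeddings, allele_embeddings):
    """Rank peptides by their best similarity to any allele in the patient's set."""
    ranked = []
    for peptide, p_vec in peptide_embeddings.items():
        best_allele, best_score = max(
            ((allele, cosine(p_vec, a_vec)) for allele, a_vec in allele_embeddings.items()),
            key=lambda pair: pair[1],
        )
        ranked.append((peptide, best_allele, best_score))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked

# Placeholder embeddings standing in for encoder outputs (assumption).
peptides = {
    "PEPTIDE_A": [1.0, 0.0],
    "PEPTIDE_B": [0.0, 1.0],
    "PEPTIDE_C": [-1.0, -1.0],
}
alleles = {
    "ALLELE_1": [0.9, 0.1],
    "ALLELE_2": [0.2, 0.8],
}
ranking = triage_epitopes(peptides, alleles)
```

Each entry in `ranking` pairs a candidate peptide with the allele it matches best, so low-scoring candidates such as `PEPTIDE_C` fall to the bottom of the list and can be deprioritized.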

Impact

mhcSFM extends the contrastive specificity-prediction paradigm established by CALM-1 from antibody–antigen recognition to peptide–MHC recognition, demonstrating that a single physics-derived dual-encoder recipe transfers across molecular domains. As one of six SFMs released together, it contributes evidence that cross-modal contrastive learning is a general tool for biological specificity prediction. Its main current limitations are those of a recent preprint: results await peer review and independent benchmarking, and at the time of release no public code or weights repository was available, so reproduction depends on forthcoming artifact releases.

Citation

Vibe Coding Specificity Foundation Models

Reddy, S. T. (2026) Vibe Coding Specificity Foundation Models. bioRxiv.

DOI: 10.64898/2026.06.04.730134

Recent citations

Papers that recently cited this model.

Generative Drug Design in a Loop with dtSFM
Sai T. Reddy
bioRxiv · Jun 2026
0
A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design
Sai T. Reddy
bioRxiv · Jun 2026
0

Top citations

The most-cited papers that cite this model.

A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design
Sai T. Reddy
bioRxiv · Jun 2026
0
Generative Drug Design in a Loop with dtSFM
Sai T. Reddy
bioRxiv · Jun 2026
0

Citations

Total Citations: 2

Influential: 0

References: 52

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 100%
Chemistry: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall: 23 (Closed)

Usability (can I run it?): 15

Reproducibility (can I retrain it?): 18

Model Openness Framework

Unclassified

Missing required components

Resources

Research Paper
