mir-SFM

Foundation model that predicts microRNA-mRNA target specificity from sequence, using a dual-encoder trained with a symmetric contrastive objective.

Released: June 2026

mir-SFM is a Specificity Foundation Model (SFM) for predicting microRNA (miRNA)–mRNA target specificity directly from sequence. Identifying which mRNAs a microRNA silences is central to understanding post-transcriptional gene regulation, yet seed-match heuristics produce many false positives and experimental target maps remain sparse. mir-SFM frames miRNA–mRNA matching as a cross-modal retrieval problem, learning to align cognate miRNA–target pairs in a shared representation space so that likely regulatory interactions can be scored from sequence alone.

Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in June 2026, mir-SFM is one of six models in the SFM family, all built on a single, physics-derived dual-encoder architecture. It follows CALM-1, the antibody–antigen specificity model from the same group, and generalizes that contrastive molecular-recognition recipe from immune binding to RNA-mediated gene silencing.

The model encodes miRNA and mRNA sequences with separate encoders and aligns them with a symmetric contrastive objective that pulls true targeting pairs together and pushes non-targets apart. Among the six SFMs, mir-SFM reports the family's strongest benchmark result: a top-1 retrieval rate (R@1) of up to 98.0% in zero-shot cross-modal retrieval.
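To make the R@1 figure concrete, the following is a minimal sketch of how a top-1 retrieval rate can be computed from a matrix of similarity scores. It is not the authors' evaluation code: the function names, the toy score matrix, and the convention that the true partner of query i sits at index i are all assumptions made for illustration.

```python
# Hypothetical sketch of recall@1 for cross-modal retrieval.
# The scores below are illustrative values, not mir-SFM outputs.

def recall_at_1(similarity):
    """Fraction of queries whose highest-scoring candidate is the true partner.

    similarity[i][j] is the score between query i (e.g. a miRNA) and
    candidate j (e.g. an mRNA). By assumption, the true partner of
    query i is the candidate at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        best = max(range(len(row)), key=lambda j: row[j])
        if best == i:
            hits += 1
    return hits / len(similarity)


def transpose(matrix):
    """Swap queries and candidates to evaluate the reverse direction."""
    return [list(col) for col in zip(*matrix)]


scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.7, 0.2, 0.6],  # the true partner (index 2) is outranked by index 0
]

mirna_to_mrna = recall_at_1(scores)             # rows are queries
mrna_to_mirna = recall_at_1(transpose(scores))  # columns are queries
print(mirna_to_mrna, mrna_to_mirna)
```

Because retrieval can run in either direction, the two values need not agree: in this toy matrix the third query misses in one direction but is recovered in the other.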

Key Features

Sequence-to-specificity prediction: Predicts miRNA–mRNA targeting from sequence alone, going beyond seed-match rules to capture broader determinants of recognition.
Physics-derived dual-encoder: Encodes microRNA and mRNA separately, with an architecture motivated by the physics of molecular recognition rather than a generic backbone.
Symmetric contrastive learning: Aligns cognate miRNA–mRNA pairs in a shared embedding space, enabling retrieval in either direction (miRNA-to-target or target-to-miRNA).
Learned Boltzmann temperature: A learned temperature parameter calibrates the contrastive similarity scores in a thermodynamically motivated way.
State-of-the-family retrieval: Achieves a top-1 retrieval rate of up to 98.0%, the strongest zero-shot benchmark reported across the SFM family.

Technical Details

mir-SFM uses the shared SFM architecture: a physics-derived dual-encoder trained with a symmetric contrastive objective and a learned Boltzmann temperature that calibrates similarity scores. The two encoders embed microRNA and mRNA sequences independently, and the contrastive loss aligns cognate pairs while separating mismatches. The model is pretrained on public miRNA–mRNA specificity data and evaluated by zero-shot cross-modal retrieval on held-out pairs, where it reports a top-1 retrieval rate (R@1) of up to 98.0%, the highest of the six SFMs. R@1 is the fraction of queries for which the true partner is ranked first among all candidates, so it measures how reliably the model recovers true regulatory targets.
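The preprint summary does not give the exact form of the loss, so the sketch below assumes a standard CLIP-style symmetric cross-entropy over a batch similarity matrix, with the temperature dividing the scores before the softmax. The function names and toy matrices are hypothetical, and the temperature is passed in as a plain number here, whereas in training it would be a learned parameter.

```python
import math

# Hypothetical sketch of a symmetric contrastive loss with a temperature.
# Assumes a CLIP-style formulation; the actual mir-SFM loss may differ.

def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(similarity, temperature):
    """Average of the miRNA-to-mRNA and mRNA-to-miRNA cross-entropies.

    similarity[i][j] scores miRNA i against mRNA j; cognate pairs are
    assumed to lie on the diagonal. Dividing by the temperature
    sharpens (small T) or flattens (large T) the softmax.
    """
    n = len(similarity)
    scaled = [[s / temperature for s in row] for row in similarity]
    scaled_t = [list(col) for col in zip(*scaled)]
    loss_rows = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n
    loss_cols = -sum(log_softmax(scaled_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_rows + loss_cols)


aligned = [[1.0, 0.0], [0.0, 1.0]]      # cognate pairs score highest
uninformative = [[0.0, 0.0], [0.0, 0.0]]  # all pairs score the same

loss_aligned = symmetric_contrastive_loss(aligned, temperature=0.1)
loss_uninformative = symmetric_contrastive_loss(uninformative, temperature=0.1)
print(loss_aligned, loss_uninformative)
```

With uninformative scores the loss equals the log of the batch size, and it falls toward zero as cognate pairs separate from mismatches. Averaging the row-wise and column-wise terms is what makes the objective symmetric in the two modalities.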

Applications

mir-SFM is aimed at RNA biology and gene-regulation research, where predicting microRNA targets from sequence can accelerate the mapping of regulatory networks and the interpretation of miRNA dysregulation in disease. By scoring and retrieving likely targets for a microRNA—or candidate regulators for an mRNA—it can help prioritize interactions for experimental validation, refine target predictions beyond seed matching, and complement CLIP-based target maps where coverage is incomplete.
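As an illustration of the prioritization workflow described above, the sketch below ranks candidate mRNAs for one miRNA by similarity in a shared embedding space. No public code or weights were available at release, so the embeddings here are made-up vectors, the names `rank_candidates` and `mRNA_A` to `mRNA_C` are hypothetical, and cosine similarity is an assumed scoring choice rather than a documented one.

```python
import math

# Hypothetical sketch: rank candidate targets for a miRNA by embedding
# similarity. Vectors are toy values standing in for encoder outputs.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_embedding, candidates, top_k=3):
    """Return the top_k (name, score) pairs, highest score first."""
    scored = [
        (name, cosine(query_embedding, embedding))
        for name, embedding in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


mirna_embedding = [1.0, 0.0, 0.0]
mrna_embeddings = {
    "mRNA_A": [0.9, 0.1, 0.0],
    "mRNA_B": [0.0, 1.0, 0.0],
    "mRNA_C": [0.5, 0.5, 0.0],
}

top = rank_candidates(mirna_embedding, mrna_embeddings, top_k=2)
for name, score in top:
    print(name, round(score, 3))
```

Swapping the roles, with an mRNA embedding as the query and miRNA embeddings as candidates, gives the reverse lookup of candidate regulators for a transcript.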

Impact

mir-SFM extends the contrastive specificity-prediction paradigm established by CALM-1 from antibody–antigen recognition to microRNA–mRNA recognition. Its top-1 retrieval rate of up to 98.0% is the strongest result in the SFM family, suggesting that a single physics-derived dual-encoder recipe can perform well on RNA-mediated specificity. As one of six SFMs released together, it strengthens the case that cross-modal contrastive learning is a general tool for biological specificity prediction. Its main current limitations are those of a recent preprint: results await peer review and independent benchmarking, and at the time of release no public code or weights repository was available, so reproduction depends on forthcoming artifact releases.

Citation

Reddy, S. T. (2026) Vibe Coding Specificity Foundation Models. bioRxiv.

DOI: 10.64898/2026.06.04.730134

Recent citations

Papers that recently cited this model.

Generative Drug Design in a Loop with dtSFM. Sai T. Reddy. bioRxiv, Jun 2026. Citations: 0.
A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design. Sai T. Reddy. bioRxiv, Jun 2026. Citations: 0.

Citations

Total citations: 2
Influential: 0
References: 52

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 100%
Chemistry: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Score: 25 (Closed)

Usability (can I run it?): 18

Reproducibility (can I retrain it?): 18

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
