bio.rodeo

mir-SFM

ETH Zurich

Specificity Foundation Model that predicts microRNA-mRNA target specificity from sequence using a physics-derived dual-encoder with symmetric contrastive learning.

Released: June 2026

mir-SFM is a Specificity Foundation Model (SFM) for predicting microRNA (miRNA)–mRNA target specificity directly from sequence. Identifying which mRNAs a microRNA silences is central to understanding post-transcriptional gene regulation, yet seed-match heuristics produce many false positives and experimental target maps remain sparse. mir-SFM frames miRNA–mRNA matching as a cross-modal retrieval problem, learning to align cognate miRNA–target pairs in a shared representation space so that likely regulatory interactions can be scored from sequence alone.
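For context, the seed-match heuristic mentioned above can be sketched in a few lines. This is an illustrative baseline only, not part of mir-SFM: it scans an mRNA fragment for the reverse complement of miRNA positions 2 to 8 (a canonical 7mer-m8 site). The function names and the toy 3'UTR fragment are hypothetical; the let-7a sequence is the commonly cited mature hsa-let-7a-5p sequence.

```python
# Illustrative baseline only: canonical seed matching, not part of mir-SFM.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, seed_end: int = 8) -> list[int]:
    """Find 0-based start positions in `utr` that pair with miRNA positions 2..seed_end."""
    seed = mirna[1:seed_end]          # positions 2..seed_end (1-based, inclusive)
    site = reverse_complement(seed)   # what the mRNA must contain, read 5'->3'
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"      # mature hsa-let-7a-5p
toy_utr = "AAGCUACCUCAUUUGGCUACCUCA"  # hypothetical 3'UTR fragment
sites = seed_sites(let7a, toy_utr)
print(sites)
```

Any 7-nucleotide match counts as a hit here regardless of context. That is one reason pure seed matching over-predicts, and it is the gap a learned sequence-level scorer aims to close.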

Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in June 2026, mir-SFM is one of six models in the SFM family, all built on a single physics-derived dual-encoder architecture. It is the successor to CALM-1, the antibody–antigen specificity model from the same group, and generalizes that contrastive molecular-recognition recipe from immune binding to RNA-mediated gene silencing.

The model encodes miRNA and mRNA sequences with separate encoders and aligns them using a symmetric contrastive objective, pulling true targeting pairs together and pushing non-targets apart. Among the six SFMs, mir-SFM reports the family's strongest benchmark result, reaching a top-1 retrieval rate (R@1) of up to 98.0% in zero-shot cross-modal retrieval.

#Key Features

  • Sequence-to-specificity prediction: Predicts miRNA–mRNA targeting from sequence alone, going beyond seed-match rules to capture broader determinants of recognition.
  • Physics-derived dual-encoder: Encodes microRNA and mRNA separately, with an architecture motivated by the physics of molecular recognition rather than a generic backbone.
  • Symmetric contrastive learning: Aligns cognate miRNA–mRNA pairs in a shared embedding space, enabling retrieval in either direction (miRNA-to-target or target-to-miRNA).
  • Learned Boltzmann temperature: A learned temperature parameter calibrates the contrastive similarity scores in a thermodynamically motivated way.
  • State-of-the-family retrieval: Achieves a top-1 retrieval rate of up to 98.0%, the strongest zero-shot benchmark reported across the SFM family.
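The features above describe a symmetric contrastive objective with a learned temperature. Since no code was public at release, the following is only a minimal sketch under stated assumptions: it uses a CLIP-style symmetric InfoNCE loss on cosine similarities, with the temperature held as a log-space parameter. The preprint's exact loss and its physics-derived parameterisation may differ; all function names and the toy embeddings are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(mirna_emb, mrna_emb, log_temperature):
    """CLIP-style symmetric InfoNCE (assumed form); index i in both lists is a cognate pair."""
    a = [normalize(v) for v in mirna_emb]
    b = [normalize(v) for v in mrna_emb]
    tau = math.exp(log_temperature)  # log-space parameter keeps tau positive
    n = len(a)
    # logits[i][j] = cosine(miRNA i, mRNA j) / tau; the diagonal holds true pairs.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / tau for j in range(n)]
              for i in range(n)]
    columns = [list(col) for col in zip(*logits)]
    # Cross-entropy against the diagonal, once per retrieval direction.
    mirna_to_mrna = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    mrna_to_mirna = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (mirna_to_mrna + mrna_to_mirna)


# Toy 2-D embeddings (hypothetical): three pairs pointing in matching directions.
mirna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mrna_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
mrna_shuffled = [mrna_aligned[1], mrna_aligned[2], mrna_aligned[0]]
log_tau = math.log(0.1)
aligned_loss = symmetric_contrastive_loss(mirna, mrna_aligned, log_tau)
shuffled_loss = symmetric_contrastive_loss(mirna, mrna_shuffled, log_tau)
print(aligned_loss, shuffled_loss)
```

In training, `log_temperature` would be updated by gradient descent together with the encoder weights. A smaller temperature sharpens the softmax, much as a lower temperature sharpens a Boltzmann distribution over states.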

#Technical Details

mir-SFM uses the shared SFM architecture: a physics-derived dual-encoder trained with a symmetric contrastive objective and a learned Boltzmann temperature that calibrates similarity scores. The two encoders embed microRNA and mRNA sequences independently, and the contrastive loss aligns cognate pairs while separating mismatches. The model is pretrained on public miRNA–mRNA specificity data and evaluated by zero-shot cross-modal retrieval on held-out pairs. There it reports a top-1 retrieval rate (R@1) of up to 98.0%, the highest of the six SFMs. R@1 is the fraction of queries for which the true partner is ranked first among the candidates, so it measures how reliably the model recovers true regulatory targets.
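To make the retrieval metric concrete, here is a small sketch of how R@1 could be computed from a similarity matrix in both directions. It assumes the true partner of query i sits at index i; the score matrix is hypothetical and is not data from the preprint.

```python
def recall_at_1(similarity):
    """Fraction of queries (rows) whose top-scoring candidate is the true partner.

    similarity[i][j] is the score of query i against candidate j; the true
    partner of query i is assumed to sit at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        best = max(range(len(row)), key=row.__getitem__)
        hits += best == i
    return hits / len(similarity)


# Hypothetical 4x4 score matrix: rows are miRNAs, columns are held-out mRNAs.
scores = [
    [0.91, 0.10, 0.05, 0.20],
    [0.15, 0.88, 0.30, 0.02],
    [0.40, 0.12, 0.35, 0.08],  # true partner (index 2) is outranked by index 0
    [0.07, 0.22, 0.11, 0.79],
]
mirna_to_mrna = recall_at_1(scores)
# Transposing the matrix swaps the roles of query and candidate.
mrna_to_mirna = recall_at_1([list(col) for col in zip(*scores)])
print(mirna_to_mrna, mrna_to_mirna)
```

The two directions need not agree, which is why a symmetric objective and a per-direction report are both useful.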

#Applications

mir-SFM is aimed at RNA biology and gene-regulation research, where predicting microRNA targets from sequence can accelerate the mapping of regulatory networks and the interpretation of miRNA dysregulation in disease. By scoring and retrieving likely targets for a microRNA—or candidate regulators for an mRNA—it can help prioritize interactions for experimental validation, refine target predictions beyond seed matching, and complement CLIP-based target maps where coverage is incomplete.
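The prioritization workflow described above amounts to ranking candidates by embedding similarity. The sketch below assumes the embeddings have already been computed by the two encoders. The vectors, the mRNA names, and the `rank_targets` helper are all hypothetical, since no public mir-SFM interface was available at release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def rank_targets(mirna_vec, mrna_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest cosine similarity first."""
    scored = [(name, cosine(mirna_vec, vec)) for name, vec in mrna_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed embeddings; a real run would take them from the encoders.
query = [0.8, 0.1, 0.6]
candidates = {
    "mRNA_A": [0.7, 0.2, 0.7],
    "mRNA_B": [-0.5, 0.9, 0.1],
    "mRNA_C": [0.9, 0.0, 0.3],
    "mRNA_D": [0.0, 1.0, 0.0],
}
shortlist = rank_targets(query, candidates, top_k=2)
print([name for name, _ in shortlist])
```

Swapping the roles (one mRNA vector as the query, a dictionary of miRNA vectors as candidates) gives the reverse lookup of candidate regulators for a transcript.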

#Impact

mir-SFM extends the contrastive specificity-prediction paradigm established by CALM-1 from antibody–antigen recognition to microRNA–mRNA recognition. Its top-1 retrieval rate of up to 98.0% is the strongest result in the SFM family, which suggests that a single physics-derived dual-encoder recipe can perform well on RNA-mediated specificity. As one of six SFMs released together, it strengthens the case that cross-modal contrastive learning is a general tool for biological specificity prediction. Its main current limitations are those of a recent preprint: results await peer review and independent benchmarking, and at the time of release no public code or weights repository was available, so reproduction depends on forthcoming artifact releases.

Citation

Reddy, S. T. (2026). Vibe Coding Specificity Foundation Models. bioRxiv. DOI: 10.64898/2026.06.04.730134


Citations

Total citations: 0
Influential: 0
References: 52


Openness

bio.rodeo openness: 25 (Closed), low usability and reproducibility
Usability (can I run it?): 18
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified (missing required components)

Tags

contrastive_learning, cross_modal_retrieval, dual_encoder, foundation_model, microrna, mrna, target_prediction, transformer, zero_shot
