bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene foundation models

crisprSFM

ETH Zurich

Specificity Foundation Model that predicts CRISPR gRNA off-target DNA specificity from sequence using a physics-derived dual-encoder with symmetric contrastive learning.

Released: June 2026

crisprSFM is a Specificity Foundation Model (SFM) for predicting CRISPR guide RNA (gRNA) off-target DNA specificity directly from sequence. Anticipating where a guide will bind besides its intended target is critical to the safety of genome editing, and existing tools largely rely on mismatch-counting heuristics or task-specific supervised models. crisprSFM instead frames gRNA–DNA recognition as a cross-modal retrieval problem, learning to align cognate guide–target pairs in a shared representation space so that on- and off-target binding can be scored from sequence alone.

Developed by the Reddy lab at ETH Zurich and posted as a bioRxiv preprint in June 2026, crisprSFM is one of six models in the SFM family, all built on a single, physics-derived dual-encoder architecture. It is the sequel to CALM-1, the antibody–antigen specificity model from the same group, generalizing that contrastive molecular-recognition recipe from immune binding to nuclease targeting.

The model encodes gRNA and genomic DNA sequences with separate encoders and aligns them using a symmetric contrastive objective, pulling true binding pairs together and pushing non-targets apart. This formulation lets crisprSFM transfer knowledge across guides and loci into zero-shot predictions for held-out gRNAs and candidate off-target sites.
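The symmetric contrastive objective described above can be sketched in a few lines. This is a minimal illustration that assumes the standard CLIP-style symmetric InfoNCE form; the preprint's exact loss is not given on this page. The function names, the toy similarity matrices, and the temperature value are all hypothetical.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_contrastive_loss(sim, temperature=1.0):
    """CLIP-style symmetric InfoNCE over a square similarity matrix.

    sim[i][j] is the similarity between guide i and DNA site j.
    Assumption: the cognate pair for guide i sits on the diagonal.
    """
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]
    # Guide-to-site direction: softmax over each row.
    g2s = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n
    # Site-to-guide direction: softmax over each column.
    cols = [list(col) for col in zip(*scaled)]
    s2g = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (g2s + s2g)

# Toy similarity matrices (hypothetical values, not model outputs).
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[0.5] * 3 for _ in range(3)]
loss_aligned = symmetric_contrastive_loss(aligned, temperature=0.1)
loss_uniform = symmetric_contrastive_loss(uniform, temperature=0.1)
```

When every pair looks equally similar, the loss equals log(N) for N pairs, which is the chance level; a well-aligned matrix drives it toward zero. Averaging the row and column directions is what makes the objective symmetric.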

#Key Features

  • Sequence-to-specificity prediction: Predicts gRNA on- and off-target binding from guide and DNA sequence alone, without relying solely on mismatch-counting rules.
  • Physics-derived dual-encoder: Encodes guide RNA and genomic DNA separately, with an architecture motivated by the physics of molecular recognition rather than a generic backbone.
  • Symmetric contrastive learning: Aligns cognate gRNA–DNA pairs in a shared embedding space, enabling retrieval in either direction (guide-to-site or site-to-guide).
  • Learned Boltzmann temperature: A learned temperature parameter calibrates the contrastive similarity scores in a thermodynamically motivated way.
  • Zero-shot cross-modal retrieval: Generalizes to unseen guides and genomic loci without task-specific fine-tuning.
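One plausible reading of the "learned Boltzmann temperature" is that similarity scores are treated like negative binding energies in a Boltzmann distribution. The page does not give the formula, so the identification of similarity with energy and the symbols below ($s_{ij}$, $E_{ij}$, $\tau$, $N$) are assumptions made for illustration only. Under that reading, the probability of retrieving site $d_j$ for guide $g_i$ would be:

```latex
p(d_j \mid g_i) = \frac{\exp\left(s_{ij}/\tau\right)}{\sum_{k=1}^{N} \exp\left(s_{ik}/\tau\right)},
\qquad s_{ij} \equiv -E_{ij}, \quad \tau \equiv k_B T
```

Here $\tau$ is a trainable scalar. A small $\tau$ concentrates probability on the highest-scoring site, while a large $\tau$ flattens the distribution across candidates.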

#Technical Details

crisprSFM uses the shared SFM architecture: a physics-derived dual-encoder trained with a symmetric contrastive objective and a learned Boltzmann temperature that calibrates similarity scores. The two encoders embed guide RNA and genomic DNA independently, and the contrastive loss aligns cognate pairs while pushing non-cognate pairs apart. The model is pretrained on public CRISPR specificity data and evaluated by zero-shot cross-modal retrieval on held-out pairs. It reports strong top-k retrieval performance on this task, the same style of benchmark used across the SFM family to measure how reliably a model recovers true binding partners.
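Top-k retrieval is commonly summarized as recall@k: the fraction of queries whose true partner appears among the k highest-scoring candidates. The sketch below assumes that metric; the page does not state which exact retrieval metric the preprint reports. The function name and the toy similarity matrix are hypothetical.

```python
def recall_at_k(sim, k):
    """Fraction of queries whose true partner (diagonal index) is in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Candidate indices sorted by descending similarity to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

# Toy guide-by-site similarity matrix (hypothetical values, not model outputs).
# Row i is a held-out guide; column i is assumed to be its true target site.
sim = [
    [0.9, 0.2, 0.1, 0.0],
    [0.3, 0.8, 0.1, 0.2],
    [0.7, 0.1, 0.6, 0.2],
    [0.1, 0.5, 0.4, 0.3],
]
r1 = recall_at_k(sim, 1)  # 0.5: two of four guides rank their target first
r2 = recall_at_k(sim, 2)  # 0.75: a third guide recovers its target at rank 2
```

Transposing the matrix and calling the same function gives the site-to-guide direction.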

#Applications

crisprSFM is aimed at genome-editing design and safety assessment, where predicting off-target activity from sequence can guide the selection of high-fidelity guides. By scoring and retrieving likely off-target sites for a candidate gRNA, it can help prioritize guides for experimental validation, flag risky edits before deployment, and complement empirical off-target assays such as GUIDE-seq in therapeutic and research editing workflows.
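Scoring candidate off-target sites for a guide amounts to ranking site embeddings by similarity to the guide embedding. The sketch below assumes cosine similarity and uses hand-written toy vectors in place of encoder outputs, since no public code or weights are available. `rank_sites`, the site names, and all values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_sites(guide_vec, site_vecs, top_k=3):
    """Return the top_k (site_name, score) pairs, highest similarity first."""
    scored = [(name, cosine(guide_vec, vec)) for name, vec in site_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy embeddings standing in for encoder outputs (hypothetical values).
guide = [1.0, 0.0, 0.5]
sites = {
    "on_target": [1.0, 0.0, 0.5],
    "candidate_A": [0.9, 0.1, 0.4],
    "candidate_B": [0.0, 1.0, 0.0],
    "candidate_C": [-1.0, 0.0, 0.2],
}
ranking = rank_sites(guide, sites, top_k=2)
```

In a design workflow, sites that score close to the on-target site would be the ones to prioritize for empirical checks such as GUIDE-seq.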

#Impact

crisprSFM extends the contrastive specificity-prediction paradigm established by CALM-1 from antibody–antigen recognition to CRISPR guide–DNA recognition, demonstrating that a single physics-derived dual-encoder recipe transfers across molecular domains. As one of six SFMs released together, it contributes evidence that cross-modal contrastive learning is a general tool for biological specificity prediction. Its main current limitations are those of a recent preprint: results await peer review and independent benchmarking, and at the time of release no public code or weights repository was available, so reproduction depends on forthcoming artifact releases.

Citation

Vibe Coding Specificity Foundation Models

Reddy, S. T. (2026) Vibe Coding Specificity Foundation Models. bioRxiv.

DOI: 10.64898/2026.06.04.730134

Citations

Total Citations: 0
Influential: 0
References: 52

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 19 (Closed)
Usability (can I run it?): 14
Reproducibility (can I retrain it?): 12
Model Openness Framework: Unclassified (missing required components)

Tags

contrastive_learning, crispr, cross_modal_retrieval, dna, dual_encoder, foundation_model, off_target_prediction, transformer, zero_shot

Resources

Research Paper