ZeroFold

Transformer that predicts protein-RNA binding affinity from Boltz-2 pre-structural embeddings via cross-modal attention, with no 3D structure step.

Released: March 2026

ZeroFold addresses a long-standing challenge in structural biology: predicting how tightly a protein binds an RNA molecule. Accurate protein-RNA affinity prediction matters for understanding post-transcriptional gene regulation and for designing RNA-targeting therapeutics, yet it has remained largely unsolved. A central obstacle is the conformational flexibility of RNA, which, unlike most proteins, exists as a dynamic ensemble rather than a single dominant fold. Committing to one predicted structure discards binding-relevant information, which limits the usefulness of structure-first prediction pipelines for this problem.

The model's core idea is to bypass explicit structure prediction altogether. Rather than decoding a 3D structure and scoring it, ZeroFold extracts pre-structural embeddings: the intermediate representations that a biomolecular foundation model (here, Boltz-2) produces just before its structure-decoding step. The authors argue these embeddings implicitly encode conformational-ensemble information, making them a natural representation for flexible biomolecules such as RNA. ZeroFold trains a transformer on top of these embeddings to map a protein-RNA pair directly to a binding-affinity prediction, with sequence as the only input.

ZeroFold was developed by Josef Hanke, Sebastian Pujalte Ojeda, Shengyu Zhang, Werngard Czechtizky, Leonardo De Maria, and Michele Vendruscolo at the Yusuf Hamied Department of Chemistry, University of Cambridge, in collaboration with AstraZeneca's medicinal chemistry group, and was posted to arXiv in March 2026.

Key Features

Pre-structural embeddings: Uses Boltz-2's intermediate representations, captured before structure decoding, which the authors argue retain conformational-ensemble signal that a single predicted structure would discard.
Cross-modal attention: A transformer combines the protein and RNA embeddings through a cross-modal attention mechanism, learning interaction-relevant features across the two molecule types.
Structure-free inference: Predicts affinity directly from sequence, enabling estimates for protein-RNA pairs that lack any experimental structural data.
Curated benchmark (PRADB): Ships with a purpose-built dataset of 2,621 unique protein-RNA pairs with experimentally measured affinities, drawn from four complementary databases.
Robustness to novelty: Under evaluation conditions that control for training-set overlap, ZeroFold's advantage over competing methods widens as test sequences become less similar to competitors' training data.
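
The cross-modal attention feature listed above can be illustrated with a minimal sketch. This is not ZeroFold's architecture, which the preprint does not fully specify; it is a generic single-head scaled dot-product cross-attention with no learned projections, written in plain Python. Random vectors stand in for Boltz-2 pre-structural embeddings, and the names cross_attention, mean_pool, and pair_vector are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries:     one embedding vector per token of molecule A
    keys_values: one embedding vector per token of molecule B
    Returns one context vector per query token and the attention weights.
    A trained model would add learned query/key/value projections.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        w = softmax(scores)
        out = [sum(w[j] * keys_values[j][c] for j in range(len(keys_values)))
               for c in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[c] for v in vectors) / n for c in range(len(vectors[0]))]


# Assumption: random stand-ins for per-residue and per-nucleotide embeddings.
rng = random.Random(0)
protein = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]  # 5 residues
rna = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]      # 3 nucleotides

# Each molecule attends to the other, then both are pooled and concatenated.
prot_ctx, prot_to_rna = cross_attention(protein, rna)
rna_ctx, rna_to_prot = cross_attention(rna, protein)
pair_vector = mean_pool(prot_ctx) + mean_pool(rna_ctx)  # list concatenation
```

In a sketch like this, pair_vector would feed a small regression head that outputs the affinity estimate. Attending in both directions lets each molecule's representation be conditioned on its partner, which is the general motivation for cross-modal attention.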

Technical Details

ZeroFold is a transformer-based predictor that consumes Boltz-2 pre-structural embeddings for both the protein and the RNA and fuses them with cross-modal attention. At inference it runs as a fixed checkpoint on new sequences. Training and evaluation use PRADB, a curated set of 2,621 unique protein-RNA pairs with experimentally measured affinities aggregated from four databases. On a held-out test set built with a 40% sequence-identity threshold to limit train-test leakage, ZeroFold reaches a Spearman correlation of 0.65, which the authors describe as approaching the ceiling imposed by experimental measurement noise. Under progressively stricter evaluation that accounts for overlap with competitor training sets, ZeroFold compares favourably with leading structure-based and sequence-based affinity predictors. The preprint does not report a parameter count or full architectural hyperparameters.
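
The headline metric above is a Spearman correlation, which compares the rank order of predicted and measured affinities rather than their absolute values. The sketch below shows how such a score is computed in plain Python. The affinity numbers are made up for illustration and are not taken from PRADB or the preprint, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured and predicted affinities for five protein-RNA pairs.
measured = [-9.1, -7.4, -8.2, -6.0, -10.3]
predicted = [-8.5, -7.9, -7.0, -6.2, -9.9]
rho = spearman(measured, predicted)  # 0.9 up to floating-point rounding
```

Because only ranks matter, a model can score well on this metric while being miscalibrated in absolute terms, which is one reason rank correlation is a common choice for prioritisation tasks.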

Applications

ZeroFold targets researchers and drug-discovery teams working on RNA biology and RNA-targeting modalities. Because it estimates binding affinity from sequence without requiring a predicted or experimental complex structure, it is well suited to prioritising candidate protein-RNA pairs at scale, screening RNA-binding proteins, and exploring interactions for which no structural data exist. These are the settings where structure-first pipelines stall. The pre-structural-embedding strategy is also of methodological interest to teams building affinity or interaction models for other flexible biomolecules.

Impact

ZeroFold contributes to a growing line of work that repurposes the internal representations of biomolecular foundation models such as Boltz-2, rather than only their final structural outputs, and argues that these intermediate features are particularly valuable for conformationally flexible systems like RNA. By framing protein-RNA affinity prediction as a problem solvable from pre-structural embeddings, it offers a route to a class of predictions that has been difficult for structure-based methods. As of this writing the work is a preprint, and no public code or trained weights have been released; the AstraZeneca collaboration may also constrain weight distribution. Independent benchmarking and a public implementation would help establish how broadly the approach generalises.

Citation

ZeroFold: Protein-RNA Binding Affinity Predictions from Pre-Structural Embeddings

Preprint

Hanke, J., et al. (2026) ZeroFold: Protein-RNA Binding Affinity Predictions from Pre-Structural Embeddings.

DOI: 10.48550/arXiv.2603.23583

Openness

bio.rodeo openness: 23 (Closed), low usability and reproducibility

Usability (can I run it?): 15

Reproducibility (can I retrain it?): 18

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
