bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

RNAformer

University of Freiburg

Axial-attention transformer that predicts RNA secondary structure from single sequences, without MSAs. Reports state-of-the-art accuracy under homology-aware training and evaluation.

Released: February 2024

RNA secondary structure prediction is a foundational problem in molecular biology, underpinning our understanding of gene regulation, ribozyme catalysis, and RNA-based therapeutics. Most high-performing deep learning approaches have relied on multiple sequence alignments (MSAs) — collections of evolutionarily related sequences — to infer structural constraints from co-evolutionary signals. This dependency limits their applicability to well-characterized RNA families where sufficient homologs exist.

RNAformer, developed by researchers at the University of Freiburg and released as a preprint in February 2024, takes a different approach: it predicts RNA secondary structure directly from a single input sequence using a lean axial-attention architecture. The model is trained in two stages, beginning on synthetic data generated from thermodynamic folding principles and then refined on curated experimental structures with strict homology controls. This pipeline avoids the data leakage issues that have inflated reported performance in earlier deep learning models, producing reliable generalization estimates on held-out RNA families.

The result is a model that reports state-of-the-art accuracy on standard benchmarks, including on sequences with no available homologs, while remaining computationally tractable and usable without an MSA construction pipeline.

Key Features

  • Single-sequence input: Operates directly on raw RNA sequences without requiring MSAs or external homology data, broadening applicability to novel and poorly characterized RNAs.
  • Axial attention over a 2D pairwise representation: Encodes nucleotide relationships in a two-dimensional pairwise matrix and applies attention along rows and columns separately, reducing the attention cost for a sequence of length n from O(n^4) to O(n^3) while still capturing long-range base-pairing interactions.
  • Recycling inference: Iteratively feeds predictions back through the network to progressively refine the contact map, a mechanism analogous to the recycling used in AlphaFold 2.
  • Two-stage homology-aware training: Pre-trains on synthetic thermodynamic data to learn biophysical priors, then fine-tunes on experimental databases (bpRNA) with rigorous train/test homology separation to prevent data leakage.
  • Multiple pre-trained checkpoints: Ships with distinct model variants (biophysical, bpRNA, intra-family, inter-family) suited to different prediction regimes, from sequence-only contexts to curated family-level benchmarks.
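The complexity reduction cited for axial attention can be checked with a short count. The sketch below assumes a sequence of length n and a pairwise representation with n^2 entries, and it ignores the hidden dimension and constant factors; it is a back-of-the-envelope derivation, not a figure taken from the paper.

```latex
\begin{align*}
\text{full 2D attention:}\quad & n^2 \text{ queries} \times n^2 \text{ keys} = O(n^4) \\
\text{row attention:}\quad & n \text{ rows} \times (n \times n) \text{ query-key pairs per row} = O(n^3) \\
\text{column attention:}\quad & n \text{ columns} \times (n \times n) \text{ query-key pairs per column} = O(n^3) \\
\text{axial total:}\quad & O(n^3) + O(n^3) = O(n^3)
\end{align*}
```

For n = 500, this is roughly 6.25 × 10^10 query-key pairs for full 2D attention against 2.5 × 10^8 for one row pass plus one column pass, a factor of n/2 = 250.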

Technical Details

RNAformer is a transformer-based model that represents RNA sequence information in a two-dimensional latent space of pairwise nucleotide embeddings. Axial attention — applying attention along rows and columns of the pairwise matrix independently — provides an efficient mechanism for modeling base-pairing preferences across the full sequence without the quartic cost of naive 2D attention. A recycling mechanism iterates the representation through the network for multiple passes before producing a final contact probability matrix, which is then decoded into a secondary structure prediction.
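The forward pass described above can be illustrated with a minimal NumPy sketch. This is not the RNAformer implementation: it assumes a single attention head, random weights, no feed-forward or convolutional layers, and a simple additive recycling scheme. All names (`row_attention`, `axial_block`, `predict_contacts`, `n_recycle`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize the feature (last) axis to keep activations bounded."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_attention(z, wq, wk, wv):
    """Single-head attention within each row of z, shape (n, n, d)."""
    q, k, v = z @ wq, z @ wk, z @ wv
    # scores[i, j, k]: in row i, query position j attends to key position k
    scores = np.einsum("ijd,ikd->ijk", q, k) / np.sqrt(z.shape[-1])
    return np.einsum("ijk,ikd->ijd", softmax(scores, axis=-1), v)


def axial_block(z, params):
    """Row attention, then column attention (rows of the transpose)."""
    z = z + row_attention(z, *params["row"])
    zt = z.transpose(1, 0, 2)
    z = z + row_attention(zt, *params["col"]).transpose(1, 0, 2)
    return layer_norm(z)


def predict_contacts(seq_emb, params, w_out, n_recycle=3):
    """Map per-nucleotide embeddings (n, d) to an (n, n) contact probability matrix."""
    pair_init = seq_emb[:, None, :] + seq_emb[None, :, :]  # pairwise latent, (n, n, d)
    z = np.zeros_like(pair_init)
    for _ in range(n_recycle):
        # Recycling (assumed additive here): feed the previous output back in.
        z = axial_block(pair_init + z, params)
    logits = z @ w_out                      # (n, n)
    logits = 0.5 * (logits + logits.T)      # base pairing is symmetric
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
n, d = 12, 8
params = {
    "row": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
    "col": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
}
w_out = rng.normal(scale=0.1, size=d)
probs = predict_contacts(rng.normal(size=(n, d)), params, w_out)
print(probs.shape)
```

The score tensor in `row_attention` has n × n × n entries, which is where the cubic cost comes from; full 2D attention would need an n^2 × n^2 score matrix instead.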

Training proceeds in two stages. In the first stage, the model is pre-trained on large-scale synthetic RNA structures generated by thermodynamic methods, instilling knowledge of fundamental base-pairing rules and nearest-neighbor energy parameters. In the second stage, the model is fine-tuned on experimental structures from bpRNA with careful sequence-identity cutoffs between training and test splits — a critical methodological control that several prior deep learning methods omitted, leading to inflated benchmark claims. The implementation supports FlashAttention on modern GPU architectures (Ampere, Ada, Hopper), enabling efficient inference. Pre-trained checkpoints are distributed via the project's GitHub repository.
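The leakage control described above amounts to removing training sequences that are too similar to anything in the test set. The sketch below shows the idea under loud assumptions: `sequence_identity` is a hypothetical helper that uses `difflib.SequenceMatcher` as a crude stand-in for alignment-based identity, and the 0.8 cutoff is illustrative rather than the value used by the authors. Real pipelines typically rely on dedicated clustering or alignment tools.

```python
from difflib import SequenceMatcher


def sequence_identity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_training_set(train, test, cutoff=0.8):
    """Split train into (kept, dropped); drop anything at or above the cutoff vs. any test sequence."""
    kept, dropped = [], []
    for seq in train:
        if any(sequence_identity(seq, t) >= cutoff for t in test):
            dropped.append(seq)
        else:
            kept.append(seq)
    return kept, dropped


test_set = ["GGGAAACUUCGGUUUCCC"]
train_set = [
    "GGGAAACUUCGGUUUCCC",  # exact duplicate of a test sequence
    "GGGAAACUUCGGUUUCCA",  # one substitution: near-duplicate
    "AUAUAUAUGCGCGCAUAU",  # unrelated toy sequence
]
kept, dropped = filter_training_set(train_set, test_set)
print(len(kept), len(dropped))
```

Without a filter of this kind, the first two training sequences would let a model score well on the test sequence by memorization rather than generalization.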

Applications

RNAformer is well suited to research contexts that need fast, MSA-free RNA secondary structure prediction. Structural biologists and RNA biochemists can use it to annotate newly sequenced non-coding RNAs, riboswitches, or viral RNA genomes where MSA construction is impractical or impossible. Medicinal chemists working on RNA-targeted therapeutics can use predicted secondary structures as a starting point for locating structured motifs that may be targetable in undercharacterized RNAs; identifying binding pockets themselves requires three-dimensional information that the model does not provide. Synthetic biologists designing functional RNA devices (aptamers, ribozymes, or RNA scaffolds) can use the model to check whether designed sequences are predicted to adopt the intended fold. The model also fits high-throughput applications, such as processing large transcriptomic datasets or evaluating mutational effects on RNA structure at scale.
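The design-checking use case can be sketched as a comparison between a predicted contact matrix and a target fold. The example below assumes the target is given as a pseudoknot-free dot-bracket string and scores agreement with a base-pair F1. The probability matrix is hand-made for illustration and is not RNAformer output; the function names and the 0.5 threshold are hypothetical choices.

```python
import numpy as np


def dot_bracket_to_pairs(structure):
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs with i < j."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def contacts_to_pairs(probs, threshold=0.5):
    """Threshold the upper triangle of a contact probability matrix (threshold must be > 0)."""
    i_idx, j_idx = np.where(np.triu(probs, k=1) >= threshold)
    return {(int(i), int(j)) for i, j in zip(i_idx, j_idx)}


def pair_f1(predicted, target):
    """Harmonic mean of base-pair precision and recall."""
    tp = len(predicted & target)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(target)
    return 2 * precision * recall / (precision + recall)


target_pairs = dot_bracket_to_pairs("((((....))))")

# Hand-made stand-in for a model's contact probabilities.
probs = np.zeros((12, 12))
for i, j in [(0, 11), (1, 10), (2, 9)]:   # three of the four intended pairs
    probs[i, j] = probs[j, i] = 0.9
probs[4, 7] = probs[7, 4] = 0.7           # one pair not in the target
predicted_pairs = contacts_to_pairs(probs)
score = pair_f1(predicted_pairs, target_pairs)
print(score)
```

A design loop could rank candidate sequences by this score, keeping in mind that agreement with a predicted fold is evidence about the prediction, not experimental confirmation of the structure.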

Impact

RNAformer represents a methodologically rigorous entry in the competitive field of RNA structure prediction, with its homology-aware training pipeline setting a cleaner standard for benchmark evaluation than several predecessor methods. By demonstrating that axial attention over pairwise representations can achieve state-of-the-art accuracy without MSA inputs, it expands the practical scope of deep learning-based RNA structure prediction to sequences lacking characterized homologs. As an open-source model with freely available checkpoints, it lowers the barrier for both computational and experimental researchers to incorporate structure prediction into their workflows. A key limitation is that RNAformer predicts secondary structure (base-pair contacts) but does not model tertiary structure or pseudoknots, leaving three-dimensional RNA folding and more complex topologies to complementary approaches such as RhoFold or trRosettaRNA.

Citation

RNAformer: A Simple yet Effective Model for Homology-Aware RNA Secondary Structure Prediction

Preprint

Franke, J.K.H., Runge, F., Köksal, R., Matus, D., Backofen, R., & Hutter, F. (2024). RNAformer: A Simple yet Effective Model for Homology-Aware RNA Secondary Structure Prediction. bioRxiv, 2024.02.12.579881.

DOI: 10.1101/2024.02.12.579881


Citations

Total Citations: 5
Influential: 1
References: 0

GitHub

Stars: 43
Forks: 10
Open Issues: 3
Contributors: 5
Last Push: 8mo ago
Language: Python
License: Apache-2.0


Openness

bio.rodeo openness: Fully open · usable and reproducible
53 (Partial)
Usability (can I run it?): 61
Reproducibility (can I retrain it?): 54
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

structure_prediction, transformer

Resources

GitHub Repository, Research Paper, Link