Overview

trRosettaRNA is an automated deep learning pipeline for predicting RNA three-dimensional structures from sequence, developed by the Yang Lab at Shandong University. Published in Nature Communications in November 2023, it addresses a longstanding challenge in structural biology: in contrast to the situation for proteins, RNA 3D structure prediction has historically lagged far behind sequence-level annotation, leaving the vast majority of functionally important RNA families without structural models.

The method works in two stages. First, a transformer network called RNAformer predicts inter-nucleotide geometries (distances, orientations, torsion angles, and contact probabilities) from multiple sequence alignments (MSAs) and secondary structure predictions. Second, these predicted geometries serve as spatial restraints that guide Rosetta-based energy minimization, producing physically realistic 3D atomic coordinates. This hybrid strategy combines the pattern-recognition power of deep learning with the stereochemical rigor of physics-based modeling.
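The hand-off between the two stages can be illustrated with a small sketch. It assumes, as in the earlier trRosetta family of methods, that a binned distance distribution is turned into a restraint by taking the negative log-probability of each bin. The bin centers, the example probabilities, and the function name `distance_restraint_energy` are illustrative assumptions, not the published implementation, which may also normalize against a reference state.

```python
import math

def distance_restraint_energy(probs, eps=1e-6):
    """Turn a binned distance distribution into per-bin energies.

    Energy is the negative log-probability of each bin, shifted so the
    most likely bin has energy 0. This is an illustrative potential only.
    """
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    raw = [-math.log(p + eps) for p in probs]
    lowest = min(raw)
    return [e - lowest for e in raw]

# Hypothetical 4-bin prediction for one nucleotide pair (values made up).
bin_centers = [4.0, 8.0, 12.0, 16.0]  # angstroms, illustrative only
probs = [0.05, 0.70, 0.20, 0.05]

energies = distance_restraint_energy(probs)
best_bin = min(range(len(energies)), key=energies.__getitem__)
print(bin_centers[best_bin])  # distance favoured by the restraint
```

A minimizer that sums such terms over all nucleotide pairs is pulled toward conformations whose distances fall in the high-probability bins.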

At the 15th Critical Assessment of Structure Prediction (CASP15), trRosettaRNA achieved performance competitive with top human expert groups and, as measured by RMSD on the blind test targets, outperformed other fully automated deep learning methods. The authors also applied the method at scale, generating confident structural models for 467 Rfam families that previously lacked any experimentally determined structure.
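For readers unfamiliar with the metric, the following sketch computes RMSD between two sets of matched atom coordinates. It assumes the two structures have already been optimally superimposed, which in practice requires a separate alignment step such as the Kabsch algorithm. The coordinates and the function name `rmsd` are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched 3D points.

    Assumes both coordinate lists are in the same order and already
    superimposed; no alignment is performed here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: two atoms, one displaced by 2 angstroms along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(model, reference), 3))
```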

Key Features

RNAformer transformer architecture: A 48-block transformer network that jointly updates MSA and pairwise residue representations through row- and column-wise attention, closely analogous to AlphaFold 2's Evoformer but tailored to RNA nucleotide chemistry.
Multi-geometry output: Simultaneously predicts 2D geometry maps (inter-nucleotide distances, orientations, and contacts) and 1D backbone torsion angles, providing a comprehensive geometric description of the target RNA fold.
Physics-based refinement: Rosetta energy minimization with predicted geometry restraints produces all-atom 3D models that respect known RNA stereochemistry. The pipeline generates 20 candidate structures per input and selects among them by Rosetta energy score.
Self-distillation training: Beyond the 3,633 RNA chains in the PDB, training is extended through iterative self-distillation on the bpRNA database, with uncertain predictions filtered by Kullback-Leibler divergence to maintain data quality.
Fully automated pipeline: End-to-end workflow from sequence input to ranked 3D models requires no manual intervention, lowering the barrier to large-scale structural genomics studies.
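The self-distillation filter listed above can be sketched as follows. The text says only that uncertain predictions are filtered by Kullback-Leibler divergence, so the details here are assumptions: the sketch scores a predicted distribution by its KL divergence from a uniform background (a sharper prediction scores higher) and keeps predictions at or above a hypothetical threshold. A real pipeline would aggregate this over all nucleotide pairs and may use a different reference distribution.

```python
import math

def kl_from_uniform(probs):
    """KL(P || U) for a discrete distribution P against a uniform U."""
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0)

def keep_confident(predictions, threshold):
    """Return names of predictions whose sharpness meets the threshold.

    `predictions` is a list of (name, distribution) pairs; both the
    structure and the threshold are illustrative assumptions.
    """
    return [name for name, dist in predictions
            if kl_from_uniform(dist) >= threshold]

# Made-up example: one sharp and one flat predicted distribution.
predictions = [
    ("seq_sharp", [0.90, 0.05, 0.03, 0.02]),
    ("seq_flat", [0.25, 0.25, 0.25, 0.25]),
]
kept = keep_confident(predictions, threshold=0.5)
print(kept)
```

A flat distribution carries no information beyond the background, so its divergence is zero and it is dropped from the distillation set.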

Technical Details

RNAformer processes two input representations derived from the MSA: a sequence-level MSA representation encoding nucleotide identity across aligned homologs, and a pairwise representation initialized from direct couplings in the MSA and predicted secondary structure probability matrices. Each of the 48 transformer blocks applies four sequential update operations: MSA row attention, MSA column attention, outer-product MSA-to-pair updates, and triangle-update pair-to-pair refinement using a Res2Net architecture. This sequence mirrors the Evoformer design philosophy. During inference, the network is cycled four times through these blocks for iterative refinement. Final 2D geometry predictions (distance, dihedral orientation, and contact distributions) are read out from the pair representation; 1D backbone torsion angles are derived from row-wise weighted summation of the MSA representation.
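The outer-product MSA-to-pair update can be illustrated with a minimal sketch in the style of the Evoformer's outer product mean. It is a simplification under stated assumptions: the learned linear projections applied before and after the outer product are omitted, the feature vectors are made up, and the function name `outer_product_mean` is not taken from the trRosettaRNA code.

```python
def outer_product_mean(msa):
    """Average the outer products of per-position features over sequences.

    msa[s][i] is the feature vector of sequence s at position i.
    Returns pair[i][j], a flattened (dim * dim) vector for positions i, j.
    """
    n_seq = len(msa)
    length = len(msa[0])
    dim = len(msa[0][0])
    pair = [[[0.0] * (dim * dim) for _ in range(length)]
            for _ in range(length)]
    for row in msa:
        for i in range(length):
            for j in range(length):
                for a in range(dim):
                    for b in range(dim):
                        pair[i][j][a * dim + b] += row[i][a] * row[j][b] / n_seq
    return pair

# Toy MSA: 2 sequences, 2 positions, 2 features per position (made up).
msa = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
pair = outer_product_mean(msa)
print(pair[0][1])
```

Each pair entry summarizes how the features at two alignment columns co-vary across homologs, which is how covariation signal in the MSA reaches the pairwise representation.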

Training used 3,633 non-redundant RNA chains from the PDB as the primary dataset, supplemented by self-distillation on predicted structures from bpRNA. The 3D structure generation stage uses L-BFGS optimization within Rosetta, incorporating RNAformer-derived distance and orientation restraints alongside physics-based RNA potentials. Twenty candidate structures are generated per query, and the lowest-energy model by Rosetta score is reported as the primary prediction. The web server at yanglab.qd.sdu.edu.cn accepts raw RNA sequences and returns ranked 3D models, making the tool broadly accessible without local installation.
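The final selection step described above amounts to ranking candidates by score. The sketch below assumes each candidate is summarized by a name and a Rosetta score, where lower is better. The `Candidate` type, the model names, and the score values are hypothetical and do not reflect the pipeline's actual data structures or output.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str             # hypothetical model identifier
    rosetta_score: float  # lower means more favourable

def rank_candidates(candidates):
    """Sort candidates from lowest to highest Rosetta score."""
    return sorted(candidates, key=lambda c: c.rosetta_score)

# Made-up scores for three of the twenty candidates.
candidates = [
    Candidate("model_03", -212.4),
    Candidate("model_07", -230.9),
    Candidate("model_15", -198.1),
]
ranked = rank_candidates(candidates)
primary = ranked[0]
print(primary.name)
```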

Applications

trRosettaRNA is suited for researchers who need 3D structural models of RNA sequences for which no experimental structure exists. Structural biologists can use predictions to guide cryo-EM data interpretation or to design crystallization constructs. RNA biologists working on regulatory non-coding RNAs, riboswitches, or CRISPR-associated RNAs can use the models to generate mechanistic hypotheses about structure-function relationships. The method is also applicable to drug discovery targeting RNA, where 3D structural context is increasingly sought for fragment screening and small-molecule design. At scale, the Rfam-wide application demonstrates utility for structural genomics efforts aimed at annotating entire RNA families.

Impact

trRosettaRNA represents a meaningful advance in automated RNA 3D structure prediction, arriving alongside a broader wave of deep learning methods (including ARES, DeepFoldRNA, and RhoFold) that collectively elevated the field during and after CASP15. Its competitive showing in the CASP15 blind test against human expert predictions validated the two-stage hybrid approach as a viable alternative to purely physics-based or purely data-driven methods. The large-scale application to 467 Rfam families without experimentally determined structures is a concrete contribution to structural genomics, providing the community with a first structural reference for many biologically important RNA classes. A key limitation is that the approach depends on the availability of informative MSAs; for highly divergent or orphan RNAs with few known homologs, prediction accuracy degrades. The method also produces static single-conformation models and does not capture the conformational dynamics that are often central to RNA function.

Citation

trRosettaRNA: automated prediction of RNA 3D structure with transformer network

Wang, W., Feng, C., Han, R., Wang, Z., Ye, L., Du, Z., Wei, H., Zhang, F., Peng, Z., & Yang, J. (2023). trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nature Communications, 14(1), 7266.

DOI: 10.1038/s41467-023-42528-4

