trRosettaRNA is an automated deep learning pipeline for predicting RNA three-dimensional (3D) structures from sequence, developed by the Yang Lab at Shandong University. Published in Nature Communications in November 2023, it addresses a longstanding challenge in structural biology. RNA 3D structure prediction has lagged well behind its protein counterpart, and experimentally determined RNA structures remain scarce compared with the number of known RNA sequences. As a result, the vast majority of functionally important RNA families have no structural models.
The method works in two stages. First, a transformer network called RNAformer predicts inter-nucleotide geometries (distances, orientations, and contact probabilities) together with per-nucleotide backbone torsion angles. It works directly from multiple sequence alignments (MSAs) and predicted secondary structures. These predicted geometries are then converted into spatial restraints that guide Rosetta-based energy minimization, which produces physically realistic 3D atomic coordinates. This hybrid strategy combines the pattern-recognition power of deep learning with the stereochemical rigor of physics-based modeling.
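The restraint step can be made concrete by turning one predicted distance distribution into an energy term. This is a minimal sketch with several assumptions. The network is assumed to output a histogram of probabilities over distance bins for a nucleotide pair. A distance d is scored as the negative log-probability, linearly interpolated between bin centers so that the energy varies smoothly. The function name `make_restraint`, the bin layout, and the probability floor are all hypothetical. The sketch does not reproduce trRosettaRNA's actual restraint functions or any reference-state correction.

```python
import math
from bisect import bisect_right


def make_restraint(bin_centers, probs, floor=1e-4):
    """Return E(d): piecewise-linear interpolation of -log p over bin centers.

    Hypothetical helper. bin_centers are ascending distances (in Angstroms)
    and probs is the predicted histogram for one nucleotide pair. Outside
    the binned range, the energy is held at the value of the nearest end bin.
    """
    if len(bin_centers) != len(probs) or len(probs) < 2:
        raise ValueError("need at least two bins and one probability per bin")
    # Floor the probabilities so that empty bins give a large but finite energy.
    energies = [-math.log(max(p, floor)) for p in probs]

    def energy(d):
        if d <= bin_centers[0]:
            return energies[0]
        if d >= bin_centers[-1]:
            return energies[-1]
        # Locate i such that bin_centers[i] <= d < bin_centers[i + 1].
        i = bisect_right(bin_centers, d) - 1
        t = (d - bin_centers[i]) / (bin_centers[i + 1] - bin_centers[i])
        return (1 - t) * energies[i] + t * energies[i + 1]

    return energy
```

Consider bins centered at 4, 6, and 8 Å with probabilities 0.1, 0.8, and 0.1. The resulting energy is lowest at 6 Å and rises toward either end, so a minimizer is pulled toward the most probable distance.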
At the 15th Critical Assessment of Structure Prediction (CASP15), trRosettaRNA performed competitively with the top human expert groups on the blind test targets. Measured by RMSD-based Z-scores, it also outperformed the other fully automated deep learning methods. The authors also applied the method at scale, generating confident structural models for 467 Rfam families that previously lacked any experimentally determined structure.
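RMSD, the metric behind the CASP15 comparison, is the root-mean-square distance between corresponding atoms of a model and the reference structure. The sketch below makes two simplifying assumptions. First, it computes RMSD for coordinates that are already superposed; a full evaluation would align them first, for example with the Kabsch algorithm. Second, it converts one target's RMSDs into per-group Z-scores using an assumed convention: lower RMSD is better, and negative scores are clipped to 0. The helper names are illustrative, and this is not the official CASP15 assessment code.

```python
import math
from statistics import mean, pstdev


def rmsd(model, reference):
    """RMSD between equal-length lists of (x, y, z) coordinates.

    Assumes the two structures are already optimally superposed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal length")
    sq = sum((a - b) ** 2 for p, q in zip(model, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(model))


def rmsd_z_scores(rmsds):
    """Per-group Z-scores for a single target, given {group: rmsd}.

    Assumed convention: z = (mean - x) / std, so better-than-average
    (lower-RMSD) models score positively; negative scores are clipped to 0.
    """
    values = list(rmsds.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # All groups tied: no group is distinguishable from the mean.
        return {k: 0.0 for k in rmsds}
    return {k: max(0.0, (mu - x) / sd) for k, x in rmsds.items()}
```

Summing such Z-scores across targets rewards methods that are consistently better than the field. Averaging raw RMSDs would instead let a few very large targets dominate the comparison.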
RNAformer takes two input representations. The MSA representation encodes nucleotide identity across the aligned homologs. The pair representation is initialized from direct-coupling features computed on the MSA together with the predicted secondary structure probability matrix. Each of its 48 transformer blocks applies four update operations in sequence:

1. MSA row attention.
2. MSA column attention.
3. An outer-product update that passes information from the MSA representation to the pair representation.
4. A triangle-update refinement of the pair representation built on a Res2Net architecture.

This layout mirrors the design of AlphaFold2's Evoformer. During inference, the outputs are recycled through the block stack four times for iterative refinement. The final 2D geometry predictions (distance, dihedral orientation, and contact distributions) are read out from the pair representation. The 1D backbone torsion angles are obtained from a row-wise weighted summation of the MSA representation.
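Two of the operations described above can be sketched in plain Python. `outer_product_mean` shows the general idea behind an outer-product MSA-to-pair update. For each column pair (i, j), the feature vectors of columns i and j are combined into an outer product within each aligned sequence, and the products are then averaged over sequences. `weighted_row_sum` shows one way to collapse the MSA representation into per-nucleotide vectors: a softmax-weighted sum over rows, the kind of readout from which 1D torsion angles could be predicted. Both are toy versions under stated assumptions. Neither uses learned projections, and the per-sequence scores are hypothetical inputs rather than RNAformer's actual weights.

```python
import math


def outer_product_mean(msa):
    """Toy MSA-to-pair update: outer products averaged over sequences.

    msa[s][i] is a feature vector (list of c floats) for aligned sequence s
    at column i. Returns pair[i][j], a flattened c*c vector equal to the mean
    over s of outer(msa[s][i], msa[s][j]). A real network would project the
    features down first and pass the result through a learned linear layer.
    """
    n_seq, length, c = len(msa), len(msa[0]), len(msa[0][0])
    pair = [[[0.0] * (c * c) for _ in range(length)] for _ in range(length)]
    for seq in msa:
        for i in range(length):
            for j in range(length):
                cell = pair[i][j]
                for a in range(c):
                    for b in range(c):
                        cell[a * c + b] += seq[i][a] * seq[j][b] / n_seq
    return pair


def weighted_row_sum(msa, seq_scores):
    """Collapse the MSA to one vector per column using softmax row weights.

    seq_scores is a hypothetical per-sequence score (one float per MSA row),
    standing in for whatever learned weighting the real readout uses.
    """
    m = max(seq_scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in seq_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    length, c = len(msa[0]), len(msa[0][0])
    return [[sum(w * seq[i][k] for w, seq in zip(weights, msa)) for k in range(c)]
            for i in range(length)]
```

The outer products are formed within each sequence before averaging. As a result, the pair features record how columns i and j co-vary across homologs, which is the signal the pair representation is meant to accumulate.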
Training used 3,633 non-redundant RNA chains from the PDB as the primary dataset. This was supplemented by self-distillation on structures predicted for sequences from the bpRNA database. The 3D structure generation stage runs L-BFGS minimization within Rosetta, combining the RNAformer-derived distance and orientation restraints with physics-based RNA energy terms. Twenty candidate structures are generated per query, and the model with the lowest Rosetta energy is reported as the primary prediction. The web server at yanglab.qd.sdu.edu.cn accepts raw RNA sequences and returns ranked 3D models, so the tool can be used without local installation.
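The generate-then-rank protocol can be illustrated with a toy model. This sketch treats a structure as a handful of points and scores it with harmonic distance restraints, a stand-in for RNAformer-derived restraints plus Rosetta's RNA energy terms. It minimizes from 20 random starting conformations and sorts the resulting decoys by final energy, so the first entry is the reported model. To stay dependency-free, it deliberately swaps Rosetta's L-BFGS minimizer for plain gradient descent. The names `restraint_energy`, `gradient`, `minimize`, and `fold` are hypothetical, and the step size and iteration count are untuned assumptions.

```python
import math
import random


def restraint_energy(coords, restraints):
    """Sum of harmonic penalties k * (d_ij - d0)^2 over (i, j, d0, k) tuples."""
    return sum(k * (math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0, k in restraints)


def gradient(coords, restraints):
    """Analytic gradient of restraint_energy with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0, k in restraints:
        d = math.dist(coords[i], coords[j])
        if d < 1e-9:
            continue  # direction undefined for coincident points; skip this term
        scale = 2.0 * k * (d - d0) / d
        for a in range(3):
            g = scale * (coords[i][a] - coords[j][a])
            grad[i][a] += g
            grad[j][a] -= g
    return grad


def minimize(coords, restraints, step=0.01, n_steps=2000):
    """Plain gradient descent, a stand-in for Rosetta's L-BFGS minimizer."""
    coords = [list(p) for p in coords]
    for _ in range(n_steps):
        grad = gradient(coords, restraints)
        for p, g in zip(coords, grad):
            for a in range(3):
                p[a] -= step * g[a]
    return coords


def fold(n_atoms, restraints, n_models=20, seed=0):
    """Minimize n_models random starts; return (energy, coords) sorted by energy."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(n_models):
        start = [[rng.uniform(-10, 10) for _ in range(3)] for _ in range(n_atoms)]
        model = minimize(start, restraints)
        decoys.append((restraint_energy(model, restraints), model))
    decoys.sort(key=lambda item: item[0])
    return decoys
```

For three points restrained to pairwise distances of 3, 4, and 5 Å, the best decoy recovers a 3-4-5 right triangle. Sampling several starts guards against a single run stalling in a poor local minimum, which is a common reason for generating multiple candidates.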
trRosettaRNA is suited for researchers who need 3D structural models of RNA sequences for which no experimental structure exists. Structural biologists can use predictions to guide cryo-EM data interpretation or to design crystallization constructs. RNA biologists working on regulatory non-coding RNAs, riboswitches, or CRISPR-associated RNAs can use the models to generate mechanistic hypotheses about structure-function relationships. The method is also applicable to drug discovery targeting RNA, where 3D structural context is increasingly sought for fragment screening and small-molecule design. At scale, the Rfam-wide application demonstrates utility for structural genomics efforts aimed at annotating entire RNA families.
trRosettaRNA represents a meaningful advance in automated RNA 3D structure prediction. It arrived amid a broader wave of deep learning approaches to RNA structure, including ARES, DeepFoldRNA, and RhoFold, that together improved the accuracy of automated RNA modeling. Its competitive showing against human expert predictions in the CASP15 blind test supports the two-stage hybrid approach as a viable alternative to purely physics-based or purely data-driven methods. The large-scale application to 467 uncharacterized Rfam families is a concrete contribution to structural genomics, providing a first structural reference for many biologically important RNA classes.

The method has two key limitations. First, it depends on informative MSAs, so prediction accuracy degrades for highly divergent or orphan RNAs with few known homologs. Second, it produces static, single-conformation models and does not capture the conformational dynamics that are often central to RNA function.
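The dependence on informative MSAs can be checked before a prediction is run. One common diagnostic is the effective number of sequences (Neff). This is general practice in coevolution-based modeling, not a procedure attributed here to trRosettaRNA. Under Neff, near-duplicate homologs are down-weighted so that redundant sequences do not inflate the apparent alignment depth. The sketch below makes two assumptions: an 80% identity threshold, and identity computed only over columns where neither sequence has a gap (`-`). The function name is hypothetical.

```python
def effective_sequence_count(msa, identity_threshold=0.8):
    """Estimate Neff for an alignment given as equal-length strings.

    Each sequence gets weight 1 / (number of sequences, itself included,
    with identity >= identity_threshold to it). Neff is the sum of weights.
    Identity is measured over columns where neither sequence has a gap.
    """
    def identity(a, b):
        pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if not pairs:
            return 0.0
        return sum(x == y for x, y in pairs) / len(pairs)

    weights = []
    for i, a in enumerate(msa):
        # Count the sequence itself explicitly so all-gap rows cannot divide by zero.
        neighbors = 1 + sum(
            identity(a, b) >= identity_threshold
            for j, b in enumerate(msa) if j != i
        )
        weights.append(1.0 / neighbors)
    return sum(weights)
```

A low Neff flags an alignment that may carry too little coevolutionary signal for reliable geometry predictions, even if it contains many raw sequences.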
Wang, W., Feng, C., Han, R., Wang, Z., Ye, L., Du, Z., Wei, H., Zhang, F., Peng, Z., & Yang, J. (2023). trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nature Communications, 14(1), 7266.
DOI: 10.1038/s41467-023-42528-4