A structure-enhanced RNA language model that incorporates base-pairing constraints into self-attention, achieving state-of-the-art performance on RNA structure and function prediction tasks.
ERNIE-RNA (Enhanced Representations with Base-Pairing Restriction for RNA Modeling) is a pre-trained RNA language model that bridges the gap between sequence-based learning and structural understanding of RNA molecules. Unlike conventional RNA models that treat sequences as flat strings of nucleotides, ERNIE-RNA explicitly incorporates secondary structure constraints — base-pairing rules — directly into its self-attention mechanism during pre-training. This architectural choice allows the model to internalize how RNA folds without requiring external structure prediction tools or multiple sequence alignments as inputs.
The model was developed by researchers at Tsinghua University and published in Nature Communications in 2025. ERNIE-RNA is built on a modified BERT architecture and was pre-trained on 20.4 million non-redundant RNA sequences drawn from RNAcentral using masked language modeling (MLM). By weaving structural priors into the attention mechanism from the first layer, the model learns representations that simultaneously encode sequence identity and folding propensity.
In the landscape of RNA foundation models — alongside RNA-FM, RNAErnie, and SpliceBERT — ERNIE-RNA occupies a distinct position by making secondary structure an architectural constraint rather than a downstream task. This design enables credible zero-shot structure prediction directly from the model's attention maps, an emergent capability not previously demonstrated at this scale for RNA language models.
ERNIE-RNA is an 86-million-parameter transformer trained with the BERT masked language modeling objective. The architecture consists of 12 transformer blocks, each with 12 attention heads and an embedding dimension of 768 (64 dimensions per head). The key innovation over standard BERT is a modified attention computation: in the first layer, a pairwise position matrix derived from the 1D sequence replaces the standard attention bias, encoding base-pairing propensity. In each subsequent layer, the attention map from the previous layer is recycled as the bias, creating an iterative structural refinement process through depth.
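The following is a minimal, simplified sketch of this idea, not the authors' implementation: it assumes a single-head attention layer, illustrative pairing scores, and the hypothetical names `base_pair_bias` and `BiasedSelfAttention`. It shows the two ingredients described above: a base-pairing propensity matrix used to bias the first layer's attention logits, and the recycling of each layer's attention map as the next layer's bias.

```python
import torch
import torch.nn.functional as F

# Illustrative pairing scores (not the paper's exact scheme): Watson-Crick pairs
# score highest, the G-U wobble pair lower, all other nucleotide pairs zero.
PAIR_SCORE = {("A", "U"): 2.0, ("U", "A"): 2.0,
              ("G", "C"): 3.0, ("C", "G"): 3.0,
              ("G", "U"): 1.0, ("U", "G"): 1.0}

def base_pair_bias(seq: str) -> torch.Tensor:
    """Build an L x L base-pairing propensity matrix from the 1D sequence."""
    L = len(seq)
    bias = torch.zeros(L, L)
    for i in range(L):
        for j in range(L):
            bias[i, j] = PAIR_SCORE.get((seq[i], seq[j]), 0.0)
    return bias

class BiasedSelfAttention(torch.nn.Module):
    """Single-head self-attention that adds an external bias to the attention
    logits and returns its attention map so it can be recycled as the next
    layer's bias (a toy stand-in for ERNIE-RNA's structure-enhanced attention)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim)
        self.k = torch.nn.Linear(dim, dim)
        self.v = torch.nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, bias: torch.Tensor):
        # x: (L, dim), bias: (L, L)
        logits = (self.q(x) @ self.k(x).T) * self.scale + bias
        attn = F.softmax(logits, dim=-1)      # (L, L) attention map
        return attn @ self.v(x), attn         # token outputs and map for the next layer

# Toy forward pass: the first layer is biased by the base-pairing matrix,
# and every later layer reuses the previous layer's attention map as its bias.
seq = "GGGAAACCC"
x = torch.randn(len(seq), 16)
layers = [BiasedSelfAttention(16) for _ in range(3)]
bias = base_pair_bias(seq)
for layer in layers:
    x, bias = layer(x, bias)
```

The design choice this illustrates is that structural information enters only through the bias term, so the model still trains with a plain masked-language-modeling loss while the attention maps are nudged toward plausible base-pairing patterns.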
Training used sequences from RNAcentral with 15% of tokens randomly masked per sample. No structural labels were used during pre-training — the structural information enters solely through the attention bias construction. Benchmark evaluations were conducted on RNAStralign, ArchiveII, bpRNA, and RNA-binding protein datasets. On secondary structure prediction, fine-tuned ERNIE-RNA achieves an F1-score above 0.90 on several held-out families, while its zero-shot attention-map performance is competitive with supervised classical methods on short-to-medium length sequences.
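The exact procedure ERNIE-RNA uses to turn attention maps into zero-shot structure predictions is not spelled out here, so the sketch below is a generic post-processing heuristic under stated assumptions: symmetrize the map, forbid pairs closer than a minimum hairpin-loop distance, then greedily keep the highest-scoring pairs while allowing each base at most one partner. The function name `attention_to_pairs` and the threshold value are illustrative.

```python
import numpy as np

def attention_to_pairs(attn: np.ndarray, threshold: float = 0.5, min_loop: int = 3):
    """Turn an L x L attention map into a list of base pairs (zero-shot heuristic)."""
    sym = (attn + attn.T) / 2.0
    L = sym.shape[0]
    for i in range(L):
        lo, hi = max(0, i - min_loop), min(L, i + min_loop + 1)
        sym[i, lo:hi] = 0.0                      # forbid pairs closer than min_loop
    pairs, used = [], set()
    # Candidate index pairs sorted by score, highest first.
    order = np.dstack(np.unravel_index(np.argsort(-sym, axis=None), sym.shape))[0]
    for i, j in order:
        if sym[i, j] < threshold:
            break                                # scores only decrease from here
        if i >= j or i in used or j in used:
            continue                             # upper triangle only, one partner per base
        pairs.append((int(i), int(j)))
        used.update((i, j))
    return pairs
```

A list of (i, j) index pairs produced this way can be compared against reference structures to compute the F1-scores reported on benchmarks such as RNAStralign and ArchiveII.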
ERNIE-RNA is well-suited for research tasks requiring an understanding of RNA structure alongside sequence. Structural biologists can use attention maps to hypothesize secondary structure topologies for novel non-coding RNAs prior to experimental validation. Functional genomics researchers can apply fine-tuned models to predict RNA-protein interactions and translation efficiency from 5' UTR sequences, which is directly relevant to mRNA therapeutics design. The dense sequence embeddings produced by ERNIE-RNA can be plugged into custom machine learning pipelines for tasks such as splice site prediction, RNA modification site identification, and functional RNA classification, reducing the need for hand-crafted sequence features.
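As a sketch of the last point, the snippet below feeds per-sequence embeddings into a simple scikit-learn classifier. The `get_embeddings` helper is a placeholder for whatever inference interface the released ERNIE-RNA code provides; here it returns random 768-dimensional vectors purely so the pipeline runs end to end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def get_embeddings(sequences):
    """Placeholder for ERNIE-RNA inference: one fixed-length embedding per sequence
    (e.g. the mean of the 768-dim per-token representations). Replace this stub
    with the actual model call from the released ERNIE-RNA code."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sequences), 768))   # stand-in features

# Toy downstream task: binary functional classification of RNA sequences.
sequences = ["GGGAAACCC", "AUGGCUAGC", "CCCUUUGGG", "UUAGGCAUC"] * 25
labels = np.tile([0, 1, 0, 1], 25)

X = get_embeddings(sequences)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

The same pattern applies to splice site prediction, modification site identification, or functional RNA classification: the language model supplies the features and a lightweight downstream model is trained on the labeled task.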
ERNIE-RNA advances the RNA modeling field by demonstrating that structural priors can be embedded directly into a language model's architecture rather than added as separate supervision signals. Its publication in Nature Communications (2025) and concurrent development with models such as RNA-FM and RNAErnie reflect a broader recognition that RNA biology requires specialized foundation models beyond DNA or protein language model adaptations. A practical limitation of the current model is its focus on secondary structure: tertiary contacts, pseudoknots, and RNA-protein complex geometries are not directly modeled. Additionally, the 86M parameter scale, while efficient, may leave headroom for improvement on longer RNA molecules or on tasks requiring richer contextual representations, an area where larger successor models are likely to emerge.
Yin, W., Zhang, Z., Zhang, S., He, L., Zhang, R., Jiang, R., Liu, G., Wang, J., Zhang, X., Qin, T., & Xie, Z. (2025). ERNIE-RNA: an RNA language model with structure-enhanced representations. Nature Communications, 16(1), 10076.
DOI: 10.1038/s41467-025-64972-0