A structure-enhanced RNA language model that incorporates base-pairing constraints into self-attention, achieving state-of-the-art performance on RNA structure and function prediction tasks.
ERNIE-RNA (Enhanced Representations with Base-Pairing Restriction for RNA Modeling) is a pre-trained RNA language model that bridges the gap between sequence-based learning and structural understanding of RNA molecules. Unlike conventional RNA models that treat sequences as flat strings of nucleotides, ERNIE-RNA explicitly incorporates secondary structure constraints — base-pairing rules — directly into its self-attention mechanism during pre-training. This architectural choice allows the model to internalize how RNA folds without requiring external structure prediction tools or multiple sequence alignments as inputs.
The model was developed by researchers at Tsinghua University and published in Nature Communications in 2025. ERNIE-RNA is built on a modified BERT architecture and was pre-trained on 20.4 million non-redundant RNA sequences drawn from RNAcentral using masked language modeling (MLM). By weaving structural priors into the attention mechanism from the first layer, the model learns representations that simultaneously encode sequence identity and folding propensity.
In the landscape of RNA foundation models — alongside RNA-FM, RNAErnie, and SpliceBERT — ERNIE-RNA occupies a distinct position by making secondary structure an architectural constraint rather than a downstream task. This design enables credible zero-shot structure prediction directly from the model's attention maps, an emergent capability not previously demonstrated at this scale for RNA language models.
ERNIE-RNA is an 86-million-parameter transformer trained with the BERT masked language modeling objective. The architecture consists of 12 transformer blocks, each with 12 attention heads and an embedding dimension of 768 (64 dimensions per head). The key innovation over standard BERT is a modified attention computation: in the first layer, a pairwise position matrix derived from the 1D sequence replaces the standard attention bias, encoding base-pairing propensity. In each subsequent layer, the attention map from the previous layer is recycled as the bias, creating an iterative structural refinement process through depth.
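The following is a minimal, simplified sketch of this idea, not the authors' implementation: it assumes a single-head attention layer, illustrative pairing scores, and the hypothetical names `base_pair_bias` and `BiasedSelfAttention`. It shows the two ingredients described above: a base-pairing propensity matrix used to bias the first layer's attention logits, and the recycling of each layer's attention map as the next layer's bias.

```python
import torch
import torch.nn.functional as F

# Illustrative pairing scores (not the paper's exact scheme): Watson-Crick pairs
# score highest, the G-U wobble pair lower, all other nucleotide pairs zero.
PAIR_SCORE = {("A", "U"): 2.0, ("U", "A"): 2.0,
              ("G", "C"): 3.0, ("C", "G"): 3.0,
              ("G", "U"): 1.0, ("U", "G"): 1.0}

def base_pair_bias(seq: str) -> torch.Tensor:
    """Build an L x L base-pairing propensity matrix from the 1D sequence."""
    L = len(seq)
    bias = torch.zeros(L, L)
    for i in range(L):
        for j in range(L):
            bias[i, j] = PAIR_SCORE.get((seq[i], seq[j]), 0.0)
    return bias

class BiasedSelfAttention(torch.nn.Module):
    """Single-head self-attention that adds an external bias to the attention
    logits and returns its attention map so it can be recycled as the next
    layer's bias (a toy stand-in for ERNIE-RNA's structure-enhanced attention)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim)
        self.k = torch.nn.Linear(dim, dim)
        self.v = torch.nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, bias: torch.Tensor):
        # x: (L, dim), bias: (L, L)
        logits = (self.q(x) @ self.k(x).T) * self.scale + bias
        attn = F.softmax(logits, dim=-1)      # (L, L) attention map
        return attn @ self.v(x), attn         # token outputs and map for the next layer

# Toy forward pass: the first layer is biased by the base-pairing matrix,
# and every later layer reuses the previous layer's attention map as its bias.
seq = "GGGAAACCC"
x = torch.randn(len(seq), 16)
layers = [BiasedSelfAttention(16) for _ in range(3)]
bias = base_pair_bias(seq)
for layer in layers:
    x, bias = layer(x, bias)
```

The design choice this illustrates is that structural information enters only through the bias term, so the model still trains with a plain masked-language-modeling loss while the attention maps are nudged toward plausible base-pairing patterns.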
Training used sequences from RNAcentral with 15% of tokens randomly masked per sample. No structural labels were used during pre-training — the structural information enters solely through the attention bias construction. Benchmark evaluations were conducted on RNAStralign, ArchiveII, bpRNA, and RNA-binding protein datasets. On secondary structure prediction, fine-tuned ERNIE-RNA achieves an F1-score above 0.90 on several held-out families, while its zero-shot attention-map performance is competitive with supervised classical methods on short-to-medium length sequences.
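The exact procedure ERNIE-RNA uses to turn attention maps into zero-shot structure predictions is not spelled out here, so the sketch below is a generic post-processing heuristic under stated assumptions: symmetrize the map, forbid pairs closer than a minimum hairpin-loop distance, then greedily keep the highest-scoring pairs while allowing each base at most one partner. The function name `attention_to_pairs` and the threshold value are illustrative.

```python
import numpy as np

def attention_to_pairs(attn: np.ndarray, threshold: float = 0.5, min_loop: int = 3):
    """Turn an L x L attention map into a list of base pairs (zero-shot heuristic)."""
    sym = (attn + attn.T) / 2.0
    L = sym.shape[0]
    for i in range(L):
        lo, hi = max(0, i - min_loop), min(L, i + min_loop + 1)
        sym[i, lo:hi] = 0.0                      # forbid pairs closer than min_loop
    pairs, used = [], set()
    # Candidate index pairs sorted by score, highest first.
    order = np.dstack(np.unravel_index(np.argsort(-sym, axis=None), sym.shape))[0]
    for i, j in order:
        if sym[i, j] < threshold:
            break                                # scores only decrease from here
        if i >= j or i in used or j in used:
            continue                             # upper triangle only, one partner per base
        pairs.append((int(i), int(j)))
        used.update((i, j))
    return pairs
```

A list of (i, j) index pairs produced this way can be compared against reference structures to compute the F1-scores reported on benchmarks such as RNAStralign and ArchiveII.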
ERNIE-RNA is well-suited for research tasks requiring an understanding of RNA structure alongside sequence. Structural biologists can use attention maps to hypothesize secondary structure topologies for novel non-coding RNAs prior to experimental validation. Functional genomics researchers can apply fine-tuned models to predict RNA-protein interactions and translation efficiency from 5' UTR sequences, which is directly relevant to mRNA therapeutics design. The dense sequence embeddings produced by ERNIE-RNA can be plugged into custom machine learning pipelines for tasks such as splice site prediction, RNA modification site identification, and functional RNA classification, reducing the need for hand-crafted sequence features.
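As a sketch of the last point, the snippet below feeds per-sequence embeddings into a simple scikit-learn classifier. The `get_embeddings` helper is a placeholder for whatever inference interface the released ERNIE-RNA code provides; here it returns random 768-dimensional vectors purely so the pipeline runs end to end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def get_embeddings(sequences):
    """Placeholder for ERNIE-RNA inference: one fixed-length embedding per sequence
    (e.g. the mean of the 768-dim per-token representations). Replace this stub
    with the actual model call from the released ERNIE-RNA code."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sequences), 768))   # stand-in features

# Toy downstream task: binary functional classification of RNA sequences.
sequences = ["GGGAAACCC", "AUGGCUAGC", "CCCUUUGGG", "UUAGGCAUC"] * 25
labels = np.tile([0, 1, 0, 1], 25)

X = get_embeddings(sequences)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

The same pattern applies to splice site prediction, modification site identification, or functional RNA classification: the language model supplies the features and a lightweight downstream model is trained on the labeled task.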
ERNIE-RNA advances the RNA modeling field by demonstrating that structural priors can be embedded directly into a language model's architecture rather than added as separate supervision signals. Its publication in Nature Communications (2025) and concurrent development with models such as RNA-FM and RNAErnie reflect a broader recognition that RNA biology requires specialized foundation models beyond DNA or protein language model adaptations. A practical limitation of the current model is its focus on secondary structure: tertiary contacts, pseudoknots, and RNA-protein complex geometries are not directly modeled. Additionally, the 86M parameter scale, while efficient, may leave headroom for improvement on longer RNA molecules or on tasks requiring richer contextual representations, an area where larger successor models are likely to emerge.
Yin, W., Zhang, Z., Zhang, S., He, L., Zhang, R., Jiang, R., Liu, G., Wang, J., Zhang, X., Qin, T., & Xie, Z. (2025). ERNIE-RNA: an RNA language model with structure-enhanced representations. Nature Communications, 16(1), 10076.
DOI: 10.1038/s41467-025-64972-0