BERT-based RNA foundation model pre-trained on 1 billion sequences, achieving state-of-the-art performance in secondary structure, tertiary structure, and functional annotation tasks.
UNI-RNA is a universal RNA foundation model developed by DP Technology and released in July 2023. It was pre-trained on an unprecedented dataset of 1 billion RNA sequences drawn from multiple species and diverse RNA categories, making it the largest-scale RNA pre-training effort reported at the time. The model learns context-aware representations that capture evolutionary conservation patterns and structural constraints embedded in RNA sequences without requiring labeled data, then transfers this knowledge to downstream prediction tasks through fine-tuning.
The model addresses a long-standing bottleneck in RNA research: the scarcity of experimentally determined structures relative to the rapidly growing number of known RNA sequences. By learning from the statistical regularities of a billion sequences, UNI-RNA extracts signals that correlate with structure and function in a way that was previously inaccessible to smaller models trained on curated, annotated datasets.
UNI-RNA is a family of models spanning 25 million to 400 million parameters. The 400M-parameter variant consistently achieves the strongest downstream performance, and the reported scaling results suggest performance plateaus near that size given the architecture and training data. Access to the model and associated notebooks is provided through DP Technology's Bohrium platform rather than a traditional open-source repository.
UNI-RNA uses a BERT-style transformer encoder as its backbone, enhanced by three architectural choices that distinguish it from earlier RNA language models. Rotary position embeddings replace absolute positional encodings, providing better generalization across variable-length RNA sequences. Flash attention reduces memory overhead and accelerates training on long sequences. Fused layer normalization combines normalization operations to improve throughput and numerical stability during pre-training.
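To make the rotary position embedding choice concrete, the sketch below applies RoPE to a tensor of token features. It follows the standard RoFormer/GPT-NeoX formulation rather than UNI-RNA's actual (unpublished) implementation; the function name, shapes, and base frequency are illustrative assumptions.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to a (seq_len, dim) tensor.

    Each channel pair is rotated by an angle proportional to token
    position, so dot products depend only on relative offsets.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-channel rotation frequencies, theta_i = base^(-2i/dim).
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by the position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because the rotation encodes position multiplicatively inside the attention dot product, the scheme extrapolates to sequence lengths beyond those seen in training better than learned absolute embeddings, which is the motivation for using it on variable-length RNAs.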
The training corpus comprises sequences from five established RNA databases covering both coding RNAs (mRNAs) and non-coding RNAs (ncRNAs, including rRNA, tRNA, lncRNA, and others), spanning multiple kingdoms of life. MMseqs2 clustering was applied before pre-training to reduce redundancy while preserving sequence diversity. The model was pre-trained with masked language modeling on this 1-billion-sequence corpus. For structure prediction, pairwise relationships between nucleotides are represented as two-dimensional maps: base-pairing contact maps for secondary structure and inter-nucleotide distance matrices for tertiary geometry. Benchmarked on the bpRNA-1m and PDB-derived test sets, UNI-RNA outperformed all previous methods in F1-score, precision, and recall at the time of publication.
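As a concrete illustration of the masked language modeling objective, the snippet below corrupts an RNA sequence BERT-style and builds the labels a cross-entropy loss would consume. The four-letter vocabulary, 15% mask rate, and 80/10/10 corruption split follow BERT conventions and are assumptions about UNI-RNA's exact recipe, not confirmed details.

```python
import torch

VOCAB = {"A": 0, "C": 1, "G": 2, "U": 3, "[MASK]": 4}

def mask_tokens(seq: str, mask_rate: float = 0.15):
    """Return (input_ids, labels) for one RNA sequence.

    Labels are -100 (ignored by cross-entropy) everywhere except at
    masked positions, where they hold the original nucleotide id.
    """
    ids = torch.tensor([VOCAB[nt] for nt in seq])
    labels = torch.full_like(ids, -100)
    picked = torch.rand(len(ids)) < mask_rate
    labels[picked] = ids[picked]
    rand = torch.rand(len(ids))
    ids[picked & (rand < 0.8)] = VOCAB["[MASK]"]   # 80%: replace with [MASK]
    swap = picked & (rand >= 0.8) & (rand < 0.9)   # 10%: replace with random base
    ids[swap] = torch.randint(0, 4, ids.shape)[swap]
    return ids, labels                             # remaining 10%: keep original

input_ids, labels = mask_tokens("AUGGCUACGUAGC")
```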
UNI-RNA is suited to any RNA research task where learned sequence representations can substitute for or complement experimental data. Structural biologists can use its secondary structure and tertiary contact predictions as priors before investing in experimental validation. Functional genomics researchers can fine-tune the backbone for regulatory element annotation, RNA-binding protein interaction prediction, or classification of novel ncRNA families. In therapeutic contexts, UNI-RNA may help accelerate mRNA optimization, identification of RNA-based drug targets, and characterization of ribozymes or aptamers. Hosting on the Bohrium platform provides notebook-based access that lowers the barrier for wet-lab groups without dedicated computational infrastructure.
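A minimal sketch of the fine-tuning pattern described above: a classification head on top of a pre-trained encoder. The `encoder` argument stands in for the UNI-RNA backbone, which is not distributed as an importable package; the mean-pooling and linear head are common choices, not confirmed details of the paper.

```python
import torch
import torch.nn as nn

class RNAClassifier(nn.Module):
    """Pre-trained backbone plus task head, e.g. for ncRNA family classification."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, n_classes: int):
        super().__init__()
        self.encoder = encoder              # hypothetical UNI-RNA-style backbone
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_ids)    # (batch, seq_len, hidden_dim)
        pooled = hidden.mean(dim=1)         # mean-pool over sequence positions
        return self.head(pooled)            # (batch, n_classes) logits
```

Freezing the encoder and training only the head gives a cheap linear-probe baseline; unfreezing the backbone recovers full fine-tuning at higher compute cost.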
UNI-RNA established the largest RNA pre-training benchmark at the time of its release and demonstrated that scaling sequence-level training beyond 100 million parameters yields measurable gains across a range of downstream RNA tasks. It contributed to a broader wave of RNA foundation models, including RNA-FM, RiNALMo, and ERNIE-RNA, that collectively shifted the field toward transfer learning approaches analogous to those pioneered in protein language models. Two limitations stand out. First, UNI-RNA remains a preprint as of early 2026 and has not undergone formal peer review, and its access model through Bohrium notebooks is less open than fully released repositories such as RNA-FM's, restricting community-driven reproducibility. Second, the paper's scope is primarily limited to sequence-based tasks; explicitly modeling RNA 3D coordinates at atomic resolution, as done by tools like RhoFold+, lies outside UNI-RNA's current framework.
Wang, X., Gu, R., Chen, Z., Li, Y., Ji, X., Ke, G., & Wen, H. (2023). UNI-RNA: Universal Pre-trained Models Revolutionize RNA Research. bioRxiv, 2023.07.11.548588.
DOI: 10.1101/2023.07.11.548588