Australian National University
A single-nucleotide-resolution RNA foundation model pretrained on non-coding RNAs with ELECTRA-style replaced-token detection for RNA regulatory inference.
RNAElectra is a single-nucleotide-resolution RNA foundation model developed at the Australian National University and released as a bioRxiv preprint in March 2026. Most existing RNA language models, such as RNA-FM, are pretrained with masked language modeling (MLM), in which the loss is computed only on the small fraction of positions that are masked (15% in the original BERT recipe). RNAElectra instead adopts the ELECTRA-style replaced-token detection (RTD) objective, which provides a learning signal at every position of every sequence and, the authors argue, better aligns pretraining with the sequence-to-function tasks used in downstream fine-tuning.
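To make the "learning signal at every position" point concrete, here is a minimal Python sketch of the two per-sequence losses. It assumes a 15% mask rate (the BERT default; RNAElectra's actual rate is not stated here), takes model outputs as plain lists of probabilities, and uses the discriminator weight of 50 from the original ELECTRA paper for the joint objective. All function names are illustrative, not RNAElectra's code.

```python
import math
import random


def masked_positions(length, mask_rate=0.15, seed=0):
    """Positions selected for masking (assumed 15% rate, at least one)."""
    rng = random.Random(seed)
    k = max(1, round(mask_rate * length))
    return sorted(rng.sample(range(length), k))


def mlm_loss(log_prob_original, masked):
    """MLM: mean negative log-likelihood of the original nucleotide,
    averaged over the masked positions only."""
    return -sum(log_prob_original[i] for i in masked) / len(masked)


def rtd_loss(p_replaced, labels, eps=1e-12):
    """RTD: mean binary cross-entropy over every position.
    labels[i] is 1 if position i was replaced, else 0."""
    total = 0.0
    for p, y in zip(p_replaced, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def electra_objective(mlm, rtd, disc_weight=50.0):
    """Joint ELECTRA-style loss: generator MLM plus weighted discriminator RTD.
    disc_weight=50 follows the original ELECTRA paper (an assumption here)."""
    return mlm + disc_weight * rtd


if __name__ == "__main__":
    length = 100
    masked = masked_positions(length)
    print(f"MLM supervises {len(masked)} of {length} positions")
    print(f"RTD supervises {length} of {length} positions")
```

For a 100-nt sequence, the MLM term sees 15 positions while the RTD term sees all 100, which is the density argument in a nutshell.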
The model is pretrained on a diverse corpus of non-coding RNAs drawn from RNAcentral. By combining nucleotide-resolution tokenization with an efficient attention design, it is built to capture both local regulatory motifs and longer-range dependencies within a single reusable backbone, which can then be fine-tuned across structure, interaction, and regulatory tasks.
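Nucleotide-resolution tokenization means one token per base, with no k-mer or subword merging. The sketch below shows what that looks like in Python; the special-token inventory, the `N` ambiguity token, and the T-to-U conversion are illustrative assumptions, not RNAElectra's published vocabulary.

```python
# Hypothetical single-nucleotide vocabulary: special tokens first, then bases.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]
BASES = ["A", "C", "G", "U", "N"]  # N = any/unknown base (assumed)
VOCAB = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + BASES)}


def tokenize(seq):
    """Split an RNA (or DNA) string into one token per nucleotide."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA (assumption)
    body = [base if base in BASES else "[UNK]" for base in seq]
    return ["[CLS]"] + body + ["[SEP]"]


def encode(seq):
    """Map tokens to integer ids; output length is len(seq) + 2."""
    return [VOCAB[tok] for tok in tokenize(seq)]


print(tokenize("acgt"))  # ['[CLS]', 'A', 'C', 'G', 'U', '[SEP]']
print(encode("ACGU"))    # [1, 5, 6, 7, 8, 2]
```

Because every base is its own token, a model output at position i maps directly to nucleotide i, which is what per-site tasks such as modification prediction need.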
Within the growing family of RNA foundation models, RNAElectra's contribution is methodological: the preprint reports that the dense supervision of replaced-token detection translates into broad gains over MLM-based RNA baselines across a wide range of regulatory inference problems.
RNAElectra pairs a lightweight masked-language-model generator (12 transformer layers, hidden size 256), which proposes realistic, context-dependent nucleotide substitutions, with a deeper discriminator (22 layers, hidden size 512) trained by replaced-token detection to classify, at each position, whether the observed nucleotide is original or replaced. Pretraining uses diverse non-coding RNA sequences from RNAcentral. On a broad benchmark suite covering RNA secondary structure and function, RNA-protein and RNA-RNA interactions, RNA modifications, translation efficiency, and mRNA stability, the model is reported to outperform RNA-FM and other RNA foundation model baselines.
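The generator/discriminator interplay can be sketched in a few lines of Python. The toy generator below is a hypothetical stand-in for the 12-layer MLM generator: it receives only the masked sequence, and a real generator would sample from its predicted distribution rather than uniformly. Labels follow the original ELECTRA convention, where a position counts as "replaced" only if the sampled nucleotide differs from the original; whether RNAElectra uses exactly this rule, and its mask rate, are assumptions.

```python
import random

MASK = "[MASK]"
BASES = "ACGU"


def uniform_generator(masked_tokens, pos, rng):
    """Hypothetical stand-in for the small MLM generator. It sees only the
    masked sequence; this toy version ignores context and samples uniformly."""
    return rng.choice(BASES)


def make_rtd_example(seq, generator, mask_rate=0.15, seed=0):
    """Build one replaced-token-detection example.

    Returns the corrupted sequence, per-position discriminator labels
    (1 = replaced, 0 = original), and the masked positions."""
    rng = random.Random(seed)
    k = max(1, round(mask_rate * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), k))

    # 1. Mask the chosen positions; the generator never sees the originals there.
    masked = list(seq)
    for i in positions:
        masked[i] = MASK

    # 2. Fill each masked slot with a generator sample.
    corrupted = list(seq)
    for i in positions:
        corrupted[i] = generator(masked, i, rng)

    # 3. Discriminator targets cover every position, masked or not.
    labels = [int(c != o) for c, o in zip(corrupted, seq)]
    return "".join(corrupted), labels, positions


if __name__ == "__main__":
    seq = "GGAUCCUAGCUAGCAUCGAUGCUAGCUAGG"
    corrupted, labels, positions = make_rtd_example(seq, uniform_generator)
    print(seq)
    print(corrupted)
    print("".join(map(str, labels)))
```

In the original ELECTRA, only the discriminator is kept for fine-tuning; assuming RNAElectra follows suit, the 22-layer, 512-wide network would be the reusable backbone.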
RNAElectra serves as a general-purpose backbone for RNA regulatory inference. Its representations can be fine-tuned to predict secondary structure, RNA-protein and RNA-RNA interactions, RNA modification sites, translation efficiency, and mRNA stability. This supports researchers in functional genomics and RNA biology, as well as the design of RNA-based therapeutics such as oligonucleotides and mRNA vaccines, where stability and translation efficiency are key levers.
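As a rough picture of how one backbone can serve both per-nucleotide and per-sequence tasks, the Python sketch below puts two small heads on top of per-position embeddings. With no public weights, the embeddings are placeholders standing in for the discriminator's 512-dimensional hidden states; mean pooling and linear heads are common fine-tuning choices assumed for illustration, not the paper's task heads. Pair-level tasks such as secondary structure would need a pairwise head and are not shown.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(embeddings: List[Vector]) -> Vector:
    """Average per-nucleotide embeddings into one sequence vector."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def linear(x: Vector, weights: Vector, bias: float) -> float:
    """Dot product plus bias."""
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


def sequence_head(embeddings: List[Vector], weights: Vector, bias: float) -> float:
    """Sequence-level regression, e.g. translation efficiency or mRNA stability."""
    return linear(mean_pool(embeddings), weights, bias)


def site_head(embeddings: List[Vector], weights: Vector, bias: float) -> List[float]:
    """Per-nucleotide probability, e.g. whether a base carries a modification."""
    return [1.0 / (1.0 + math.exp(-linear(e, weights, bias))) for e in embeddings]


if __name__ == "__main__":
    # Placeholder 2-d embeddings for a 3-nt sequence (real ones would be 512-d).
    emb = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
    print(sequence_head(emb, [1.0, 2.0], 0.0))
    print(site_head(emb, [1.0, 2.0], 0.0))
```

Fine-tuning would learn these head weights, and usually update the backbone as well, from labeled data for each task.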
By bringing ELECTRA's replaced-token detection to RNA, RNAElectra offers a pretraining recipe that, as ELECTRA showed for natural language, can be more sample-efficient than the masked language modeling that has dominated RNA foundation models, and it reports consistent improvements over RNA-FM across structure, interaction, and regulatory tasks. The main caveat for adoption is availability: the preprint is released under a CC-BY license, but no public code or model weights have been confirmed, so independent benchmarking and downstream reuse are currently limited.