bio.rodeo

RNA

RNAElectra

Australian National University

A single-nucleotide-resolution RNA foundation model pretrained on non-coding RNAs with ELECTRA-style replaced-token detection for RNA regulatory inference.

Released: March 2026

RNAElectra is a single-nucleotide-resolution RNA foundation model developed at the Australian National University and released as a bioRxiv preprint in March 2026. Most existing RNA language models, such as RNA-FM, are pretrained with masked language modeling (MLM), in which the loss is computed only on the small fraction of positions that are masked. RNAElectra instead adopts the ELECTRA-style replaced-token detection (RTD) objective, which provides a learning signal at every position of every sequence and is intended to better align pretraining with the sequence-to-function fine-tuning tasks that matter downstream.
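To make the difference in supervision density concrete, the sketch below counts how many positions of one sequence contribute to the loss under each objective. The 15% masking rate is an assumption borrowed from common MLM practice, not a figure taken from the preprint, and `supervised_positions` is a hypothetical helper written for illustration.

```python
def supervised_positions(seq_len, objective, mask_rate=0.15):
    """Count the positions of one sequence that contribute to the loss.

    mask_rate is an assumed value (a common MLM default), not a number
    reported for RNAElectra.
    """
    if objective == "mlm":
        # MLM: only the masked positions are scored.
        return max(1, round(seq_len * mask_rate))
    if objective == "rtd":
        # RTD: every position gets an original-vs-replaced label.
        return seq_len
    raise ValueError(f"unknown objective: {objective!r}")


if __name__ == "__main__":
    for length in (100, 200, 1000):
        mlm = supervised_positions(length, "mlm")
        rtd = supervised_positions(length, "rtd")
        print(f"length={length}: MLM scores {mlm} positions, RTD scores {rtd}")
```

Under this assumed masking rate, RTD scores every position while MLM scores roughly one in seven, which is the sense in which the objective is described as denser.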

The model is pretrained on a diverse corpus of non-coding RNAs drawn from RNAcentral. By combining nucleotide-resolution tokenization with an efficient attention design, it is built to capture both local regulatory motifs and longer-range dependencies within a single reusable backbone, which can then be fine-tuned across structure, interaction, and regulatory tasks.

Positioned within the growing family of RNA foundation models, RNAElectra's contribution is methodological: the preprint reports that the dense supervision of replaced-token detection translates into broad gains over MLM-based RNA baselines across structure, interaction, modification, and regulatory tasks.

#Key Features

  • Replaced-token detection pretraining: An ELECTRA-style RTD objective supervises every input position rather than only masked tokens, providing a denser learning signal than masked language modeling.
  • Generator-discriminator design: A lightweight MLM generator (12 layers, hidden size 256) proposes context-dependent nucleotide replacements that a deeper discriminator (22 layers, hidden size 512) learns to detect.
  • Single-nucleotide resolution: Nucleotide-level tokenization preserves the fine granularity needed for motif- and modification-level tasks.
  • Single reusable backbone: One pretrained encoder transfers across structure, interaction, modification, and regulatory benchmarks.
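As a rough illustration of what single-nucleotide tokenization means in practice, the sketch below maps each base to its own token ID. The vocabulary, the special tokens, and the choice to convert T to U are all assumptions made for this example; the preprint's actual tokenizer is not described here.

```python
# Hypothetical vocabulary: special tokens plus one ID per nucleotide.
VOCAB = {
    "[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[MASK]": 3,
    "A": 4, "C": 5, "G": 6, "U": 7, "N": 8,
}


def tokenize(sequence):
    """Turn an RNA string into token IDs, one token per nucleotide."""
    # Assumption: DNA-style input is accepted and T is read as U.
    normalized = sequence.upper().replace("T", "U")
    # Any symbol outside A/C/G/U falls back to the ambiguity token N.
    body = [VOCAB.get(base, VOCAB["N"]) for base in normalized]
    return [VOCAB["[CLS]"]] + body + [VOCAB["[SEP]"]]


if __name__ == "__main__":
    print(tokenize("GGGAAACUUCGG"))
```

Because every nucleotide keeps its own position in the token sequence, a per-position prediction such as a modification site maps directly back to a single base, which k-mer or subword tokenizers do not guarantee.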

#Technical Details

RNAElectra pairs two transformers. A lightweight masked-language-model generator (12 layers, hidden size 256) proposes realistic, context-dependent nucleotide substitutions. A deeper discriminator (22 layers, hidden size 512) is trained by replaced-token detection to classify, at each position, whether the observed nucleotide is original or replaced. Pretraining uses diverse non-coding RNA sequences from RNAcentral. Across a broad benchmark suite (RNA secondary structure and function, RNA-protein and RNA-RNA interactions, RNA modifications, translation efficiency, and mRNA stability), the model is reported to outperform RNA-FM and other RNA foundation model baselines.
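The replaced-token detection setup can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained MLM generator is swapped for uniform random sampling, the 15% replacement rate is assumed, and `corrupt` and `rtd_loss` are hypothetical names. Following the original ELECTRA formulation, a position where the sampled nucleotide happens to equal the original is labeled as original.

```python
import math
import random

NUCLEOTIDES = "ACGU"


def corrupt(seq, mask_rate=0.15, seed=0):
    """Replace a fraction of nucleotides and return (corrupted, labels).

    labels[i] is 1 if position i now differs from the original, else 0.
    Uniform sampling stands in for the trained generator (assumption).
    """
    rng = random.Random(seed)
    n_replace = max(1, round(len(seq) * mask_rate))
    positions = rng.sample(range(len(seq)), n_replace)
    corrupted = list(seq)
    for i in positions:
        corrupted[i] = rng.choice(NUCLEOTIDES)
    # A sampled base identical to the original counts as "original".
    labels = [int(new != old) for new, old in zip(corrupted, seq)]
    return "".join(corrupted), labels


def rtd_loss(probs, labels, eps=1e-9):
    """Mean binary cross-entropy over every position of the sequence."""
    terms = [
        -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        for p, y in zip(probs, labels)
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    original = "GGGAAACUUCGGUUUCCC"
    corrupted, labels = corrupt(original)
    print(corrupted, labels)
    # An untrained discriminator that outputs 0.5 everywhere.
    print(rtd_loss([0.5] * len(original), labels))
```

Note that the loss averages over all positions, replaced or not, which is where the dense learning signal comes from.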

#Applications

RNAElectra serves as a general-purpose backbone for RNA regulatory inference. Its representations can be fine-tuned to predict secondary structure, RNA-protein and RNA-RNA interactions, RNA modification sites, translation efficiency, and mRNA stability. This supports researchers in functional genomics, RNA biology, and the design of RNA-based therapeutics such as mRNA vaccines and oligonucleotides, where stability and translation are key levers.

#Impact

By bringing ELECTRA's replaced-token detection to RNA, RNAElectra offers a more sample-efficient pretraining recipe than the masked language modeling that has dominated RNA foundation models, and reports consistent improvements over RNA-FM across structure, interaction, and regulatory tasks. The main caveat for adoption is availability: the preprint is released under a CC-BY license, but no public code or model weights have been confirmed, so independent benchmarking and downstream reuse are currently limited.

Tags

structure_prediction, rna_protein_interaction, translation_efficiency, mrna_stability_prediction, transformer, electra, foundation_model, self_supervised, non_coding_rna, rna_modifications