University of Science and Technology of China
A fully open structure-guided RNA foundation model pretrained on ~21M RNA sequences paired with secondary structures, enabling robust structural and functional inference.
structRFM is a structure-guided RNA foundation model developed by researchers at the University of Science and Technology of China (USTC), with S. Kevin Zhou as corresponding author, and released as a bioRxiv preprint in August 2025. It addresses a recurring limitation of sequence-only RNA language models: because RNA function is largely dictated by how a molecule folds, models trained on nucleotide sequences alone struggle to internalize the base-pairing interactions that drive structural and functional behavior. structRFM closes this gap by jointly pretraining on RNA sequences and their secondary structures, baking folding information directly into the learned representations.
The model is pretrained from scratch on approximately 21 million RNA sequence–structure pairs drawn from RNAcentral, with secondary structures supplied by an ensemble annotation pipeline. Its central innovation is a structure-guided masked language modeling (SgMLM) objective that incorporates base-pairing interactions through a pair-matching operation, dynamically balancing sequence-level and structure-level masking during training. To mitigate the annotation bias inherent in any single structure predictor, structRFM uses MUSES, a multi-source ensemble that integrates thermodynamics-based, probability-based, and deep-learning-based secondary-structure predictors.
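The pair-matching idea behind SgMLM can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes secondary structures are given in dot-bracket notation without pseudoknots, and the function names, the 15% masking rate, and the `[MASK]` token string are hypothetical choices made for illustration. The sketch shows only how masking one nucleotide can be extended to its base-pairing partner; the dynamic balancing between sequence-level and structure-level masking described above is omitted.

```python
import random


def parse_pairs(dot_bracket):
    """Map each paired position to its partner (dot-bracket, no pseudoknots)."""
    stack, partner = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partner


def structure_guided_mask(seq, dot_bracket, mask_rate=0.15,
                          mask_token="[MASK]", rng=None):
    """Mask randomly chosen positions together with their pairing partners.

    mask_rate and mask_token are illustrative assumptions, not values
    taken from the structRFM paper.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    rng = rng or random.Random()
    partner = parse_pairs(dot_bracket)

    # Pick seed positions, then extend each seed to its base-pair partner.
    n_seed = max(1, round(mask_rate * len(seq)))
    masked = set()
    for i in rng.sample(range(len(seq)), n_seed):
        masked.add(i)
        if i in partner:
            masked.add(partner[i])

    tokens = [mask_token if i in masked else nt for i, nt in enumerate(seq)]
    return tokens, sorted(masked)
```

Calling `structure_guided_mask("GGGAAACCC", "(((...)))")` returns the token list and the masked indices. Whenever a paired nucleotide is chosen, its partner is masked too, so the model cannot recover a masked base simply by reading off its complement.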
In the crowded landscape of RNA language models — alongside RNA-FM, RNAErnie, ERNIE-RNA, RiNALMo, and AIDO.RNA — structRFM is distinguished by being fully open (model weights and the complete training dataset are released) and by deriving a tertiary-structure predictor, Zfold, that is competitive with AlphaFold3 on standard RNA structure benchmarks. The authors position it as a general-purpose backbone spanning zero-shot, structural, and functional RNA inference tasks.
structRFM is a BERT-style encoder transformer with 12 layers, a hidden dimension of 768, and 12 attention heads (approximately 86 million parameters). Its maximum input length is 514 tokens, which accommodates RNA sequences of up to 512 nucleotides (the two remaining positions are presumably reserved for special tokens, as is usual for BERT-style encoders). Longer RNAs (up to ~3,000 nt) are handled through a sliding-window strategy at inference. The training corpus consists of ~21 million sequence–structure pairs assembled from RNAcentral, filtered to sequences of 512 nucleotides or fewer, with secondary-structure labels produced by the MUSES ensemble. The model exposes three feature types (a classification-level feature, a sequence-level feature, and a pairwise matrix feature) that serve as flexible interfaces for downstream tasks.
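A sliding-window strategy for RNAs longer than the native window can be sketched as follows. This is an illustration under stated assumptions and not the procedure used in the structRFM repository. The 512-nucleotide window comes from the description above, while the function name `sliding_windows` and the stride of 256 are hypothetical. How per-window outputs are merged is also not specified here; averaging features over overlapping positions would be one plausible choice.

```python
def sliding_windows(seq, window=512, stride=256):
    """Split a sequence into overlapping windows that cover every position.

    Returns a list of (start_index, subsequence) tuples. The stride value
    is an illustrative assumption, not a documented structRFM setting.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    if len(seq) <= window:
        return [(0, seq)]

    starts = list(range(0, len(seq) - window + 1, stride))
    # Add a final window flush with the 3' end if the tail is not covered.
    if starts[-1] + window < len(seq):
        starts.append(len(seq) - window)
    return [(s, seq[s:s + window]) for s in starts]
```

Each returned window fits the model's native input length, and the start index lets per-window features be mapped back to positions in the full-length RNA.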
Across benchmarks, structRFM reports top-ranked zero-shot homology classification among the RNA language models compared, state-of-the-art secondary structure prediction, an approximately 48% F1 gain on IRES identification, and tertiary-structure results (via Zfold) that exceed AlphaFold3 by about 19% on RNA-Puzzles while remaining competitive on CASP15 and CASP16. Zfold is implemented as a downstream task module within the structRFM repository rather than as a separately packaged tool, building on the pretrained backbone's pairwise representations.
structRFM serves RNA biologists and computational researchers across structural and functional workflows. Structural biologists can use it to predict secondary structures and, via Zfold, to generate tertiary-structure hypotheses for non-coding RNAs prior to experimental determination. RNA therapeutics and synthetic biology researchers benefit from its functional inference capabilities — IRES identification is directly relevant to designing cap-independent translation elements for mRNA constructs, while splice site prediction supports the study of alternative splicing. Its zero-shot homology classification and ncRNA classification capabilities help annotate novel transcripts and organize RNA families. Because both the weights and the full training dataset are openly released, the model is well suited as a reproducible backbone for custom fine-tuning pipelines.
structRFM advances the RNA foundation model field by demonstrating that explicitly pairing sequences with ensemble-derived secondary structures during pretraining yields representations that transfer strongly across structural and functional tasks, and that such a model can derive a tertiary-structure predictor competitive with AlphaFold3. Its fully open release — pretrained weights, the ~21M-pair training dataset on Zenodo, a HuggingFace model card, and code under an MIT license — sets a high bar for reproducibility in a field where training data is often withheld. As a preprint, its benchmark claims await peer review, and the model inherits practical constraints: a 512-nucleotide native window relying on sliding-window inference for longer RNAs, and structural supervision that is only as reliable as the MUSES ensemble that generated it. Even so, structRFM offers the community a transparent, structure-aware backbone spanning homology, structure, and functional RNA inference.
Zhu, H., et al. (2025) A fully open structure-guided RNA foundation model for robust structural and functional inference. bioRxiv.
DOI: 10.1101/2025.08.06.668731