bio.rodeo

RNA

SCRU-Seq / SCRU-Diff

University of Virginia

Modular deep-learning framework for 3D-structure-based RNA sequence design, pairing a direct GNN predictor (SCRU-Seq) and a diffusion model (SCRU-Diff) built on self-contained RNA units.

Released: April 2026

Designing an RNA sequence that will fold into a target three-dimensional shape — the "inverse folding" problem — is a central challenge in synthetic biology and RNA therapeutics. Progress has been bottlenecked by the scarcity of high-resolution 3D RNA structures, which leaves data-hungry deep-learning models prone to overfitting, and by the computational cost of leading methods that rely on autoregressive or iterative sampling over whole molecules. SCRU-Seq and SCRU-Diff, introduced by Jian Wang and Nikolay V. Dokholyan at the University of Virginia School of Medicine in a 2026 bioRxiv preprint, attack both problems with a modular strategy that designs RNA from reusable structural building blocks rather than treating each molecule as monolithic.

The work first decomposes complex RNAs into Self-Contained RNA Units (SCRUs): structurally autonomous modules identified through tertiary-contact clustering, each of which behaves as a self-stabilizing, foldable physical unit. Assembling these units into the SCRU-DB database yields more than 61,000 SCRUs spanning over 8,200 unique structural clusters, a library substantially larger than prior RNA motif collections. Because a single complex RNA can be split into several units, the decomposition expands the training signal available from a limited pool of experimental structures.
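As a rough illustration of what grouping residues by tertiary contacts can look like, the sketch below builds a contact graph from per-residue coordinates and takes its connected components as candidate units. It is not the authors' procedure: the function names, the 8 Å distance cutoff, the minimum sequence separation of 4, and the use of connected components as the clustering step are all assumptions made for this example.

```python
import math
from itertools import combinations


def tertiary_contacts(coords, cutoff=8.0, min_seq_sep=4):
    """Return residue index pairs that are spatially close but far apart in sequence.

    `cutoff` (in angstroms) and `min_seq_sep` are hypothetical thresholds,
    chosen only for illustration.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_seq_sep and math.dist(coords[i], coords[j]) <= cutoff
    ]


def cluster_units(n_residues, contacts):
    """Group residues into candidate units: connected components of the contact graph."""
    parent = list(range(n_residues))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in contacts:
        parent[find(i)] = find(j)

    groups = {}
    for r in range(n_residues):
        groups.setdefault(find(r), []).append(r)
    # Residues with no tertiary contact form singleton groups and are dropped.
    return sorted(g for g in groups.values() if len(g) > 1)


# Toy coordinates: residues 0 and 5 touch, as do residues 4 and 9.
coords = [
    (0, 0, 0), (100, 0, 0), (200, 0, 0), (300, 0, 0), (400, 0, 0),
    (1, 0, 0), (500, 0, 0), (600, 0, 0), (700, 0, 0), (401, 0, 0),
]
contacts = tertiary_contacts(coords)
units = cluster_units(len(coords), contacts)
```

A real pipeline would work from atomic coordinates in deposited structures and would need a notion of structural autonomy that goes beyond simple connectivity; the sketch only shows the contact-graph idea.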

On top of this data foundation the authors release two complementary, pretrained models with fixed checkpoints. SCRU-Seq is a graph neural network that predicts a sequence directly from a target structure in a single forward pass (described as O(1) inference, meaning one network evaluation per design rather than one per sampling step), while SCRU-Diff is a diffusion model that refines sequences iteratively for higher accuracy. Together they let users trade off speed against fidelity within one framework.

#Key Features

  • Self-contained RNA units (SCRUs): A modular decomposition that breaks complex RNAs into foldable, structurally autonomous building blocks, turning scarce whole-molecule structures into a much larger pool of reusable training examples.
  • Two complementary models: SCRU-Seq offers fast sequence prediction in a single forward pass (O(1) network evaluations per design), while SCRU-Diff trades speed for accuracy through iterative diffusion-based refinement.
  • Large structural database: SCRU-DB compiles over 61,000 SCRUs across more than 8,200 unique structural clusters, exceeding the scale of previous RNA motif libraries.
  • State-of-the-art recovery: On the high-fidelity set112 benchmark, SCRU-Diff reaches a Best native sequence recovery (NSR) of 79.2% and SCRU-Seq achieves 63.7%, outperforming the reported baselines, which include NA-MPNN and RiboDiffusion.
  • Fixed, reusable checkpoints: Both models are described as pretrained checkpoints, intended to support reproducible, plug-in RNA sequence design without per-task retraining.

#Technical Details

SCRU-Seq is a graph neural network that consumes a target 3D RNA backbone and predicts the underlying nucleotide sequence in a single non-autoregressive pass, so each design costs one network evaluation, whereas autoregressive or iterative competitors need one evaluation per generation or sampling step. SCRU-Diff is a generative diffusion model that conditions on the same target structure and denoises toward a sequence over multiple steps, gaining accuracy at the cost of additional compute. Both are trained on the SCRU-DB corpus of 61,000+ self-contained units (8,200+ clusters) and evaluated against established RNA inverse-folding systems such as NA-MPNN and RiboDiffusion. On the curated set112 benchmark, SCRU-Diff attains a Best NSR of 79.2% and SCRU-Seq attains 63.7% NSR, with the modular SCRU representation credited for the gains under limited 3D-structure data.
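For readers unfamiliar with the metric, native sequence recovery is the fraction of positions at which a designed sequence matches the native one. The sketch below computes it, along with a best-of-candidates variant. The function names are hypothetical, and reading "Best NSR" as the maximum over several sampled designs is an assumption; the text above does not define the term.

```python
def native_sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if not native:
        raise ValueError("native sequence must be non-empty")
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    matches = sum(d == n for d, n in zip(designed.upper(), native.upper()))
    return matches / len(native)


def best_nsr(candidates, native: str) -> float:
    """Highest NSR among several candidate designs (assumed meaning of 'Best NSR')."""
    return max(native_sequence_recovery(c, native) for c in candidates)


native = "GGCAUGCC"
single = native_sequence_recovery("GGCAAGCC", native)  # 7 of 8 positions match
best = best_nsr(["AAAAAAAA", "GGCAAGCC"], native)
```

A benchmark figure such as those quoted for set112 would typically aggregate this per-target value across all targets; how the preprint aggregates is not stated here.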

#Applications

The framework targets researchers in RNA nanotechnology, synthetic biology, and RNA-based therapeutics who need to engineer sequences that fold into specified 3D conformations — for example designing structured aptamers, ribozymes, riboswitches, or scaffolds for RNA drug development. SCRU-Seq's single-pass speed suits high-throughput screening and large design libraries, while SCRU-Diff's iterative refinement fits cases where maximizing structural fidelity matters more than runtime, letting practitioners choose the appropriate point on the speed-accuracy curve.

#Impact

By reframing RNA inverse folding around reusable self-contained units, this work offers a route past the field's defining obstacle — the scarcity of experimental 3D RNA structures — and reports state-of-the-art native sequence recovery on set112. The accompanying SCRU-DB database is itself a contribution that could support future RNA modeling efforts beyond sequence design. As of its 2026 preprint release the work has not yet been peer-reviewed, and no public code or model weights have been located, so independent reproduction and adoption remain to be demonstrated.

Tags

rna_design, inverse_folding, sequence_design, graph_neural_network, diffusion, generative, self_supervised, rna_structure