SCRU-Seq / SCRU-Diff

RNA inverse folding framework pairing a graph neural network predictor with a diffusion model, designing sequences from self-contained RNA units.

Released: April 2026

Designing an RNA sequence that will fold into a target three-dimensional shape — the "inverse folding" problem — is a central challenge in synthetic biology and RNA therapeutics. Progress has been bottlenecked by the scarcity of high-resolution 3D RNA structures, which leaves data-hungry deep-learning models prone to overfitting, and by the computational cost of leading methods that rely on autoregressive or iterative sampling over whole molecules. SCRU-Seq and SCRU-Diff, introduced by Jian Wang and Nikolay V. Dokholyan at the University of Virginia School of Medicine in a 2026 bioRxiv preprint, attack both problems with a modular strategy that designs RNA from reusable structural building blocks rather than treating each molecule as monolithic.

The work first decomposes complex RNAs into Self-Contained RNA Units (SCRUs): structurally autonomous modules identified through tertiary-contact clustering, each of which behaves as a self-stabilizing, foldable physical unit. Assembling these units into the SCRU-DB database yields more than 61,000 SCRUs spanning over 8,200 unique structural clusters — a library substantially larger than prior RNA motif collections, which dramatically expands the effective training signal available from a limited pool of experimental structures.
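A rough sketch can make the idea of grouping residues by tertiary contacts more concrete. The preprint's actual clustering procedure is not described here, so the code below swaps in a plain connected-components grouping over a contact graph as a stand-in; the function name `cluster_by_contacts`, the residue indexing, and the toy contact list are all hypothetical.

```python
from collections import defaultdict

def cluster_by_contacts(n_residues, contacts):
    """Group residues into connected components of a contact graph.

    Assumption: this union-find grouping is only a stand-in for the
    tertiary-contact clustering named in the text, not the authors' method.
    """
    parent = list(range(n_residues))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in contacts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for i in range(n_residues):
        groups[find(i)].append(i)
    return sorted(groups.values())

# Toy input (hypothetical): 8 residues, contacts link 0-1-2, 3-4, and 5-6-7.
contacts = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7)]
units = cluster_by_contacts(8, contacts)
print(units)  # [[0, 1, 2], [3, 4], [5, 6, 7]]
```

Each returned group plays the role of one candidate unit in this toy setting. A real pipeline would derive contacts from 3D coordinates and would need extra criteria to decide whether a group is actually self-stabilizing.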

On top of this data foundation, the authors present two complementary pretrained models with fixed checkpoints. SCRU-Seq is a graph neural network that predicts a sequence directly from a target structure in a single forward pass (described as O(1) inference, meaning the number of network evaluations does not grow with sequence length), while SCRU-Diff is a diffusion model that refines sequences iteratively for higher accuracy. Together they let users trade off speed against fidelity within one framework.
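One way to read the speed contrast is to count how many times the network has to be evaluated per design. The sketch below is a simplified accounting under stated assumptions, not a measurement of either model: the function name `network_evaluations`, the strategy labels, and the default of 50 denoising steps are hypothetical, and the cost of each individual evaluation (which still depends on molecule size) is ignored.

```python
def network_evaluations(strategy, seq_len, diffusion_steps=50):
    """Count network evaluations needed to produce one designed sequence.

    Assumptions (not from the preprint): a single-pass model is called once,
    a diffusion model once per denoising step, and an autoregressive model
    once per nucleotide. diffusion_steps=50 is an arbitrary placeholder.
    """
    if strategy == "single_pass":
        return 1
    if strategy == "diffusion":
        return diffusion_steps
    if strategy == "autoregressive":
        return seq_len
    raise ValueError(f"unknown strategy: {strategy}")

# Compare the three strategies on a short and a long target.
for length in (50, 500):
    counts = {s: network_evaluations(s, length)
              for s in ("single_pass", "diffusion", "autoregressive")}
    print(length, counts)
```

In this accounting only the autoregressive count grows with sequence length, while the single-pass count stays at one and the diffusion count is fixed by the chosen number of steps.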

Key Features

Self-contained RNA units (SCRUs): A modular decomposition that breaks complex RNAs into foldable, structurally autonomous building blocks, turning scarce whole-molecule structures into a much larger pool of reusable training examples.
Two complementary models: SCRU-Seq offers fast, single-pass (O(1)) sequence prediction, while SCRU-Diff trades speed for accuracy through iterative diffusion-based refinement.
Large structural database: SCRU-DB compiles over 61,000 SCRUs across more than 8,200 unique structural clusters, exceeding the scale of previous RNA motif libraries.
State-of-the-art recovery: On the high-fidelity set112 benchmark, SCRU-Diff reaches a Best native sequence recovery (NSR) of 79.2% and SCRU-Seq achieves 63.7%, outperforming reported baselines.
Fixed, reusable checkpoints: Both models ship as pretrained checkpoints, supporting reproducible, plug-in RNA sequence design without per-task retraining.

Technical Details

SCRU-Seq is a graph neural network that takes a target 3D RNA backbone as input and predicts the nucleotide sequence in a single non-autoregressive pass, so the number of network evaluations stays constant where iterative competitors need many. SCRU-Diff is a generative diffusion model that conditions on the same target structure and denoises toward a sequence over multiple steps, gaining accuracy at the cost of additional compute. Both are trained on the SCRU-DB corpus of 61,000+ self-contained units (8,200+ clusters) and evaluated against established RNA inverse-folding systems such as NA-MPNN and RiboDiffusion. On the curated set112 benchmark, SCRU-Diff attains a Best NSR of 79.2% and SCRU-Seq attains an NSR of 63.7%, with the modular SCRU representation credited for the gains under limited 3D-structure data.
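The recovery numbers above are easier to interpret with the metric written out. Native sequence recovery is conventionally the fraction of positions at which a designed sequence matches the native one. The text does not define "Best NSR", so the sketch below assumes it means the highest NSR among several sampled candidates for the same target; the function names and the toy sequences are hypothetical.

```python
def native_sequence_recovery(designed, native):
    """Fraction of positions where the designed nucleotide equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

def best_nsr(candidates, native):
    """Highest NSR over a set of candidates.

    Assumption: this is one plausible reading of "Best NSR"; the source
    does not spell out how the best value is selected.
    """
    return max(native_sequence_recovery(c, native) for c in candidates)

# Toy example with made-up 8-nucleotide sequences.
native = "GCGAUAGC"
candidates = ["GCGAUUGC", "ACGAUAGG", "GCGCCAGC"]
scores = [native_sequence_recovery(c, native) for c in candidates]
print(scores)                        # [0.875, 0.75, 0.75]
print(best_nsr(candidates, native))  # 0.875
```

A benchmark-level figure such as 79.2% would then be an aggregate of per-target values over set112; how the preprint aggregates them is not stated here.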

Applications

The framework targets researchers in RNA nanotechnology, synthetic biology, and RNA-based therapeutics who need to engineer sequences that fold into specified 3D conformations — for example designing structured aptamers, ribozymes, riboswitches, or scaffolds for RNA drug development. SCRU-Seq's single-pass speed suits high-throughput screening and large design libraries, while SCRU-Diff's iterative refinement fits cases where maximizing structural fidelity matters more than runtime, letting practitioners choose the appropriate point on the speed-accuracy curve.

Impact

By reframing RNA inverse folding around reusable self-contained units, this work offers a route past the field's defining obstacle — the scarcity of experimental 3D RNA structures — and reports state-of-the-art native sequence recovery on set112. The accompanying SCRU-DB database is itself a contribution that could support future RNA modeling efforts beyond sequence design. As of its 2026 preprint release the work has not yet been peer-reviewed, and no public code or model weights have been located, so independent reproduction and adoption remain to be demonstrated.

Citation

Wang, J. & Dokholyan, N. V. (2026) Modular Deep Learning for Direct RNA Sequence Design via Self-Contained RNA Units. bioRxiv.

DOI: 10.64898/2026.04.16.719021

Citations

Total citations: 0
Influential: 0
References: 51

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 17 (Closed)
Usability (can I run it?): 9
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified (missing required components)

