Zhejiang University / Nanjing Tech University / Sichuan Agricultural University / Shanghai Institute of Materia Medica / Nanjing University / Chinese Academy of Sciences / University of Chinese Academy of Sciences
A diffusion-based generative RNA foundation model that designs novel RNA sequences conditioned on function, family, secondary/tertiary structure, or binding proteins.
RDiffusion is a diffusion-based generative model for the conditional design of novel RNA sequences. Tens of millions of non-coding RNA sequences have been cataloged and millions functionally annotated, yet these cover only a fraction of the vast RNA sequence space. RDiffusion is designed to explore this space systematically, generating RNA molecules tailored to user-specified biological requirements rather than retrieving or mutating existing sequences. It was introduced in a June 2026 bioRxiv preprint led by researchers at Zhejiang University, in collaboration with Nanjing Tech University, Sichuan Agricultural University, the Shanghai Institute of Materia Medica (Chinese Academy of Sciences), Nanjing University, and the University of Chinese Academy of Sciences.
The central novelty is conditioning a generative diffusion process on a diverse set of biological features (desired function, RNA family type, secondary structure, tertiary structure, or binding proteins) so that a single framework can address many distinct RNA design tasks. This contrasts with task-specific RNA design tools, which typically optimize for one objective (for example, inverse folding to a target secondary structure). Beyond generation, the authors report that RDiffusion also serves as an RNA foundation model, achieving strong performance when its learned representations are applied to downstream prediction tasks.
RDiffusion sits at the intersection of generative modeling and RNA representation learning, extending the diffusion paradigm that has reshaped protein and small-molecule design into the RNA domain, where conditional generative foundation models remain comparatively underexplored.
RDiffusion is a conditional diffusion model that learns to generate RNA sequences guided by biological context. Conditioning signals span functional annotations, RNA family labels, secondary and tertiary structure, and RNA-binding-protein associations, allowing the same trained model to be steered toward different design objectives at inference time without per-task retraining. The authors evaluate the model across a broad spectrum of RNA design tasks and report that it outperforms all compared baselines on design success rate and sequence diversity, while also achieving state-of-the-art results on downstream representation-learning tasks, which they cite as evidence for its role as an RNA foundation model. Because the work is a preprint, exact parameter counts, training-corpus size, and per-benchmark numbers are not fully detailed in the abstract. The miRNA candidates designed in its osteoarthritis case study were still undergoing experimental validation at the time of posting, with finalized results promised for the formal publication.
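The abstract does not say how sequence diversity is measured. As a rough illustration only, one common proxy is the mean pairwise normalized Hamming distance over a batch of generated sequences, alongside the fraction of unique sequences. The sketch below assumes equal-length sequences, and the function names are hypothetical, not taken from the preprint.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_diversity(seqs: list[str]) -> float:
    """Average normalized Hamming distance over all unordered pairs.

    Illustrative proxy only; the preprint's diversity metric may differ.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def unique_fraction(seqs: list[str]) -> float:
    """Share of distinct sequences in the batch (1.0 means no duplicates)."""
    return len(set(seqs)) / len(seqs) if seqs else 0.0
```

For variable-length outputs, an edit-distance-based measure would be needed in place of Hamming distance.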
RDiffusion targets researchers in RNA therapeutics, synthetic biology, and gene editing who need to design RNA molecules with specified functional or structural properties. Use cases include generating non-coding RNAs that fold to a target structure, that belong to a desired family, or that interact with a particular protein, as well as de novo design of therapeutic candidates such as miRNAs. The reported osteoarthritis case study illustrates how the model can be embedded in a disease-focused design-and-screening workflow, while its foundation-model representations can support downstream RNA understanding tasks like classification and property prediction.
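For the structure-conditioned use case, a cheap first filter on generated candidates is to check that every base pair in the target secondary structure can be formed by the sequence at all. The sketch below is a generic illustration, not part of RDiffusion: it assumes the target is given in pseudoknot-free dot-bracket notation and accepts Watson-Crick and G-U wobble pairs. Passing the check is necessary but not sufficient, since confirming that a sequence actually folds to the target requires a folding tool.

```python
# Pairs accepted as compatible: Watson-Crick plus G-U wobble.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) index pairs for matched parentheses in a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_compatible(seq: str, structure: str) -> bool:
    """True if every paired position in `structure` holds an allowed base pair."""
    if len(seq) != len(structure):
        return False
    seq = seq.upper()
    return all(
        (seq[i], seq[j]) in ALLOWED_PAIRS
        for i, j in parse_dot_bracket(structure)
    )
```

In a screening workflow, candidates that fail this check can be discarded before running more expensive structure prediction.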
By unifying multiple RNA design objectives under a single conditional diffusion framework and coupling generative capability with transferable representations, RDiffusion extends the generative-foundation-model paradigm, already influential in protein and small-molecule design, into RNA engineering. If its reported gains in design success rate, sequence diversity, and downstream performance hold up under peer review and experimental validation, RDiffusion could lower the barrier to programmable RNA design for therapeutics and gene-editing tools. As of this writing, no public code repository or pretrained weights have been released. The authors indicate that these, along with finalized validation data, will accompany formal publication, so independent reproducibility cannot yet be assessed.
Wang, J., et al. (2026) Unlocking Your Programmable and Creative RNA Sequence Designer with RDiffusion. openRxiv.
DOI: 10.64898/2026.06.13.732023