yakRNA Design

110M-parameter RNA language model that designs sequences from secondary structure, motif, and Gene Ontology constraints via discrete diffusion.

Released: April 2026

Parameters: 110 Million

yakRNA Design is a generative RNA language model that composes new RNA sequences from semantic, structural, and functional specifications rather than from sequence context alone. Most RNA foundation models are trained primarily to understand sequences (predicting structure, stability, or variant effects), whereas yakRNA is built to write them: a researcher supplies a target secondary structure, a consensus motif, a description of desired function, or any combination of these, and the model samples sequences intended to satisfy all supplied constraints at once. This positions it alongside structure-to-sequence inverse-folding tools, but with the added ability to condition on Gene Ontology (GO) terms as a proxy for biological function.

The model was developed at Stanford University and released as a bioRxiv preprint in April 2026 under the title "yakRNA Design: A semantic multimodal RNA composer." It is distributed as a single 110M-parameter checkpoint with an inference-only code repository, so users download pretrained weights and run conditional generation without retraining. The "semantic multimodal" framing refers to the model's joint conditioning interface, which mixes natural-language-style functional labels (GO terms), structural notation (dot-bracket), and sequence-level constraints in a single generation call.

The headline validation comes from a wet-lab design campaign for frameshift-stimulating RNA elements, structured RNAs that program −1 ribosomal frameshifting. From 84 candidates designed in a zero-shot setting, the authors report 17 experimentally active elements (about 20%), including at least one design with no detectable identity to any known sequence in the searched universe. The result is offered as evidence that the model generates novel functional RNA rather than memorized variants of training examples.

Key Features

Multimodal conditioning: Generation can be guided by secondary structure (dot-bracket notation), a mixed-case IUPAC consensus sequence, and up to 12 GO terms at once, or any combination thereof, enabling fine-grained control over the designed output.
Discrete diffusion generation: Sequences are produced through a discrete diffusion process over a ModernBERT-based backbone, iteratively denoising masked nucleotide tokens toward constraint-satisfying outputs.
Function-aware design: Support for 280 GO terms lets users specify a desired biological role rather than only a structural target, distinguishing yakRNA from purely structure-conditioned inverse-folding methods.
Sequence infilling: Fixed positions can be combined with masked positions, allowing partial designs and scaffolding around conserved residues.
Configurable base-pairing: Five constraint sets (strict, canonical, canonical+sheared, canonical+common, permissive) tune how strictly the model enforces structural pairing during generation.
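
The conditioning inputs listed above lend themselves to a few sanity checks before a generation call. The following is a minimal Python sketch and is not part of the yakRNA repository. The function names `parse_dot_bracket`, `check_request`, and `satisfies_consensus` are hypothetical. The limits of 636 nucleotides and 12 GO terms are taken from this page. Three things are assumptions of the sketch: consensus case is ignored, structure and consensus must have equal length, and only plain parentheses are supported (no pseudoknot brackets). The GO identifier in the usage lines is illustrative and may not be in the model's 280-term vocabulary.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

MAX_LENGTH = 636    # maximum design length stated on this page
MAX_GO_TERMS = 12   # maximum number of GO terms per call stated on this page

# Standard IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA: Dict[str, Set[str]] = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "U": {"U"},
    "R": {"A", "G"}, "Y": {"C", "U"}, "S": {"C", "G"}, "W": {"A", "U"},
    "K": {"G", "U"}, "M": {"A", "C"},
    "B": {"C", "G", "U"}, "D": {"A", "G", "U"},
    "H": {"A", "C", "U"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "U"},
}


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) paired positions; raise ValueError if malformed."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def check_request(structure: Optional[str] = None,
                  consensus: Optional[str] = None,
                  go_terms: Sequence[str] = ()) -> Optional[int]:
    """Validate a request; return the implied length, or None if GO terms only."""
    if len(go_terms) > MAX_GO_TERMS:
        raise ValueError(f"at most {MAX_GO_TERMS} GO terms are supported")
    lengths = set()
    if structure is not None:
        parse_dot_bracket(structure)
        lengths.add(len(structure))
    if consensus is not None:
        bad = sorted({c for c in consensus if c.upper() not in IUPAC_RNA})
        if bad:
            raise ValueError(f"non-IUPAC symbols in consensus: {bad}")
        lengths.add(len(consensus))
    if len(lengths) > 1:
        raise ValueError("structure and consensus lengths differ (sketch assumption)")
    if lengths and max(lengths) > MAX_LENGTH:
        raise ValueError(f"designs are limited to {MAX_LENGTH} nucleotides")
    return max(lengths) if lengths else None


def satisfies_consensus(sequence: str, consensus: str) -> bool:
    """True if every base is allowed by the IUPAC code at the same position."""
    return len(sequence) == len(consensus) and all(
        base.upper() in IUPAC_RNA[code.upper()]
        for base, code in zip(sequence, consensus)
    )


print(check_request("((..))", "GCnnGC", ["GO:0003723"]))
print(satisfies_consensus("GCAAGC", "GCnnGC"))
```

Checks like these catch malformed inputs early; they say nothing about whether a generated sequence actually folds into the requested structure.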

Technical Details

yakRNA Design is a 110M-parameter model built on a ModernBERT transformer backbone adapted for discrete diffusion over RNA token sequences, supporting designs up to 636 nucleotides. It was trained on the full Rfam database of structured RNA families, giving it broad coverage of non-coding RNA structural and functional space, and its GO-term conditioning vocabulary spans 280 functional categories. The released artifacts include the pretrained yakRNA_110M.pt checkpoint on HuggingFace and a command-line generator that accepts YAML configuration plus structure, consensus, and GO-term arguments and emits FASTA output. The model's central empirical result is the frameshift-stimulating RNA campaign, in which 17 of 84 zero-shot designs were experimentally active. The repository is inference-only and does not document training hyperparameters; detailed evaluation and ablations are reported in the preprint.
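
To make the idea of iteratively denoising masked tokens concrete, here is a toy Python sketch of masked discrete diffusion sampling with infilling. It is not yakRNA's implementation: `toy_denoiser` is a hypothetical stand-in that proposes random bases with random confidences where the real model would run its ModernBERT-based network on the partially masked sequence and the constraints. The mask symbol `_`, the unmasking schedule, and all function names are assumptions of the sketch.

```python
import random
from typing import List, Tuple

MASK = "_"          # hypothetical mask symbol for this sketch
ALPHABET = "ACGU"


def toy_denoiser(tokens: List[str], rng: random.Random) -> List[Tuple[str, float]]:
    """Stand-in for a trained network: one (base, confidence) proposal per position."""
    return [(rng.choice(ALPHABET), rng.random()) for _ in tokens]


def sample(template: str, steps: int = 4, seed: int = 0) -> str:
    """Fill every MASK position in `template`, leaving fixed bases untouched."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    tokens = list(template)
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        proposals = toy_denoiser(tokens, rng)
        # Commit an equal share of the remaining masked positions each step,
        # most confident first; the final step commits everything left.
        remaining_steps = steps - step
        n_commit = -(-len(masked) // remaining_steps)  # ceiling division
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = proposals[i][0]
    return "".join(tokens)


# Infilling: the flanking bases are fixed, the four middle positions are designed.
print(sample("GG____CC", steps=3, seed=1))
```

Because fixed positions are never masked, the same loop covers both full design (an all-mask template) and scaffolding around conserved residues.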

Applications

yakRNA Design targets RNA engineering tasks where a researcher knows what structure or function they want but not which sequence to use. Concrete uses include designing frameshift-stimulating elements and other structured regulatory RNAs, scaffolding novel sequences around conserved motifs via infilling, and proposing function-specified candidates for synthetic biology, riboswitch and aptamer engineering, and mRNA element design. Because it conditions on GO terms, it is well suited to exploratory design where the goal is a functional class rather than a precise structure, and the inference-only distribution lowers the barrier for experimental labs to generate candidate libraries.
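
A generated candidate library usually needs a quick screen before anything is ordered. The sketch below reads FASTA-formatted text and ranks candidates by the fraction of target base pairs whose two nucleotides can pair. It is an illustration under stated assumptions: the allowed-pair set (Watson-Crick plus G-U wobble) is a choice made for this sketch and is not the definition of any of yakRNA's named constraint sets, the function names are hypothetical, the dot-bracket input is assumed to be balanced, and no folding or thermodynamic calculation is performed.

```python
from typing import Iterator, List, Tuple

# Watson-Crick pairs plus the G-U wobble: an assumption of this sketch,
# not the definition of any named yakRNA constraint set.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G"),
}


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) records from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def pair_indices(structure: str) -> List[Tuple[int, int]]:
    """Paired positions of a balanced dot-bracket string (plain parentheses only)."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def pairing_score(sequence: str, structure: str) -> float:
    """Fraction of target pairs whose bases appear in ALLOWED_PAIRS."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = pair_indices(structure)
    if not pairs:
        return 1.0
    ok = sum((sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in pairs)
    return ok / len(pairs)


def rank_candidates(fasta_text: str, structure: str) -> List[Tuple[float, str]]:
    """Return (score, name) tuples, best-scoring candidates first."""
    scored = [(pairing_score(seq, structure), name)
              for name, seq in read_fasta(fasta_text)]
    return sorted(scored, reverse=True)


example = ">a\nGGGAAACCC\n>b\nGGGAAAAAA\n"
for score, name in rank_candidates(example, "(((...)))"):
    print(name, score)
```

A high score here only means the sequence is compatible with the target pairs; confirming that it actually adopts the structure would still require a folding tool or an experiment.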

Impact

yakRNA Design contributes to a shift in RNA AI from understanding-focused foundation models toward generative, function-conditioned design, paralleling earlier transitions in protein modeling from structure prediction to de novo design. Its most consequential result is experimental: zero-shot designs that are active in the lab, including a functional element with no recognizable relative in the searched sequence universe, supporting the claim that the model extrapolates beyond its training distribution. Adoption signals remain early given the recent preprint, and the inference-only release, sparse model card, and absence of a formal data card or published training recipe currently limit reproducibility and independent benchmarking.

Citation

yakRNA Design: A semantic multimodal RNA composer

Pinpin, L. N. & Khan, Y. A. (2026) yakRNA Design: A semantic multimodal RNA composer. bioRxiv.

DOI: 10.64898/2026.04.22.720245

Citations

Total Citations: 0

Influential: 0

References: 30

GitHub

Stars: 4

Forks: 0

Open Issues: 0

Contributors: 1

Last Push: 1mo ago

Language: Python

License: MIT

HuggingFace

Downloads: 0

Likes: 1

Last Modified: 2mo ago

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Openness score: 48 (Partial)

Usability (can I run it?): 87

Reproducibility (can I retrain it?): 9

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository, Research Paper, HuggingFace Model
