GoForth

RNA inverse-folding language model that designs nucleotide sequences satisfying a target secondary structure, fixed bases, and coding constraints.

Released: May 2026

GoForth addresses RNA sequence design as a conditional generative modeling problem: given a target secondary structure, a set of fixed bases, and coding constraints, generate nucleotide sequences that satisfy all of them simultaneously. This is the inverse of the folding problem — rather than predicting how a given RNA folds, the model proposes sequences likely to fold into a desired shape while respecting other user-specified requirements such as preserving a reading frame or pinning specific positions.
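Satisfying a target structure starts with a necessary condition: every base pair in the target must be one that the sequence can actually form. The sketch below checks that condition, assuming the usual canonical pairs (A-U, G-C) plus the G-U wobble pair. It skips the unknown (`?`) and paired-unknown (`#`) tokens that GoForth's structure format also accepts, because those leave the partner open. The function names are illustrative and not part of GoForth's code. Passing this check does not guarantee that the sequence actually folds into the target.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair, in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict[int, int]:
    """Map each bracketed position to its partner; '.', '?' and '#' are skipped."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def incompatible_pairs(sequence: str, structure: str) -> list[tuple[int, int]]:
    """Return target base pairs (i < j) that the sequence cannot form."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    bad = []
    for i, j in pair_table(structure).items():
        # Each pair appears twice in the table; check it once, from its 5' end.
        if i < j and (seq[i], seq[j]) not in ALLOWED_PAIRS:
            bad.append((i, j))
    return sorted(bad)
```

For example, `incompatible_pairs("GGGAAACCA", "(((...)))")` flags the outermost pair `(0, 8)` because G and A cannot pair.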

The model was developed by Michael Lindsey and released as a preprint (arXiv:2605.07608) in May 2026. The arXiv record lists no institutional affiliation. The author is a faculty member in the Department of Mathematics at the University of California, Berkeley, so the affiliation given here is inferred rather than taken from the paper itself. The central design choice is to train a forward encoder-decoder language model directly on witnessed RNA folds rather than distilling from an inverse-design teacher. The method separates three components that are usually entangled: a sequence prior, a forward folding sampler, and a likelihood oracle.
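Bayes' rule shows why a sequence prior and a forward folding sampler are, in principle, enough to define inverse design: p(seq | target) ∝ p(seq) · p(target | seq). The sketch below is a textbook rejection sampler built from two hypothetical callables that stand in for the prior and the forward sampler. It is not GoForth's algorithm, which instead generates sequences autoregressively from the constraints. For realistic targets it would be hopelessly slow, because each draw is accepted only with probability p(target | seq). If a likelihood oracle could return p(target | seq) directly, the equality test could be replaced by accepting with that probability.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-ins for two of the three components named above.
SequencePrior = Callable[[random.Random], str]        # draws seq ~ p(seq)
ForwardSampler = Callable[[str, random.Random], str]  # draws struct ~ p(struct | seq)


def rejection_design(
    prior: SequencePrior,
    forward_sampler: ForwardSampler,
    target: str,
    rng: random.Random,
    max_tries: int = 100_000,
) -> Optional[str]:
    """Draw one sequence from p(seq | target) by naive rejection sampling.

    A prior draw is kept when its sampled fold equals the target, which happens
    with probability p(target | seq), so accepted sequences follow
    p(seq | target), which is proportional to p(seq) * p(target | seq).
    """
    for _ in range(max_tries):
        seq = prior(rng)
        if forward_sampler(seq, rng) == target:
            return seq
    return None  # target too improbable under this prior/sampler pair


def uniform_prior(rng: random.Random) -> str:
    """Hypothetical toy prior: uniform over dinucleotides."""
    return "".join(rng.choice("ACGU") for _ in range(2))


def toy_folder(seq: str, rng: random.Random) -> str:
    """Hypothetical toy forward sampler: Watson-Crick dinucleotides usually pair."""
    paired = seq in {"AU", "UA", "GC", "CG"}
    return "()" if paired and rng.random() < 0.9 else ".."
```

With these toys, `rejection_design(uniform_prior, toy_folder, "()", random.Random(0))` can only return one of `AU`, `UA`, `GC`, or `CG`, because no other dinucleotide ever samples the target fold.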

GoForth belongs to the growing field of RNA design and RNA foundation models, which includes language models such as ERNIE-RNA and RNA-FM. Unlike those models, it is specialized for constraint-satisfying generation rather than representation learning. It ships not only as research code but also as a self-contained local workbench, which lowers the barrier to interactive RNA design.

Key Features

Multi-constraint conditioning: The model conditions on secondary-structure targets (dot-bracket notation with unknown and paired-unknown tokens), fixed or ambiguous bases, and coding constraints at the same time, rather than handling each in isolation.
Forward-trained on observed folds: Instead of learning from an inverse-design teacher, GoForth trains on witnessed RNA folds, separating the sequence prior, forward folding sampler, and likelihood oracle.
Fixed-checkpoint inference: Generation runs from pretrained PyTorch checkpoints with no per-task re-training, making candidate generation fast and reproducible.
ViennaRNA scoring loop: Candidate sequences are scored with ViennaRNA to compute minimum-free-energy and ensemble structures, structure/condition error, and Boltzmann ensemble diagnostics.
Local design workbench: A browser-based app with a local HTTP/API server lets users enter constraints, generate candidates, and inspect folded structures interactively, with CPU, CUDA, and Apple MPS support.
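This page does not define the "structure/condition error" used in the scoring loop. The sketch below shows one plausible per-position definition. It assumes that `.` means unpaired, that matching brackets mean paired with each other, that `#` means paired with an unspecified partner, and that `?` means unconstrained. GoForth's own metric may define or weight positions differently, and the function names are hypothetical.

```python
def partners(structure: str) -> list[int]:
    """Partner index per position, or -1 if the position is not in a bracket pair."""
    partner = [-1] * len(structure)
    stack: list[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partner


def structure_error(target: str, predicted: str) -> float:
    """Fraction of constrained target positions that the predicted fold violates."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted lengths differ")
    t, p = partners(target), partners(predicted)
    checked = wrong = 0
    for i, ch in enumerate(target):
        if ch == "?":  # unconstrained: not scored
            continue
        if ch not in "().#":
            raise ValueError(f"unsupported structure token {ch!r} at position {i}")
        checked += 1
        if ch == ".":  # must be unpaired
            wrong += int(p[i] != -1)
        elif ch == "#":  # must be paired, with any partner
            wrong += int(p[i] == -1)
        else:  # bracket: must pair with the same partner as in the target
            wrong += int(p[i] != t[i])
    return wrong / checked if checked else 0.0
```

In practice the predicted structure would come from a folding tool. With ViennaRNA's Python bindings, `RNA.fold(sequence)` returns the MFE structure together with its free energy, and that structure can be passed as `predicted`.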

Technical Details

GoForth is a sequence-to-sequence autoregressive designer built as a PyTorch encoder-decoder language model, with a condition encoder for each constraint modality. Structure is supplied in dot-bracket notation with additional tokens for unknown (?) and paired-unknown (#) positions. Base masking allows concrete nucleotides alongside the ambiguity tokens ?, N, and #. Two released "small" checkpoints (~41 MB each) cover the main use cases: full_structure_small.pt for full-structure targets and fsb_partial_base_small.pt for partial structures and base constraints. Both are distributed as GitHub release assets with SHA256 verification and are fetched via scripts/download_checkpoints.sh.

The model is trained on observed RNA fold data drawn from ETERNA100v2 and Rfam. Evaluation in the preprint covers full inverse-folding benchmarks and mixed structure/sequence/coding tasks. On these tasks, the author reports fast, high-quality candidate generation, learned semantic task embeddings, and an emergent notion of design feasibility. Exact parameter counts are not stated in the public documentation.
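Base masks mix concrete nucleotides with ambiguity tokens. The sketch below checks a candidate against such a mask, assuming that `?` and `N` both mean "any base" and that T is read as U. This page does not define what `#` means in a base mask, so the sketch rejects that token rather than guessing its meaning. The function name is hypothetical.

```python
WILDCARDS = {"?", "N"}  # assumed to mean "any base"; '#' is deliberately unsupported
BASES = set("ACGU")


def base_mask_violations(sequence: str, mask: str) -> list[int]:
    """Positions where the sequence contradicts a concrete base in the mask."""
    if len(sequence) != len(mask):
        raise ValueError("sequence and mask lengths differ")
    seq = sequence.upper().replace("T", "U")
    mask = mask.upper().replace("T", "U")
    bad = []
    for i, (s, m) in enumerate(zip(seq, mask)):
        if m in WILDCARDS:
            continue
        if m not in BASES:
            raise ValueError(f"unsupported mask token {m!r} at position {i}")
        if s != m:
            bad.append(i)
    return bad
```

For example, the mask `GNN?AANNC` fixes positions 0, 4, 5, and 8 and leaves the rest free, so `GGGAAACCC` satisfies it.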

Applications

GoForth is aimed at researchers designing functional RNAs — riboswitches, aptamers, structured untranslated regions, and other elements where a specific fold must coexist with fixed motifs or a preserved coding sequence. Because it accepts coding constraints alongside structure, it is well suited to mRNA and synthetic-biology workflows where a designed sequence must both fold correctly and translate a required protein. The bundled workbench makes it usable by experimentalists without a deep ML background: a user enters constraints, generates candidates, and inspects ViennaRNA-folded structures before committing to synthesis and wet-lab validation.
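A coding constraint can be checked independently of structure by translating the constrained segment. The sketch below assumes the standard genetic code, an in-frame segment delimited by hypothetical `start`/`end` indices, and `*` for stop codons. It does not reproduce how GoForth represents coding constraints internally. Because most amino acids have several synonymous codons, a fixed protein still leaves the designer some freedom to choose codons that favor the target fold.

```python
BASES = "UCAG"
# Standard genetic code, with codons enumerated in U, C, A, G order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna: str) -> str:
    """Translate an in-frame RNA segment with the standard code ('*' = stop)."""
    rna = rna.upper().replace("T", "U")
    if len(rna) % 3:
        raise ValueError("coding segment length must be a multiple of 3")
    # Ambiguous bases such as 'N' are not in the table and raise KeyError.
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def satisfies_coding(sequence: str, start: int, end: int, protein: str) -> bool:
    """True if sequence[start:end] encodes exactly `protein` in frame."""
    return translate(sequence[start:end]) == protein
```

For example, `satisfies_coding("GGAUGGCCUAAGG", 2, 11, "MA*")` is true because the segment `AUGGCCUAA` translates to Met-Ala-stop.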

Impact

GoForth contributes a methodological shift for RNA design by showing that a forward language model trained on observed folds — rather than an inverse-design teacher — can generate sequences under combined structure, sequence, and coding constraints. As a recent single-author preprint with open Apache-2.0 code and a ready-to-run local workbench, its long-term adoption and benchmark standing are still emerging and should be read with the usual caveats for unreviewed work. Notable limitations include the absence of published parameter counts, reliance on secondary-structure (ViennaRNA) scoring rather than tertiary or pseudoknot-aware evaluation, and currently only "small" released checkpoints, leaving headroom for larger models and broader benchmarking.

Citation

GoForth: Language Models for RNA Design under Structure, Sequence, and Coding Constraints

Preprint

Lindsey, M. (2026) GoForth: Language Models for RNA Design under Structure, Sequence, and Coding Constraints. arXiv.

DOI: 10.48550/arXiv.2605.07608

Citations

Total citations: 230

Influential: 20

References: 49

GitHub

Stars: 0

Forks: 0

Open issues: 0

Contributors: 1

Last push: 2mo ago

Language: Python

License: Apache-2.0

Openness

bio.rodeo openness: Fully open · usable and reproducible

63 · Partial

Usability (can I run it?): 83

Reproducibility (can I retrain it?): 50

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper