Atom Bioworks / University of Pennsylvania / GenScript / Duke-NUS Medical School
A masked discrete-diffusion model trained on millions of full-length mRNAs, guided by Monte Carlo Tree Search for joint codon optimization and de novo UTR design.
Therapeutic mRNA design requires coordinating many interacting sequence features across an entire transcript: codon usage in the coding region, the 5' and 3' untranslated regions (UTRs), and the coupling between them jointly determine stability, translation efficiency, and ultimately protein expression. Most existing tools treat these problems in isolation — optimizing codons against a fixed reference index, or selecting UTRs from curated libraries — and therefore miss the interactions that govern real-world performance. mRNAutilus, introduced in May 2026 by researchers at Atom Bioworks with collaborators at the University of Pennsylvania, GenScript, and Duke-NUS Medical School, reframes mRNA construction as a single multi-objective generative problem over the full-length transcript.
The model pairs a pretrained masked discrete-diffusion model (MDM) — trained on roughly 5.5 million full-length mRNA sequences — with Monte Carlo Tree Guidance (MCTG), a search procedure that steers generation toward sequences that are Pareto-efficient across several therapeutic objectives. Rather than designing the coding sequence and UTRs separately, mRNAutilus performs simultaneous codon optimization and de novo UTR design, generating complete transcripts in one process. Lightweight regressors built over the diffusion model's embeddings score candidate sequences for half-life, translation efficiency, and protein abundance, providing the rewards that guide the tree search.
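To make the generation process concrete, here is a minimal sketch of masked-diffusion-style decoding for the coding region only. It starts from a fully masked codon sequence and reveals positions over a fixed number of steps. Everything in it is an illustrative assumption rather than the authors' implementation: `toy_denoiser` is a hypothetical stand-in that samples uniformly where the trained model would supply learned probabilities, `SYNONYMS` is a small subset of the standard genetic code, and restricting each position to synonymous codons is one simple way to keep the encoded protein fixed.

```python
import random

MASK = "<mask>"

# Small subset of the standard genetic code (illustrative only).
SYNONYMS = {
    "M": ["AUG"],
    "K": ["AAA", "AAG"],
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "*": ["UAA", "UAG", "UGA"],
}


def toy_denoiser(tokens, position, amino_acid):
    """Hypothetical stand-in for a trained denoiser.

    A real model would condition on `tokens` (the partially unmasked
    sequence); this placeholder returns a uniform distribution over the
    synonymous codons of `amino_acid`.
    """
    options = SYNONYMS[amino_acid]
    return {codon: 1.0 / len(options) for codon in options}


def unmask_sequence(protein, steps=4, seed=0):
    """Reveal masked codon positions in random order over `steps` rounds."""
    rng = random.Random(seed)
    tokens = [MASK] * len(protein)
    masked = list(range(len(protein)))
    rng.shuffle(masked)
    for step in range(steps):
        remaining_steps = steps - step
        # Ceiling division: reveal an even share of what is still masked.
        n_reveal = -(-len(masked) // remaining_steps)
        for _ in range(n_reveal):
            pos = masked.pop()
            probs = toy_denoiser(tokens, pos, protein[pos])
            codons, weights = zip(*probs.items())
            tokens[pos] = rng.choices(codons, weights=weights, k=1)[0]
    return tokens


codons = unmask_sequence("MKFL*")
```

In a guided setting, the partially unmasked sequences produced between rounds would be the natural states for a tree search to branch on, with the regressors scoring completed rollouts.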
mRNAutilus sits within a fast-growing class of generative models for nucleic-acid design, alongside autoregressive and language-model approaches to mRNA optimization, but is distinguished by its discrete-diffusion backbone and its explicit multi-objective search over the whole transcript.
mRNAutilus uses a ~150M-parameter BERT-style transformer as its diffusion backbone, with 20 attention heads, SwiGLU activations, Rotary Positional Embeddings, and FlashAttention-2, operating over a context of 7,500 tokens and an 86-token vocabulary spanning codons, nucleotides, and special tokens. The pretraining corpus was assembled from approximately 14.2 million full-length mRNA sequences and filtered down to 5,526,848. At generation time, Monte Carlo Tree Guidance cycles through selection, expansion, rollout, and backpropagation phases (with an exploration constant of 0.1) and maintains a set of Pareto-optimal sequences, using embedding-based regressors for half-life, translation efficiency, and protein abundance as the reward signals. Reported benchmarks include luciferase constructs around 400-fold above wild-type expression and Spike constructs nearly 2-fold above commercially optimized references, with additional demonstrations in prime-editing and proteome-modulation contexts.
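Two pieces of the search described above can be sketched in a few lines: keeping a Pareto front over the three predicted objectives, and scoring tree children during selection. This is a hedged illustration, not the published procedure. The source gives only the exploration constant (0.1), so the UCT-style formula in `uct_score` is an assumed standard choice, and all function names and the score-tuple layout `(half_life, translation_efficiency, protein_abundance)` are hypothetical.

```python
import math


def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one.

    All objectives are assumed to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_pareto_front(front, candidate):
    """Offer `candidate` = (sequence, scores) to the front; return the new front."""
    _, cand_scores = candidate
    # Reject candidates that are dominated by, or tie exactly with, a member.
    if any(dominates(sc, cand_scores) or sc == cand_scores for _, sc in front):
        return front
    # Drop members the candidate dominates, then add it.
    kept = [(seq, sc) for seq, sc in front if not dominates(cand_scores, sc)]
    return kept + [candidate]


def uct_score(mean_reward, child_visits, parent_visits, c=0.1):
    """Assumed UCT-style selection score; unvisited children are tried first."""
    if child_visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)


front = []
for cand in [
    ("seq_a", (1.0, 2.0, 3.0)),
    ("seq_b", (2.0, 1.0, 3.0)),
    ("seq_c", (2.0, 2.0, 3.0)),  # dominates seq_a and seq_b
    ("seq_d", (1.0, 1.0, 1.0)),  # dominated by seq_c, so rejected
]:
    front = update_pareto_front(front, cand)
```

A small exploration constant such as 0.1 weights the bonus term lightly, so under this assumed formula selection leans toward children with high average reward once each has been visited.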
mRNAutilus targets the design of therapeutic and research mRNAs — vaccines, protein-replacement therapies, and genome-editing payloads such as prime editors — where stability and high protein expression are critical. By generating optimized coding sequences and UTRs together, it can serve mRNA therapeutics developers, synthetic-biology labs, and protein-expression workflows that would otherwise iterate manually over codon tables and UTR libraries.
mRNAutilus advances mRNA design by treating codon optimization and UTR design as a unified, multi-objective generative task and demonstrating large expression gains on clinically relevant constructs. However, its practical reach is constrained by its release model: neither the code nor the model weights are public, and access is provided only through the gated AutoNA web interface, with the paper and associated data released under a CC BY-NC-ND license. As an effectively closed release, it is best viewed as a demonstration of what discrete diffusion combined with search can achieve for mRNA engineering rather than as a reproducible, openly extensible resource for the community.