A structure-aware transformer that makes zero-shot, per-adenosine predictions of ADAR-mediated A-to-I RNA editing to guide therapeutic guide-RNA design.
Adenosine deaminase acting on RNA (ADAR) converts adenosine to inosine (read as guanosine) within double-stranded RNA. Because human cells already express ADAR, the enzyme can be recruited for therapeutic RNA editing by delivering a programmable guide RNA (gRNA) that forms a double-stranded substrate in trans with a target transcript — correcting a disease-causing base at the RNA level without permanently altering the genome. The central difficulty is that ADAR is promiscuous: a given gRNA can edit the intended adenosine but also nearby "bystander" sites, so designing guides that are both efficient and specific requires predicting, base by base, how strongly each adenosine in a target will be edited.
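The sketch below makes the efficiency/specificity trade-off concrete. It collapses a vector of per-position editing predictions into guide-level metrics. The prediction values, the `summarize_guide` helper, and the specificity ratio are illustrative assumptions. They are not metrics defined by the Helix preprint.

```python
from dataclasses import dataclass


@dataclass
class GuideSummary:
    on_target: float      # predicted editing fraction at the intended adenosine
    bystander_max: float  # worst single bystander adenosine
    bystander_sum: float  # total predicted bystander editing in the window
    specificity: float    # on_target / (on_target + bystander_sum); illustrative choice


def summarize_guide(target: str, edit_pred: list[float], target_pos: int) -> GuideSummary:
    """Collapse per-position editing predictions into guide-level metrics.

    edit_pred[i] is a hypothetical model output: the predicted fraction of
    transcripts edited at position i (only meaningful where target[i] == 'A').
    """
    if len(target) != len(edit_pred):
        raise ValueError("one prediction per target base is required")
    if target[target_pos] != "A":
        raise ValueError("the intended edit site must be an adenosine")
    on_target = edit_pred[target_pos]
    # Every other adenosine in the window counts as a potential bystander site.
    bystanders = [p for i, (base, p) in enumerate(zip(target, edit_pred))
                  if base == "A" and i != target_pos]
    bystander_sum = sum(bystanders)
    total = on_target + bystander_sum
    return GuideSummary(
        on_target=on_target,
        bystander_max=max(bystanders, default=0.0),
        bystander_sum=bystander_sum,
        specificity=on_target / total if total > 0 else 0.0,
    )


# Toy example: three adenosines in a 7-nt window, intended edit at index 3.
summary = summarize_guide("GAUAGAC", [0, 0.05, 0, 0.60, 0, 0.15, 0], target_pos=3)
```

A design pipeline would compute such summaries for every candidate guide. How to weigh on-target editing against bystander burden is a separate design decision.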
Helix, introduced in a December 2025 bioRxiv preprint from Shape Therapeutics, is a predictive model built for exactly this task. The authors report that it makes accurate, zero-shot per-adenosine editing predictions for new target sequences. In other words, it can score an unseen target without task-specific retraining. They attribute this accuracy to two design choices. The first is a transformer backbone that scales with large, class-imbalanced training data. The second is a structure-aware attention mechanism that incorporates predicted RNA secondary structure, a key determinant of ADAR activity.
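This description does not say how structure enters the attention computation. One common approach, assumed here purely for illustration, works in two steps. First, convert a predicted secondary structure into a base-pair matrix. The structure is in dot-bracket notation, as produced by folding tools such as ViennaRNA's RNAfold. Second, add that matrix, scaled, to the attention logits, so that paired nucleotides attend to each other more strongly. The function names and the `bias_scale` parameter are hypothetical. In a trained model such a scale would typically be learned.

```python
import numpy as np


def pair_matrix(dot_bracket: str) -> np.ndarray:
    """Convert a dot-bracket structure into a symmetric 0/1 base-pair matrix."""
    n = len(dot_bracket)
    pairs = np.zeros((n, n))
    stack = []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            pairs[i, j] = pairs[j, i] = 1.0
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def structure_biased_attention(x, w_q, w_k, w_v, pairs, bias_scale=2.0):
    """Single-head self-attention with an additive bonus on base-paired positions.

    x: (n, d) per-nucleotide embeddings; w_q, w_k, w_v: (d, d_head) projections;
    pairs: (n, n) base-pair matrix; bias_scale: hypothetical (normally learned) scalar.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias_scale * pairs
    logits = logits - logits.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Toy usage: a 12-nt hairpin (4-bp stem, 4-nt loop), random embeddings and projections.
rng = np.random.default_rng(0)
structure = "((((....))))"
n, d = len(structure), 8
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = structure_biased_attention(x, w_q, w_k, w_v, pair_matrix(structure))
```

Setting `bias_scale=0.0` recovers ordinary scaled dot-product attention. That makes it easy to check that the structure term only shifts attention toward base-paired partners.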
Helix is distinct from the similarly themed AdarEdit entry. That model is an academic graph-attention network (Stanford/Bar-Ilan) trained on endogenous editing of inverted Alu repeats from GTEx tissues. Helix, by contrast, is an industry transformer aimed at engineering therapeutic gRNAs and trained on high-throughput editing screens. The two target overlapping biology with different architectures and goals.
Helix is a transformer-based sequence model for predicting outcomes of ADAR-mediated A-to-I editing. It is augmented with a structure-aware attention mechanism that folds predicted RNA secondary structure into the model, so it can weigh the double-stranded context ADAR depends on. It is trained on high-throughput editing screens of guide–target pairs (including PolyTarget and single-target HTS libraries). These span roughly 6,000 therapeutically relevant targets and on the order of 200,000 unique gRNAs. In this data, edited sites are far rarer than unedited ones, and the authors cite this class imbalance as motivation for the scalable transformer design. Rather than operating in isolation, Helix anchors DeepHelix, a noisy-student distillation framework, in three steps:

1. Helix scores candidate gRNAs to provide functional pseudo-labels for training the DeepREAD generative model.
2. DeepREAD then generates large candidate pools.
3. Helix scores, ranks, and filters those candidates for in-cell validation.

The preprint reports that DeepHelix-designed guides efficiently edit a therapeutically relevant adenosine and can be engineered for cross-species reactivity. Because this is an industry preprint, low-level hyperparameters (exact layer counts and parameter totals) are not specified. No public release of code or weights is mentioned, and the model is described as a proprietary internal platform.
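The predict–generate–rank loop can be illustrated with a toy. A predictor scores a pool of generated guides. The top scorers stand in for pseudo-labeled training data and are used to refit the generator. The final round's best candidates form a shortlist. This is a simplified, cross-entropy-method-style stand-in, not the preprint's procedure. The assumptions are as follows:

- `toy_predictor` and its hypothetical `HIDDEN_OPTIMUM` replace Helix.
- A per-position base-frequency profile replaces the DeepREAD generative model.
- None of the noisy-student training details are reproduced.

```python
import random

ALPHABET = "ACGU"
HIDDEN_OPTIMUM = "GCAU" * 5  # hypothetical 20-nt "best" guide known only to the toy predictor


def toy_predictor(guide: str) -> float:
    """Hypothetical stand-in for Helix: fraction of positions matching HIDDEN_OPTIMUM."""
    return sum(a == b for a, b in zip(guide, HIDDEN_OPTIMUM)) / len(HIDDEN_OPTIMUM)


def fit_profile(seqs: list[str], length: int, pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Refit the toy generator as per-position base frequencies of the selected guides.

    The pseudocount keeps every base possible, so later rounds can still explore.
    """
    profile = []
    for pos in range(length):
        counts = {b: pseudocount for b in ALPHABET}
        for s in seqs:
            counts[s[pos]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def sample(profile: list[dict[str, float]], rng: random.Random) -> str:
    """Draw one guide from the position-frequency profile."""
    return "".join(
        rng.choices(ALPHABET, weights=[col[b] for b in ALPHABET])[0] for col in profile
    )


def design_loop(predictor, length=20, rounds=5, pool_size=500, keep=50, seed=0):
    """Generate -> score/rank -> refit, repeated; returns the last round's shortlist."""
    if rounds < 1:
        raise ValueError("need at least one round")
    rng = random.Random(seed)
    profile = [{b: 0.25 for b in ALPHABET} for _ in range(length)]  # uniform start
    for _ in range(rounds):
        pool = [sample(profile, rng) for _ in range(pool_size)]  # generate a candidate pool
        ranked = sorted(pool, key=predictor, reverse=True)       # score and rank
        profile = fit_profile(ranked[:keep], length)             # refit on top scorers
    return ranked[:keep]  # shortlist, e.g. for in-cell testing


shortlist = design_loop(toy_predictor)
```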
Helix is built for RNA-editing therapeutic development. It lets researchers screen candidate ADAR guide RNAs in silico, predict on-target editing efficiency, and anticipate bystander editing before committing to wet-lab synthesis and cell-based assays. Within the DeepHelix loop, it pseudo-labels training data for the generative DeepREAD model and also prioritizes generated guides for testing, compressing the design–build–test cycle. Its support for constraint-based designs, including guides that are cross-reactive between species, is particularly useful for preclinical programs. There, a single guide that edits both human and animal-model transcripts simplifies translational studies. The primary beneficiaries are therapeutic RNA-editing teams, though the structure-aware modeling of ADAR substrate preference is of broader interest to RNA biologists.
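A cross-species constraint can be expressed as a filter over per-species predictions. A guide is kept only if it is predicted to edit the intended adenosine efficiently in every transcript it must hit, with limited bystander editing. The sketch below assumes such predictions already exist. The thresholds, species labels, and numbers are made up for illustration. Ranking by the worst species' on-target score is one reasonable choice among several.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    on_target: float      # predicted editing fraction at the intended adenosine
    bystander_max: float  # largest predicted editing at any other adenosine


def cross_reactive_shortlist(preds, min_on=0.3, max_bystander=0.1, top_k=3):
    """Keep guides that pass both thresholds in every species; rank by worst-case on-target.

    preds maps guide id -> {species: Prediction}; all thresholds are hypothetical.
    """
    passing = []
    for guide, by_species in preds.items():
        if not by_species:
            continue  # no predictions, nothing to judge
        if all(p.on_target >= min_on and p.bystander_max <= max_bystander
               for p in by_species.values()):
            worst_on = min(p.on_target for p in by_species.values())
            passing.append((worst_on, guide))
    passing.sort(reverse=True)
    return [guide for _, guide in passing[:top_k]]


# Made-up predictions for four candidate guides against human and mouse transcripts.
preds = {
    "g1": {"human": Prediction(0.62, 0.04), "mouse": Prediction(0.48, 0.06)},
    "g2": {"human": Prediction(0.80, 0.03), "mouse": Prediction(0.12, 0.02)},  # weak in mouse
    "g3": {"human": Prediction(0.55, 0.15), "mouse": Prediction(0.50, 0.05)},  # bystander too high
    "g4": {"human": Prediction(0.51, 0.02), "mouse": Prediction(0.57, 0.01)},
}
shortlist = cross_reactive_shortlist(preds)  # → ['g4', 'g1']
```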
Helix illustrates a shift in therapeutic RNA editing from heuristic, structure-guided guide design toward data-driven prediction at single-adenosine resolution, and shows how a predictive model can be coupled with a generative one in a closed predict–generate–rank loop to design guides that meet efficiency, specificity, and cross-species constraints. By framing structure-aware editing prediction as the bottleneck and addressing it with a scalable transformer, the work points toward faster iteration on ADAR-based medicines. As a December 2025 preprint from a commercial developer, its claims await peer review and independent replication, and because the model and training data are proprietary, external groups cannot yet reproduce or directly build on Helix — its near-term influence is on Shape Therapeutics' own pipeline rather than the open research community.