Albatross

RNA language model that predicts secondary structure of internal ribosome entry sites from sequence alone, trained on roughly 50,000 IRES sequences.

Released: May 2026

Albatross is an RNA language model that predicts secondary-structure features of internal ribosome entry sites (IRESes) directly from sequence, with accuracy reported to be comparable to experimental chemical probing. IRESes are structured RNA elements that recruit the ribosome and initiate translation independently of the canonical 5' cap, making them valuable for mRNA therapeutics, synthetic biology, and the study of viral and cellular gene expression. Because the structure of an IRES is tightly coupled to its activity, the lack of fast and accurate structural prediction has been a long-standing bottleneck: experimental probing methods such as SHAPE and DMS-MaPseq are informative but laborious and difficult to scale across large element libraries.

The model was developed in the Rouskin Lab in the Department of Microbiology at Harvard Medical School and described in a bioRxiv preprint posted in May 2026 (Sychla, Bongrand, Yang, Rulison, Wesselhoeft, Bisaria, and Rouskin). Albatross is trained by self-supervised masked-nucleotide prediction on roughly 50,000 IRES sequences, learning the statistical regularities of IRES sequence and folding without explicit structural labels. Once trained, it generalizes to new sequences without re-training, allowing structural features to be inferred across very large sequence collections.

In the landscape of RNA foundation models — alongside RNA-FM, ERNIE-RNA, and 5' UTR-LM — Albatross is distinguished by its narrow specialization on IRES biology and its emphasis on translating learned representations into a practical, high-throughput structural mapping pipeline rather than a general-purpose RNA encoder. Note that this work is a preprint and has not yet completed peer review.

Key Features

IRES-specialized pretraining: Trained by masked-nucleotide prediction on approximately 50,000 IRES sequences, the model concentrates its representational capacity on the sequence and structural grammar of cap-independent translation elements.
Probing-comparable structure prediction: Predicted secondary-structure features are reported to be comparable in accuracy to experimental chemical probing, offering a computational alternative to laborious wet-lab structure mapping.
Generalizes without re-training: Once trained, Albatross is applied to new IRES sequences directly, enabling structural inference at scales impractical for experimental approaches.
Large-scale structural mapping: The authors used the model to generate structural maps for roughly 75,000 full-length IRES elements, with 96 elements experimentally validated.
Functional discovery: Analysis of the mapped elements identified a "Type V" IRES class reported to roughly double the translational activity of the widely used EMCV standard.

Technical Details

Albatross is an RNA language model trained with a self-supervised masked-nucleotide prediction objective on a corpus of about 50,000 IRES sequences: nucleotides are hidden from the input and the model learns to recover them from the surrounding context, which lets it pick up sequence and structural regularities without labeled structures. The preprint does not state whether Albatross is trained from scratch or fine-tuned from a general-purpose RNA language model, nor does it report the parameter count, so the base model and model size are currently unspecified. After training, the model was used to produce structural maps for approximately 75,000 full-length IRES elements, of which 96 were experimentally validated, and to surface a high-activity Type V IRES class reported to roughly double the activity of the EMCV standard.
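Because no Albatross code has been released, the following is only a minimal sketch of what a generic masked-nucleotide training example looks like, not the authors' implementation. The function name `mask_nucleotides`, the `<mask>` token, and the 15% masking rate (a common convention from BERT-style models) are all assumptions for illustration; the preprint summary above does not specify any of them.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical token name, not taken from the preprint


def mask_nucleotides(sequence, mask_rate=0.15, seed=None):
    """Build one masked-prediction example from an RNA sequence.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the nucleotide hidden there. A model trained with this
    objective would be asked to recover those hidden nucleotides.
    The default mask_rate is an assumed value, not a reported one.
    """
    rng = random.Random(seed)
    # Normalize to upper-case RNA alphabet (DNA-style T becomes U).
    tokens = list(sequence.upper().replace("T", "U"))
    # Mask at least one position, but never more than the sequence length.
    n_masked = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    targets = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, targets
```

Calling `mask_nucleotides("GGCUAGCUAGGAUCCGAUCG", seed=0)` on this 20-nucleotide toy sequence hides three positions and returns their original bases as prediction targets. No structural label appears anywhere in the example, which is the sense in which the objective is self-supervised.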

As of this writing, no public code or model weights have been confirmed: the Rouskin Lab GitHub organization does not yet host an Albatross repository. The preprint is released under a CC BY license, but the license that would govern any released model weights is unknown. Researchers seeking to reproduce or build on the work should monitor the lab's repositories for a future release.

Applications

Albatross is most directly useful for researchers engineering or characterizing IRES elements. In mRNA therapeutics and synthetic biology, where cap-independent translation can be exploited for multicistronic constructs or circular RNA designs, fast structural prediction helps prioritize candidates before costly experimental validation. The reported Type V IRES class, with roughly double the activity of the common EMCV benchmark, points to immediate utility in maximizing protein output per transcript. More broadly, the high-throughput structural mapping approach offers virologists and RNA biologists a way to triage and annotate large IRES libraries that would be infeasible to probe experimentally in full.

Impact

By reporting that a self-supervised model trained on IRES sequences can predict structural features at accuracy comparable to chemical probing, and then using that model to map tens of thousands of elements and surface a high-activity IRES class, Albatross illustrates how specialized RNA language models can compress experimental structural workflows into scalable in-silico pipelines. The identification of Type V IRESes reported to roughly double the activity of the EMCV standard is a concrete payoff with relevance to mRNA and gene-therapy design. The work's near-term impact is tempered by its preprint status and by the current absence of confirmed public code, weights, a stated base architecture, or a parameter count, all of which limit independent reproduction until the authors release additional artifacts.

Citation

An RNA Language Model trained on sequence alone reveals the structural logic of Internal Ribosome Entry Sites

Sychla, A., et al. (2026) An RNA Language Model trained on sequence alone reveals the structural logic of Internal Ribosome Entry Sites. bioRxiv.

DOI: 10.64898/2026.05.19.726202

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 15 (Closed)

Usability (can I run it?): 15

Reproducibility (can I retrain it?): 0 (not reproducible)

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
