A hybrid simulation and machine-learning framework that predicts ribosome location profiles from mRNA sequence alone, combining a structure-aware TASEP with a Mamba polisher.
Ribosome profiling (Ribo-seq) measures the position-resolved density of translating ribosomes along mRNAs, providing a genome-wide readout of translation. However, generating these profiles requires labor-intensive sequencing experiments, and existing computational methods that estimate ribosome occupancy depend on experimental Ribo-seq reads or genomic context as input. seq2ribo addresses a harder problem: predicting the positional ribosome location profile of a transcript from its mRNA sequence alone, with no expression-level data required.
Developed by the Kingsford Group at Carnegie Mellon University and posted to bioRxiv in 2026, seq2ribo is a hybrid framework that couples a biophysical simulation of translation with a neural refinement network. A structure-aware Totally Asymmetric Simple Exclusion Process (sTASEP) simulates ribosome traffic along the transcript, and a Mamba-based "polisher" network corrects the simulated distribution toward observed profiles. The authors report it as the first method to achieve meaningful positional correlation with measured ribosome profiles from sequence alone.
By operating purely from sequence, seq2ribo fits a niche between mechanistic translation models and purely data-driven predictors, making it directly applicable to synthetic constructs and de novo designed mRNAs that have no genomic or experimental context.
seq2ribo is built around two coupled components. The sTASEP module is a stochastic traffic simulation in which ribosomes advance codon by codon, with wait times conditioned on codon identity and structure-derived geometry features, so that it captures how local mRNA folding modulates elongation. Its output is passed to a Mamba (selective state-space) polisher that operates over per-codon feature sequences to refine the predicted ribosome density.
On held-out evaluation, the authors report transcript-level Pearson correlations of up to r ≈ 0.920 between predicted and observed profiles, but within-transcript shape correlations of only up to r ≈ 0.186, reflecting that absolute-level prediction is substantially easier than capturing fine-grained positional shape.
The repository bundles separate checkpoint sets for the ribosome-profiling, translation-efficiency, and protein-expression tasks across the four supported cell lines. The code is released for academic and non-profit noncommercial research use; commercial use requires a separate license.
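
To give a feel for the traffic-simulation idea, the following is a minimal sketch of a plain TASEP with extended particles, not seq2ribo's actual sTASEP. It assumes a random-sequential update rule, a single hypothetical per-codon hop probability (`hop_prob`) standing in for whatever combination of codon identity and structure features sets the wait time, and a ribosome footprint of 10 codons. The function name and all parameter values are illustrative and do not come from the seq2ribo code.

```python
import random

def simulate_tasep(hop_prob, init_prob=0.2, footprint=10,
                   n_sweeps=2000, burn_in=500, seed=0):
    """Toy TASEP: return the time-averaged ribosome occupancy per codon.

    hop_prob[i] is a hypothetical probability that a ribosome sitting on
    codon i attempts to move forward when that codon is selected.
    """
    rng = random.Random(seed)
    n = len(hop_prob)
    occupied = [False] * n   # occupied[i]: a ribosome is positioned at codon i
    counts = [0] * n

    def blocked(start, stop):
        # True if any codon in [start, stop) already holds a ribosome.
        return any(occupied[start:min(stop, n)])

    for sweep in range(n_sweeps):
        for _ in range(n + 1):
            site = rng.randrange(-1, n)  # -1 stands for an initiation attempt
            if site == -1:
                # Initiation needs the first `footprint` codons to be free.
                if rng.random() < init_prob and not blocked(0, footprint):
                    occupied[0] = True
            elif occupied[site] and rng.random() < hop_prob[site]:
                if site == n - 1:
                    occupied[site] = False  # termination: ribosome leaves
                elif not blocked(site + 1, site + 1 + footprint):
                    # Exclusion: move only if no ribosome is within one
                    # footprint ahead; ribosomes can never overtake.
                    occupied[site] = False
                    occupied[site + 1] = True
        if sweep >= burn_in:
            for i, occ in enumerate(occupied):
                counts[i] += occ
    samples = n_sweeps - burn_in
    return [c / samples for c in counts]

# Usage: a uniform transcript of 60 codons with one slow codon at index 30.
hop_prob = [0.5] * 60
hop_prob[30] = 0.05
density = simulate_tasep(hop_prob)
print(f"slow codon: {density[30]:.2f}, downstream: {density[50]:.2f}")
```

In this toy setting the slow codon accumulates ribosomes while the region downstream of it stays sparsely occupied, which is the kind of position-dependent density a traffic model produces. A continuous-time (Gillespie-style) update would be a reasonable alternative to the random-sequential rule chosen here for brevity.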
seq2ribo is aimed at synthetic biology and mRNA engineering, where researchers design constructs (reporter genes, therapeutic mRNAs, and codon-optimized transgenes) that lack any prior expression data. Because it predicts ribosome occupancy and downstream translation efficiency from sequence alone, it can be used to screen and rank candidate sequences in silico, to study how codon choice and secondary structure shape elongation, and to prioritize designs before committing to wet-lab synthesis and Ribo-seq validation. The bundled cell-line-specific checkpoints let users tailor predictions to common experimental systems.
By demonstrating that positionally meaningful ribosome profiles can be predicted from sequence alone, seq2ribo extends the toolkit for translation modeling beyond methods that require experimental reads or genomic context. Its hybrid design, which pairs an interpretable, structure-aware biophysical TASEP simulation with a modern state-space neural network, offers a template for combining mechanistic and data-driven modeling in genomics. Because seq2ribo is a 2026 preprint, its long-term adoption remains to be established, and the modest within-transcript shape correlations show that accurately predicting the fine positional detail of translation from sequence is still an open challenge.
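
The gap between a high transcript-level correlation and a modest within-transcript one can be made concrete with a toy calculation. The sketch below assumes that "transcript-level" means a Pearson correlation across transcripts of each transcript's mean density, and that "within-transcript" means the per-position Pearson correlation computed inside each transcript and then averaged. These definitions are assumptions made for illustration and may differ from those used in the preprint. The profiles are made-up numbers, not seq2ribo output.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5

def transcript_level_r(pred, obs):
    # Assumed definition: correlate one summary value (mean density)
    # per transcript across all transcripts.
    return pearson([mean(p) for p in pred], [mean(o) for o in obs])

def mean_within_transcript_r(pred, obs):
    # Assumed definition: correlate positions inside each transcript,
    # then average the per-transcript correlations.
    return mean(pearson(p, o) for p, o in zip(pred, obs))

# Made-up profiles: predictions match each transcript's overall level
# but place the peaks at the wrong positions.
observed = [[1, 2, 3, 2], [10, 12, 14, 12], [5, 6, 7, 6]]
predicted = [[2, 2.5, 1.5, 2], [12, 11, 13, 12], [6, 6.5, 5.5, 6]]

level_r = transcript_level_r(predicted, observed)
shape_r = mean_within_transcript_r(predicted, observed)
print(f"transcript-level r = {level_r:.3f}, mean within-transcript r = {shape_r:.3f}")
```

Here the predictions reproduce how heavily each transcript is loaded, so the transcript-level value is close to 1, while the positional pattern inside each transcript is largely wrong, so the averaged within-transcript value stays near zero.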