bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

Mach-1

Broad Institute

Long-context (64 kb) RNA foundation model using the Striped-Hyena architecture for zero-shot prediction of transcriptome architecture from unspliced pre-mRNA sequence.

Released: April 2026

Mach-1 is a long-context RNA foundation model that maps unspliced pre-mRNA sequence to transcriptome architecture, including isoform abundances, RNA secondary structure, and the effects of splicing variants. The model uses a Striped-Hyena architecture, which interleaves gated convolutional (Hyena) layers with a small number of attention layers, to scale to 64-kilobase context windows. That is long enough to process most full-length human pre-mRNAs in a single forward pass.
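As a rough illustration of what the 64 kb window means in practice, the sketch below counts how many windows a pre-mRNA of a given length would need. It is not taken from the Mach-1 code base: `CONTEXT_NT`, `windows_needed`, and the `overlap` parameter are hypothetical names, and reading "64 kb" as exactly 64,000 nucleotides is an assumption.

```python
import math

# Assumption: "64 kb" is read as 64,000 nucleotides; the model's real
# token limit may differ (for example 65,536).
CONTEXT_NT = 64_000


def windows_needed(seq_len: int, context: int = CONTEXT_NT, overlap: int = 0) -> int:
    """Count the windows needed to cover a sequence of seq_len nucleotides.

    A result of 1 means the sequence fits in a single forward pass.
    Larger results mean a sliding-window scheme would be required.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if not 0 <= overlap < context:
        raise ValueError("overlap must satisfy 0 <= overlap < context")
    if seq_len <= context:
        return 1
    stride = context - overlap  # how far each successive window advances
    return 1 + math.ceil((seq_len - context) / stride)


if __name__ == "__main__":
    for length in (30_000, 64_000, 200_000):
        print(length, windows_needed(length, overlap=4_000))
```

A 30,000-nucleotide pre-mRNA needs one window under these assumptions, while a 200,000-nucleotide one would still need tiling, which is why the page says "most" rather than "all" human pre-mRNAs.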

The preprint was first posted to bioRxiv in August 2024. The work was then substantially extended and re-validated, and a major v3 update posted in April 2026 added experimental validation through CRISPR editing and de novo transcript synthesis. This page treats the April 2026 version as the definitive release.

#Key Features

  • 64 kb context window: Covers most full-length human pre-mRNAs in a single forward pass, removing the need for sliding-window approximations that miss long-range exon-intron interactions.
  • Striped-Hyena architecture: Interleaves gated convolutional (Hyena) layers with a small number of attention layers to model long contexts efficiently. Evo, a long-context genomic foundation model, uses the same architecture.
  • Zero-shot transcriptome architecture: Predicts isoform abundance and splicing patterns directly from sequence without supervised fine-tuning.
  • Variant effect prediction: Scores the splicing effects of single-nucleotide and structural variants by comparing the model's predictions for the reference and variant sequences (difference-in-prediction analysis).
  • Experimental validation: April 2026 version includes CRISPR editing and de novo transcript-synthesis validation of model predictions.
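
The difference-in-prediction idea in the variant-effect bullet can be sketched as follows. This is a minimal illustration, not Mach-1's interface: the `Predictor` signature, `apply_snv`, `variant_effect`, and the toy predictor are all hypothetical, and only single-nucleotide substitutions are handled.

```python
from typing import Callable, List, Sequence

# Hypothetical predictor signature: nucleotide string -> isoform-abundance
# vector. This is an assumption for illustration, not the model's real API.
Predictor = Callable[[str], Sequence[float]]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if len(alt) != 1 or alt not in "ACGTU":
        raise ValueError("alt must be a single nucleotide")
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict: Predictor, ref_seq: str, pos: int, alt: str) -> List[float]:
    """Per-isoform change in predicted abundance: variant minus reference."""
    ref_pred = list(predict(ref_seq))
    alt_pred = list(predict(apply_snv(ref_seq, pos, alt)))
    return [a - r for a, r in zip(alt_pred, ref_pred)]


def toy_predict(seq: str) -> List[float]:
    """Stand-in predictor: abundance of isoform 0 grows with the GU count."""
    n = seq.count("GU")
    p = n / (n + 1)
    return [p, 1.0 - p]


if __name__ == "__main__":
    # Destroying the only GU dinucleotide shifts abundance to isoform 1.
    print(variant_effect(toy_predict, "AAGUAA", 2, "C"))
```

In a real workflow the toy predictor would be replaced by a call into the released model, and the delta vector could be summarized (for example by its largest absolute entry) to rank variants.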

#Technical Details

Mach-1 uses a Striped-Hyena state-space backbone trained on a curated corpus of human pre-mRNA sequences with paired transcriptome-level annotations from GTEx and ENCODE. The model is trained autoregressively over nucleotide tokens with auxiliary multitask heads predicting splice-site probabilities and isoform-abundance vectors. Training details, ablations, and benchmarking against SpliceAI and SpliceTransformer are reported in the bioRxiv preprint.
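
One way an autoregressive objective can be combined with auxiliary heads is a weighted sum of per-task losses. The sketch below shows that pattern only. The specific loss forms (cross-entropy for splice sites, squared error for abundances) and the weights `w_splice` and `w_abundance` are assumptions for illustration; the actual objective is described in the preprint.

```python
import math
from typing import Sequence


def next_token_nll(probs_of_true_token: Sequence[float]) -> float:
    """Mean negative log-likelihood of the observed next nucleotides."""
    return -sum(math.log(p) for p in probs_of_true_token) / len(probs_of_true_token)


def binary_cross_entropy(pred: Sequence[float], target: Sequence[int]) -> float:
    """Mean binary cross-entropy for per-position splice-site probabilities."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and observed isoform abundances."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(lm: float, splice: float, abundance: float,
                   w_splice: float = 1.0, w_abundance: float = 1.0) -> float:
    """Weighted sum of the language-model loss and the two auxiliary losses.

    The weights are hypothetical hyperparameters, not values from the paper.
    """
    return lm + w_splice * splice + w_abundance * abundance


if __name__ == "__main__":
    lm = next_token_nll([0.5, 0.25])
    sp = binary_cross_entropy([0.9, 0.1], [1, 0])
    ab = mean_squared_error([0.7, 0.3], [0.6, 0.4])
    print(multitask_loss(lm, sp, ab, w_splice=0.5, w_abundance=0.1))
```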

Validation in the April 2026 version includes CRISPR-based perturbation of predicted splicing-regulatory elements and synthesis of designed transcripts to confirm predicted abundance patterns.

#Applications

Mach-1 is suited for variant interpretation in clinical genomics where splicing effects are suspected, transcript engineering for therapeutic mRNA design with intronic regulatory cassettes, and basic RNA-biology research into splicing regulation. Its long-context capability is particularly important for genes with long introns and distally regulated exons.

#Impact

Mach-1 is among the first RNA foundation models to combine genuinely long context (64 kb) with experimental validation. By demonstrating that pre-mRNA-to-transcriptome modeling can be approached as a zero-shot foundation-model task, it extends the foundation-model paradigm to a problem class previously addressed primarily by purpose-built supervised models such as SpliceAI. Its Striped-Hyena backbone places it in the long-context bio-FM family alongside Evo, which uses the same architecture, and Caduceus, which is built on Mamba state-space layers.

Citation

Learning transcriptome architecture from sequence with a long-context RNA foundation model

Preprint

Saberi, A., et al. (2026) Learning transcriptome architecture from sequence with a long-context RNA foundation model. bioRxiv.

DOI: 10.1101/2024.08.26.609813

Recent citations

Papers that recently cited this model.

  • AI foundation models for RNA biology
    Haopeng Yu, Yiliang Ding
    RNA Biology · Mar 2026 · Citations: 0

  • BioToken and BioFM – Biologically-Informed Tokenization Enables Accurate and Efficient Genomic Foundation Models
    Aleksandr Medvedev, Karthik Viswanathan, P. Kanithi, et al.
    bioRxiv · Nov 2025 · Citations: 5

  • Advancing non-coding RNA annotation with RNA sequence foundation models: structure and function perspectives
    Naima Vahab, Sonika Tyagi
    BMC Artificial Intelligence · Oct 2025 · Citations: 0

Top citations

The most-cited papers that cite this model.

  • BioToken and BioFM – Biologically-Informed Tokenization Enables Accurate and Efficient Genomic Foundation Models
    Aleksandr Medvedev, Karthik Viswanathan, P. Kanithi, et al.
    bioRxiv · Nov 2025 · Citations: 5

  • mRNABench: A curated benchmark for mature mRNA property and function prediction
    Ruian (Ian) Shi, Taykhoom Dalal, Phil Fradkin, et al.
    bioRxiv · Jul 2025 · Citations: 3

  • AI foundation models for RNA biology
    Haopeng Yu, Yiliang Ding
    RNA Biology · Mar 2026 · Citations: 0

  • Advancing non-coding RNA annotation with RNA sequence foundation models: structure and function perspectives
    Naima Vahab, Sonika Tyagi
    BMC Artificial Intelligence · Oct 2025 · Citations: 0

Citations

Total Citations: 4
Influential: 0
References: 53

Fields of citing research

  • Biology: 100%
  • Computer Science: 100%
  • Medicine: 75%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights · open weights, closed recipe
39 · Closed
Usability (can I run it?): 51
Reproducibility (can I retrain it?): 37
Model Openness Framework: Unclassified
Restrictive license on core components
Resources

GitHub Repository · Research Paper