Overview

Mach-1 is a long-context RNA foundation model that maps unspliced pre-mRNA sequence to transcriptome architecture, including isoform abundances, RNA secondary structure, and the effects of splicing variants. It uses a Striped-Hyena architecture, which hybridizes convolutional state-space layers with sparse attention, to scale to a 64-kilobase context window, long enough to process most full-length human pre-mRNAs in a single forward pass.
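
As a rough illustration of why a single-pass window matters, the sketch below checks whether a pre-mRNA of a given length fits in the context window and, if it does not, lists the overlapping windows a sliding-window approach would need. The constant CONTEXT_NT (taken here as 64,000 nt; the model's exact limit may differ) and both helper functions are hypothetical and are not part of any Mach-1 API.

```python
from typing import List, Tuple

# Assumption: "64 kb" is read as 64,000 nucleotides; the real limit may be 65,536.
CONTEXT_NT = 64_000


def fits_in_context(length_nt: int, context_nt: int = CONTEXT_NT) -> bool:
    """Return True if the whole sequence fits in one forward pass."""
    return length_nt <= context_nt


def sliding_windows(length_nt: int, window_nt: int, stride_nt: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) windows that together cover the sequence."""
    if window_nt <= 0 or stride_nt <= 0:
        raise ValueError("window_nt and stride_nt must be positive")
    if stride_nt > window_nt:
        raise ValueError("stride_nt larger than window_nt would leave gaps")
    if length_nt <= window_nt:
        return [(0, length_nt)]
    windows = [(s, s + window_nt)
               for s in range(0, length_nt - window_nt + 1, stride_nt)]
    # Anchor a final window at the 3' end if the stride left a tail uncovered.
    if windows[-1][1] < length_nt:
        windows.append((length_nt - window_nt, length_nt))
    return windows


if __name__ == "__main__":
    for length in (50_000, 100_000):
        print(length, fits_in_context(length),
              sliding_windows(length, CONTEXT_NT, 32_000))
```

Under these assumptions, a 100,000 nt pre-mRNA needs three overlapping windows at a 32,000 nt stride, and no single window sees both ends of the transcript. That is the kind of long-range interaction a sliding-window approximation can miss.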

The Mach-1 preprint was first posted to bioRxiv in August 2024 and was then substantially extended and re-validated. A major v3 update, posted in April 2026, added experimental validation through CRISPR editing and de novo transcript synthesis. The April 2026 version is treated here as the definitive release.

Key Features

64 kb context window: Processes most full-length human pre-mRNAs in a single forward pass, removing the need for sliding-window approximations that miss long-range exon-intron interactions.
Striped-Hyena architecture: Combines state-space convolutions with sparse attention to scale long-context modeling efficiently, following the Striped-Hyena recipe used in Evo and other long-context biological foundation models.
Zero-shot transcriptome architecture: Predicts isoform abundance and splicing patterns directly from sequence without supervised fine-tuning.
Variant effect prediction: Computes splicing-altering effects of single-nucleotide variants and structural variants via difference-in-prediction analysis.
Experimental validation: The April 2026 version includes CRISPR editing and de novo transcript-synthesis validation of model predictions.
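
The difference-in-prediction idea behind variant effect scoring can be sketched as follows: run the model on the reference sequence and on the variant sequence, then subtract the two sets of scores. This is a minimal sketch, not Mach-1's actual interface. The `predict` callable, `apply_snv`, `variant_delta`, and the placeholder `toy_predict` (which only flags GT dinucleotides) are all hypothetical names introduced for illustration.

```python
from typing import Callable, List

Predictor = Callable[[str], List[float]]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position outside sequence")
    if len(alt) != 1 or alt.upper() not in "ACGT":
        raise ValueError("alt must be a single nucleotide")
    return seq[:pos] + alt.upper() + seq[pos + 1:]


def variant_delta(predict: Predictor, ref_seq: str, pos: int, alt: str) -> List[float]:
    """Per-position score change: prediction(alt) minus prediction(ref)."""
    ref_scores = predict(ref_seq)
    alt_scores = predict(apply_snv(ref_seq, pos, alt))
    return [a - r for r, a in zip(ref_scores, alt_scores)]


def max_abs_delta(delta: List[float]) -> float:
    """Summarize a variant by its largest absolute score change."""
    return max(abs(d) for d in delta)


def toy_predict(seq: str) -> List[float]:
    """Placeholder for a real model: scores 1.0 wherever a GT dinucleotide starts."""
    return [1.0 if seq[i:i + 2] == "GT" else 0.0 for i in range(len(seq))]


if __name__ == "__main__":
    ref = "AAGGTAAGT"
    delta = variant_delta(toy_predict, ref, pos=4, alt="C")  # T>C destroys the first GT
    print(delta, max_abs_delta(delta))
```

Any real predictor that returns one score per position (for example, splice-site probabilities) could be passed in place of `toy_predict`. Structural variants would need an alignment step between reference and variant coordinates that this sketch omits.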

Technical Details

Mach-1 uses a Striped-Hyena state-space backbone trained on a curated corpus of human pre-mRNA sequences with paired transcriptome-level annotations from GTEx and ENCODE. The model is trained autoregressively over nucleotide tokens with auxiliary multitask heads predicting splice-site probabilities and isoform-abundance vectors. Training details, ablations, and benchmarking against SpliceAI and SpliceTransformer are reported in the bioRxiv preprint.
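
To make the training setup more concrete, here is one way an autoregressive objective with auxiliary heads could be combined into a single loss. The preprint, not this summary, defines the actual objective. The loss forms chosen here (cross-entropy for next-nucleotide prediction, binary cross-entropy for splice sites, mean squared error for isoform abundances), the weights, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guards against log(0)


def next_token_nll(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of each target token under its predicted distribution."""
    return -sum(math.log(p[t] + EPS) for p, t in zip(probs, targets)) / len(targets)


def splice_bce(pred: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy between predicted splice-site probabilities and 0/1 labels."""
    total = sum(y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
                for p, y in zip(pred, labels))
    return -total / len(labels)


def isoform_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and observed isoform-abundance vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(lm: float, splice: float, isoform: float,
                   w_splice: float = 1.0, w_isoform: float = 1.0) -> float:
    """Weighted sum of the language-model loss and the two auxiliary losses."""
    return lm + w_splice * splice + w_isoform * isoform


if __name__ == "__main__":
    uniform: List[List[float]] = [[0.25] * 4] * 3   # A, C, G, T equally likely
    lm = next_token_nll(uniform, [0, 1, 2])
    sp = splice_bce([0.5, 0.5], [1, 0])
    iso = isoform_mse([0.5, 0.5], [1.0, 0.0])
    print(multitask_loss(lm, sp, iso, w_splice=0.5, w_isoform=0.5))
```

The weights `w_splice` and `w_isoform` control how strongly the auxiliary heads shape the shared backbone relative to the sequence-modeling objective; the values above are placeholders.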

Validation in the April 2026 version includes CRISPR-based perturbation of predicted splicing-regulatory elements and synthesis of designed transcripts to confirm predicted abundance patterns.

Applications

Mach-1 is suited for variant interpretation in clinical genomics where splicing effects are suspected, transcript engineering for therapeutic mRNA design with intronic regulatory cassettes, and basic RNA-biology research into splicing regulation. Its long-context capability is particularly important for genes with long introns and distally regulated exons.

Impact

Mach-1 is among the first RNA foundation models to combine genuinely long context (64 kb) with experimental validation. By demonstrating that pre-mRNA-to-transcriptome modeling can be approached as a zero-shot foundation-model task, it extends the foundation-model paradigm to a problem class previously addressed primarily by purpose-built supervised models such as SpliceAI. Its long-context design places it alongside other long-context biological foundation models such as Evo, which also uses Striped-Hyena, and Caduceus, which is built on Mamba.

Citation

Learning transcriptome architecture from sequence with a long-context RNA foundation model
