evoRate

Genome language model that adds evolutionary-rate prediction to pretraining, improving representations for variant effect and regulatory genomics.

Released: February 2026

evoRate is a genome language model (gLM) training approach that introduces evolutionary rate prediction as a pretraining objective. It is described in a February 2026 bioRxiv preprint led by researchers at the University of Toronto, with collaborators including Microsoft Research and the Broad Institute. Most genome language models are pretrained with sequence-reconstruction objectives borrowed from natural language processing (masked or autoregressive token prediction), yet recent studies have shown that such models often fail to capture meaningful biological signal. evoRate addresses this gap by training the model to predict how fast each position in the genome evolves.

The key design choice is that the evolutionary-rate objectives are composable with standard sequence reconstruction. This enables a clean, controlled comparison between predicting sequence only, evolutionary rate only, or both together, isolating the contribution of the evolutionary signal. To support this analysis, the authors build a suite of biologically grounded benchmarks, since existing gLM evaluations have notable gaps in measuring whether models learn functional biology.
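One way to picture this three-way comparison is as a weighted sum of two losses, where setting a weight to zero switches an objective off. The sketch below is illustrative only and is not taken from the preprint: the names `ObjectiveWeights`, `sequence_loss`, `rate_loss`, and `composed_loss` are hypothetical, and the choice of cross-entropy for sequence and mean squared error for rate is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveWeights:
    """Hypothetical container for the weight on each pretraining objective."""
    sequence: float  # weight on the sequence-reconstruction loss
    rate: float      # weight on the evolutionary-rate loss


# The three arms of the controlled comparison described above.
SEQUENCE_ONLY = ObjectiveWeights(sequence=1.0, rate=0.0)
RATE_ONLY = ObjectiveWeights(sequence=0.0, rate=1.0)
COMBINED = ObjectiveWeights(sequence=1.0, rate=1.0)


def sequence_loss(token_probs, targets):
    """Mean cross-entropy; token_probs[i] maps each nucleotide to its predicted probability."""
    return -sum(math.log(p[t]) for p, t in zip(token_probs, targets)) / len(targets)


def rate_loss(predicted_rates, observed_rates):
    """Mean squared error between predicted and observed per-position rates (an assumed loss)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted_rates, observed_rates)]
    return sum(squared) / len(squared)


def composed_loss(weights, token_probs, targets, predicted_rates, observed_rates):
    """Weighted sum of the two objectives; a zero weight skips that objective entirely."""
    total = 0.0
    if weights.sequence:
        total += weights.sequence * sequence_loss(token_probs, targets)
    if weights.rate:
        total += weights.rate * rate_loss(predicted_rates, observed_rates)
    return total
```

Because the model and data stay fixed across the three arms and only the weights change, any difference in downstream performance can be attributed to the objective rather than to architecture or scale.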

By making evolution an explicit training target rather than an emergent hope, evoRate contributes to a broader shift in genomic foundation modeling toward objectives that encode functional and evolutionary constraints directly.

Key Features

Evolutionary-rate pretraining: Adds objectives that predict the rate of evolution at genomic positions, encoding functional constraint as a direct training signal.
Composable objectives: Evolutionary-rate tasks combine with sequence reconstruction, enabling controlled sequence-only vs. rate-only vs. combined comparisons.
Biologically grounded benchmarks: Introduces a new evaluation suite designed to address gaps in existing gLM benchmarks for functional and regulatory signal.
Parameter efficiency: Training on evolutionary rate makes relatively small models competitive with much larger existing gLMs on some tasks.
Variant effect gains: Models trained on both sequence and evolutionary rate outperform sequence-only models on established variant effect prediction benchmarks.

Technical Details

evoRate augments transformer-based genome language model pretraining with evolutionary rate prediction tasks that can be composed with conventional sequence reconstruction; one such task predicts the evolutionary rate at each position given the preceding sequence. Across the authors' new biologically grounded benchmarks and on established variant effect prediction benchmarks, models pretrained on both sequence and evolutionary rate consistently outperform those trained on sequence alone. Notably, incorporating the evolutionary-rate objective allows the relatively small models studied here to rival substantially larger existing gLMs on certain tasks, which supports treating evolution as a key training target for genome-scale models. The manuscript is a recent preprint and references no public code or weight release.
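Since no reference implementation is cited, the following is only a minimal sketch of how the "rate at each position given the preceding sequence" task could be framed as training examples. The function name `causal_rate_examples`, the `min_context` parameter, and the rate values are hypothetical; how the preprint actually derives and encodes its rate targets is not specified here.

```python
from typing import List, Sequence, Tuple


def causal_rate_examples(
    sequence: str,
    rates: Sequence[float],
    min_context: int = 1,
) -> List[Tuple[str, float]]:
    """Pair the rate at each position with the sequence that precedes it.

    The prefix for position i is sequence[:i], so the nucleotide at
    position i itself is never part of the model's input.
    """
    if len(sequence) != len(rates):
        raise ValueError("sequence and rates must have the same length")
    if min_context < 0:
        raise ValueError("min_context must be non-negative")
    return [(sequence[:i], rates[i]) for i in range(min_context, len(sequence))]


# Toy input: four nucleotides with made-up per-position rates.
examples = causal_rate_examples("ACGT", [0.1, 0.9, 0.2, 0.5])
for prefix, rate in examples:
    print(f"context={prefix!r} -> target rate={rate}")
```

Framing the task this way mirrors autoregressive token prediction, which is what would let a rate target share a forward pass with a next-token target.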

Applications

evoRate is aimed at regulatory genomics and variant interpretation, where genome language models pretrained on unlabeled sequence promise to advance understanding without curated training labels. Improved representations and variant effect prediction make the approach relevant for prioritizing noncoding and coding variants, studying functional constraint, and building more sample-efficient genomic foundation models for downstream genomics tasks.

Impact

evoRate provides evidence that evolution-aware pretraining objectives address a recognized weakness of sequence-only genome language models, namely their tendency to miss biological signal, and that they can substitute for raw scale on some tasks. By introducing a biologically grounded benchmark suite alongside the method, the work also offers tools to better measure functional understanding in gLMs. Because the work is an unreviewed preprint without a referenced code release, the breadth of these gains awaits independent replication.

Citation

Predicting evolutionary rate as a pretraining task improves genome language model representations

Consens, M. E., et al. (2026) Predicting evolutionary rate as a pretraining task improves genome language model representations. bioRxiv.

DOI: 10.64898/2026.02.02.703275

Citations

Total citations: 0

Influential: 0

References: 33

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Overall: 14 (Closed)

Usability (can I run it?): 13

Reproducibility (can I retrain it?): 4

Model Openness Framework

Unclassified

Missing required components

Resources

Research Paper
