ESMRank

Learning-to-rank variant effect predictor that aligns overlapping deep mutational scanning assays into an assay-agnostic tolerance measure.

Released: February 2026

ESMRank is a sequence-based variant effect predictor developed at the Telethon Institute of Genetics and Medicine (TIGEM) and posted to bioRxiv in February 2026. It targets a fundamental obstacle in learning from multiplexed assays of variant effect (MAVEs): the thousands of available deep mutational scanning (DMS) experiments measure different molecular phenotypes—stability, abundance, binding, enzymatic activity—on different scales, making their scores difficult to compare or combine directly. ESMRank reconciles this heterogeneity by reframing variant effect prediction as a learning-to-rank problem rather than a regression onto raw, assay-specific scores.
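
To illustrate why a ranking objective tolerates heterogeneous readouts, here is a minimal sketch of a RankNet-style pairwise logistic loss computed over variant pairs drawn from a single assay. The summary does not say which ranking loss ESMRank actually uses, so the choice of this loss and the function name `pairwise_rank_loss` are assumptions for illustration. The key property is that only the order of the observed scores enters the loss, so rescaling an assay's units leaves it unchanged.

```python
import math
from itertools import combinations


def pairwise_rank_loss(pred, observed):
    """RankNet-style logistic loss over all variant pairs from ONE assay.

    Only the order of `observed` matters, so any monotone rescaling of the
    assay's raw readout leaves the loss unchanged.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(observed)), 2):
        if observed[i] == observed[j]:
            continue  # tied variants carry no ordering information
        # +1 if variant i should rank above variant j, otherwise -1
        sign = 1.0 if observed[i] > observed[j] else -1.0
        margin = sign * (pred[i] - pred[j])
        # numerically stable softplus(-margin) = log(1 + exp(-margin))
        total += max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


pred = [0.9, 0.1, 0.5]                 # model scores for three variants
raw = [2.0, -1.0, 0.3]                 # one assay's native readout
rescaled = [math.exp(x) for x in raw]  # same order, different units
print(pairwise_rank_loss(pred, raw) == pairwise_rank_loss(pred, rescaled))  # True
```

A regression loss such as mean squared error would change under the `exp` rescaling. The pairwise loss does not, which is the property that makes ranking attractive when assays report stability, abundance, binding, or activity on unrelated scales.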

The core idea is an overlap-aware framework the authors call variant soundness. Many proteins are covered by more than one DMS assay, and the shared variants between overlapping assays provide anchors for aligning their internal rankings. ESMRank uses these overlaps to align within-assay rankings and then aggregate them across experiments, deriving an assay-agnostic measure of mutational tolerance that does not depend on any single assay's units or readout. The predictor itself integrates protein language model representations—from the ESM family—with physicochemical descriptors of residues and substitutions.
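
The alignment idea can be sketched for two assays on the same protein. This is not the authors' procedure, which the summary does not spell out. The steps are assumptions chosen for clarity. First, each assay's scores are converted to within-assay fractional ranks. Second, the Spearman correlation on the shared variants decides whether the second assay reads out in the opposite direction and must be flipped. Third, ranks are averaged per variant. The helper names (`fractional_ranks`, `align_and_aggregate`) are hypothetical, and the convention that a higher aggregated rank means more tolerated is also assumed.

```python
from statistics import mean


def fractional_ranks(scores):
    """Map {variant: raw score} to {variant: rank scaled to [0, 1]}; ties share the average rank."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2  # average 0-based position of this tie block
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg / (n - 1) if n > 1 else 0.5
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def align_and_aggregate(reference, other):
    """Orient `other` against `reference` on shared variants, then average ranks."""
    shared = sorted(reference.keys() & other.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared variants to align assays")
    # Spearman correlation = Pearson correlation of ranks within the shared subset
    ref_sub = fractional_ranks({v: reference[v] for v in shared})
    oth_sub = fractional_ranks({v: other[v] for v in shared})
    rho = pearson([ref_sub[v] for v in shared], [oth_sub[v] for v in shared])

    ref_r, oth_r = fractional_ranks(reference), fractional_ranks(other)
    if rho < 0:  # the second assay reads out in the opposite direction: flip it
        oth_r = {v: 1.0 - r for v, r in oth_r.items()}
    merged = {v: mean([r[v] for r in (ref_r, oth_r) if v in r])
              for v in ref_r.keys() | oth_r.keys()}
    return rho, merged


# Toy scores: assay A (higher = more tolerated), assay B (higher = more damaging)
assay_a = {"A10V": 0.1, "L20P": -2.0, "G30A": 0.5, "K40E": -0.5}
assay_b = {"A10V": 0.2, "L20P": 3.0, "G30A": 0.0, "W50R": 2.5}
rho, merged = align_and_aggregate(assay_a, assay_b)
print(rho)  # -1.0, so B is flipped before pooling
print(round(merged["L20P"], 3), round(merged["G30A"], 3))  # 0.0 1.0
```

Variants measured by only one assay (here `K40E` and `W50R`) still receive a rank on the common scale. This is how shared anchors let non-overlapping variants be placed on one axis. A real pipeline would weight assays and handle more than two, which this sketch omits.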

Applied to roughly 1,100 MAVEdb score sets spanning over 2 million variants, ESMRank recovers a coherent, transferable constraint landscape. The learned axis of mutational constraint is enriched for structural-stability determinants such as residue burial, packing-perturbation magnitude, and domain architecture, suggesting the model captures a biophysically meaningful and generalizable signal rather than overfitting to individual assays.

Key Features

Learning-to-rank formulation: Instead of regressing onto incomparable raw assay scores, ESMRank learns to rank variants, sidestepping the scale and readout differences across heterogeneous DMS experiments.
Overlap-aware "variant soundness": Shared variants between overlapping assays are used to align within-assay rankings and aggregate them into an assay-agnostic measure of mutational tolerance.
PLM + physicochemical features: Predictions combine ESM protein language model representations with physicochemical descriptors, blending learned evolutionary context with interpretable biophysical features.
Massive multi-assay scope: The model is fit across ~1,100 MAVEdb score sets covering more than 2 million variants, one of the broadest DMS aggregations used to define a transferable constraint axis.
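
The PLM-plus-physicochemical design can be pictured as concatenating a per-variant embedding with a few interpretable descriptors. The sketch below is an assumption-laden illustration rather than ESMRank's feature set. The descriptors (Kyte-Doolittle hydropathy change and flags for introducing proline or glycine), the short-form `L20P` notation, and all function names are hypothetical. A short stand-in list replaces a real ESM embedding so the example runs without downloading a model.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_substitution(variant):
    """Split short-form notation like 'L20P' into (wild type, position, mutant)."""
    return variant[0], int(variant[1:-1]), variant[-1]


def physchem_descriptors(variant):
    """Hypothetical descriptor set: hydropathy change plus Pro/Gly introduction flags."""
    wt, _, mut = parse_substitution(variant)
    delta = KYTE_DOOLITTLE[mut] - KYTE_DOOLITTLE[wt]
    return [delta, abs(delta), float(mut == "P"), float(mut == "G")]


def variant_features(variant, plm_embedding):
    """Concatenate a per-variant PLM embedding with the physicochemical descriptors."""
    return [float(x) for x in plm_embedding] + physchem_descriptors(variant)


# Stand-in for an ESM embedding (real ones have hundreds to thousands of dimensions)
fake_embedding = [0.12, -0.40, 0.05]
print(len(variant_features("L20P", fake_embedding)))  # 7
```

Keeping the descriptors as separate, named columns is what allows the kind of interpretation described in this page, such as asking whether the learned constraint axis tracks packing or hydropathy changes.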

Technical Details

ESMRank is a sequence-based learning-to-rank predictor that integrates protein language model representations with physicochemical descriptors of residues and substitutions. Its variant-soundness framework exploits the overlap structure of multiplexed assays: where multiple MAVEs measure the same protein, common variants are used to align rankings within each assay before aggregating across assays into a single, assay-agnostic mutational-tolerance scale. The model is applied to approximately 1,100 MAVEdb score sets encompassing over 2 million variants. The resulting constraint landscape is enriched for structural-stability determinants—including residue burial, the magnitude of packing perturbation introduced by a substitution, and domain architecture—indicating that the recovered axis reflects biophysical determinants of mutational tolerance. The preprint is released under a CC BY-NC-ND license; the authors do not report publicly released model weights at the time of posting.
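
One simple way to test the kind of enrichment described above is a rank-based comparison of predicted tolerance between buried and exposed positions, using residue burial as the example. This sketch is not the authors' analysis. The function name `burial_auc`, the use of relative solvent accessibility (RSA) as the burial measure, and the 0.25 cutoff for calling a residue buried are assumptions. The input values are toy numbers for illustration only.

```python
def burial_auc(tolerance, rsa, buried_cutoff=0.25):
    """Probability that a random exposed position is more tolerant than a random buried one.

    Equivalent to the Mann-Whitney U statistic divided by n_exposed * n_buried.
    0.5 means no association with burial; values near 1.0 mean buried
    positions are consistently less tolerant.
    """
    buried = [t for t, a in zip(tolerance, rsa) if a < buried_cutoff]
    exposed = [t for t, a in zip(tolerance, rsa) if a >= buried_cutoff]
    if not buried or not exposed:
        raise ValueError("need at least one buried and one exposed position")
    wins = 0.0
    for e in exposed:
        for b in buried:
            # one win per exposed > buried pair, half a win for ties
            wins += 1.0 if e > b else (0.5 if e == b else 0.0)
    return wins / (len(exposed) * len(buried))


# Toy per-position values, for illustration only
tolerance = [0.1, 0.2, 0.8, 0.9, 0.7]
rsa = [0.05, 0.10, 0.60, 0.45, 0.30]
print(burial_auc(tolerance, rsa))  # 1.0
```

Because the statistic is rank-based, it is invariant to the scale of the tolerance score. That property fits a learning-to-rank model whose outputs are meaningful as orderings rather than as calibrated values.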

Applications

ESMRank is intended for interpreting the functional impact of protein-coding variants, a central task in clinical genetics, protein engineering, and basic protein science. By producing an assay-agnostic measure of mutational tolerance, it can help prioritize variants of uncertain significance, guide stability-focused protein design, and provide a common reference frame for combining evidence across the many DMS assays now deposited in MAVEdb. Its enrichment for structural-stability determinants also makes it useful for studying how sequence position and biophysical context shape tolerance to mutation across diverse proteins.

Impact

ESMRank contributes a principled answer to a growing data-integration problem: as MAVEdb accumulates more than a thousand heterogeneous DMS score sets, methods that can align and pool them become increasingly valuable. The variant-soundness approach offers a reusable strategy for turning many incomparable assays into a single transferable constraint axis, and the enrichment of that axis for biophysical determinants suggests that the learned signal generalizes beyond individual assays. As a February 2026 preprint released under a non-commercial license and without publicly reported weights, its downstream adoption and head-to-head benchmarking against established variant effect predictors remain to be demonstrated.

Citation

ESMRank reveals a transferable axis of protein mutational constraint from overlapping variant effect assays

Arnese, R. & Gambardella, G. (2026) ESMRank reveals a transferable axis of protein mutational constraint from overlapping variant effect assays. bioRxiv.

DOI: 10.64898/2026.02.26.708185

Recent citations

Papers that recently cited this model.

Not enough citation data yet.

Top citations

The most-cited papers that cite this model.

Not enough citation data yet.

Citations

Total citations: 0

Influential citations: 0

References: 53

Fields of citing research

Not enough data

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Openness score: 10 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 13

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

Research Paper
