bio.rodeo
Protein

D2D

Vrije Universiteit Brussel / Université libre de Bruxelles

Combines the ProtT5-XL protein language model with protein-specific evolutionary constraints to predict mutational effects on stability, binding, and epistasis—largely zero-shot.

Released: May 2026

D2D is a framework for predicting the effects of amino-acid substitutions on protein function, developed by Konstantina Tzavella, Adrian Olsen, and Wim Vranken at the Vrije Universiteit Brussel (Tzavella is also affiliated with the Institut Jules Bordet / Université libre de Bruxelles). Introduced in a 2026 bioRxiv preprint, D2D pairs a self-supervised protein language model with protein-specific evolutionary information to score mutational effects on thermostability, binding regions, and epistasis, from single-mutation effects to higher-order interactions, largely without task-specific training.

The central idea is that a general-purpose language model trained on protein sequences captures broad biochemical regularities but misses constraints specific to a given protein family. D2D supplies that missing signal by fitting a Gaussian mixture model (GMM) over per-residue embeddings of a wild-type protein's multiple sequence alignment (MSA), producing "D2D features" that quantify how strongly each position is evolutionarily constrained. Combining these features with the language model yields predictions that remain accurate in disordered and surface-binding regions, where structure-based methods typically degrade.
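
The constraint idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the embeddings for one alignment position arrive as an array of shape (n_sequences, dim), fits a small diagonal-covariance GMM with plain EM, and treats the log-likelihood of a new embedding under that mixture as a constraint score. The function names, the number of components, the regularization value, and the choice of log-likelihood as the score are all hypothetical; the preprint may define the D2D features differently.

```python
import numpy as np


def _log_gauss_diag(x, means, variances):
    """Log density of each row of x under each diagonal Gaussian; shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]  # (n, k, d)
    return -0.5 * (
        np.sum(diff**2 / variances[None], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None]
    )


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def fit_position_gmm(emb, n_components=2, n_iter=50, reg=1e-3, seed=0):
    """Fit a diagonal GMM with EM to one position's embeddings, shape (n, d)."""
    rng = np.random.default_rng(seed)
    n = emb.shape[0]
    # Initialise means at randomly chosen data points (an arbitrary choice).
    means = emb[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(emb.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each row.
        log_p = _log_gauss_diag(emb, means, variances) + np.log(weights)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ emb) / nk[:, None]
        second_moment = (resp.T @ emb**2) / nk[:, None]
        variances = np.maximum(second_moment - means**2, 0.0) + reg
    return weights, means, variances


def constraint_score(gmm, x):
    """Log-likelihood of embedding(s) x under the fitted mixture (assumed score)."""
    weights, means, variances = gmm
    x = np.atleast_2d(x)
    return _logsumexp(_log_gauss_diag(x, means, variances) + np.log(weights), axis=1)


# Synthetic stand-in for one position's embeddings: two clusters in 8 dimensions.
rng = np.random.default_rng(1)
column = np.vstack([rng.normal(0.0, 0.3, (40, 8)), rng.normal(2.0, 0.3, (40, 8))])
gmm = fit_position_gmm(column)
typical = constraint_score(gmm, np.zeros(8))[0]
outlier = constraint_score(gmm, np.full(8, 10.0))[0]
print(typical > outlier)
```

An embedding that resembles what the alignment has already seen at that position scores higher than one far from every mixture component, which is the sense in which the mixture encodes how tightly a position is constrained.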

D2D is the zero-shot generalization of the lab's earlier supervised model D2Deep (Briefings in Bioinformatics, peer-reviewed). D2Deep is a single-task, supervised classifier trained for variant pathogenicity (cancer driver-mutation prediction); D2D lifts the same evolutionary-constraint feature core into a broad, mostly training-free framework spanning multiple biophysical tasks. When fine-tuned for cancer driver prediction, D2D reuses D2Deep's trained classifier and reports state-of-the-art performance on that task.

#Key Features

  • Evolutionary constraints over LLM embeddings: A GMM fit to wild-type MSA embeddings (the "D2D features") injects protein-specific conservation signal that a generic language model alone does not capture.
  • Zero-shot across tasks: Stability, binding-region, and epistasis predictions require no task-specific training, making the method broadly applicable out of the box.
  • Robust in disordered and surface regions: Because it does not depend on a resolved 3D structure, D2D remains effective in intrinsically disordered and surface-binding regions where structure-based predictors struggle.
  • Higher-order epistasis: Captures both single-mutation effects and interactions among multiple substitutions, not just additive single-site scores.
  • Optional supervised cancer-driver mode: When fine-tuned, it reuses the peer-reviewed D2Deep classifier to predict driver mutations at state-of-the-art accuracy.
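
To make the distinction between additive scores and epistasis concrete, the sketch below computes an interaction term as the deviation from additivity, using the standard inclusion-exclusion form. The page does not state how D2D itself defines or scores epistasis, so this is a generic illustration; the function name, the mutation labels, and the score values are hypothetical.

```python
from itertools import combinations


def epistasis(scores, mutations):
    """Interaction term for a set of mutations via inclusion-exclusion.

    scores maps a frozenset of mutation labels to a predicted effect measured
    relative to wild type; the wild type (empty set) is taken to be 0.0.
    """
    mutations = tuple(mutations)
    order = len(mutations)
    total = 0.0
    for size in range(order + 1):
        sign = (-1) ** (order - size)
        for subset in combinations(mutations, size):
            key = frozenset(subset)
            total += sign * (scores[key] if key else 0.0)
    return total


# Hypothetical predicted effects for two substitutions and their combination.
scores = {
    frozenset({"A12V"}): -1.0,
    frozenset({"G45D"}): -0.5,
    frozenset({"A12V", "G45D"}): -2.5,
}
pair = epistasis(scores, ["A12V", "G45D"])
print(pair)  # -1.0: the double mutant is worse than the sum of the singles
```

For two mutations this reduces to the double-mutant score minus the two single-mutant scores; a value of zero means the effects are purely additive. Passing three or more labels yields the corresponding higher-order term, provided every sub-combination has a score.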

#Technical Details

D2D consumes embeddings from ProtT5-XL, a self-supervised transformer protein language model. ProtT5-XL is an external dependency loaded from Hugging Face; D2D does not redistribute it as its own checkpoint. For a query protein, D2D builds an MSA, embeds the aligned wild-type sequences, and fits a per-position Gaussian mixture model whose statistics form the D2D features used to score substitutions. The supervised cancer-driver capability is inherited directly from D2Deep's trained weights, which are released with the code on GitHub; the associated test data is on Zenodo (CC-BY-4.0). Because predictions are built on ProtT5-XL, D2D inherits that model's input ceiling of roughly 2,200 residues. The 2026 D2D manuscript is a preprint and has not yet been peer-reviewed; the D2Deep predecessor is published in Briefings in Bioinformatics.
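
As a rough sketch of the embedding step, the snippet below loads a ProtT5-XL encoder through the Hugging Face `transformers` library and returns per-residue embeddings. It is not taken from the D2D code: the checkpoint name is an assumption (the page does not say which ProtT5-XL checkpoint D2D uses), the helper names are hypothetical, and the length check simply applies the approximate 2,200-residue ceiling mentioned above. The preprocessing (mapping rare residues to X and space-separating the sequence) follows the usual ProtT5 convention.

```python
import re

MODEL_NAME = "Rostlab/prot_t5_xl_half_uniref50-enc"  # assumed checkpoint
MAX_RESIDUES = 2200  # approximate ceiling quoted in the text


def prepare_sequence(seq, max_residues=MAX_RESIDUES):
    """Upper-case, map rare residues U/Z/O/B to X, and space-separate for ProtT5."""
    seq = seq.strip().upper()
    if len(seq) > max_residues:
        raise ValueError(
            f"sequence has {len(seq)} residues; limit is about {max_residues}"
        )
    return " ".join(re.sub(r"[UZOB]", "X", seq))


def embed(seq, model_name=MODEL_NAME):
    """Return per-residue embeddings as a (n_residues, hidden_dim) tensor."""
    # Imported lazily so the preprocessing helper works without these packages.
    # T5Tokenizer additionally needs the sentencepiece package installed.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).eval()
    spaced = prepare_sequence(seq)
    batch = tokenizer(spaced, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (1, n_residues + 1, hidden_dim)
    return hidden[0, : len(spaced.split())]  # drop the trailing end-of-sequence token


print(prepare_sequence("MKTUZ"))  # M K T X X
```

Calling `embed` downloads several gigabytes of weights on first use, so the example only exercises the preprocessing helper.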

#Applications

D2D is aimed at researchers studying how mutations alter protein behavior: estimating stability changes, mapping binding regions, dissecting epistatic interactions in deep mutational scanning data, and—via the supervised D2Deep mode—prioritizing candidate cancer driver mutations. Its independence from experimental structure makes it especially useful for intrinsically disordered proteins and surface-binding interfaces that resist structure-based analysis. The D2Deep predecessor is accessible through a web server at tumorscope.be/d2deep; note that this serves the supervised variant, as no dedicated D2D server exists yet.

#Impact

D2D illustrates a practical recipe for protein variant-effect prediction: augment a general protein language model with cheaply computed, protein-specific evolutionary statistics instead of retraining large models per task. By generalizing the peer-reviewed D2Deep classifier into a zero-shot, multi-task framework, the work extends a validated pathogenicity tool toward stability, binding, and epistasis while highlighting performance in the disordered and surface regions that structure-based methods handle poorly. As a preprint that relies on an external ProtT5-XL dependency and D2Deep's supervised weights for its cancer-driver mode, its long-term influence will depend on peer review and broader benchmarking, but it offers a lightweight, openly coded path to evolution-aware mutation scoring.

Citations

Constrained protein Large Language Model illustrated in protein stability, function and epistasis

Tzavella, K., et al. (2026) Constrained protein Large Language Model illustrated in protein stability, function and epistasis. bioRxiv.

DOI: 10.64898/2026.05.22.726784

Combining evolution and protein language models for an interpretable cancer driver mutation prediction with D2Deep

Tzavella, K., et al. (2024) Combining evolution and protein language models for an interpretable cancer driver mutation prediction with D2Deep. Briefings in Bioinformatics.

DOI: 10.1093/bib/bbae664

Openness

Unclassified
Restrictive license on core components

Tags

binding_region_prediction, epistasis, intrinsically_disordered_regions, protein_stability, self_supervised, transformer, variant_effect_prediction, zero_shot

Resources

GitHub Repository, GitHub Repository, Demo, Dataset