bio.rodeo

Protein

BetaReconstruct

Tel Aviv University

Generative hybrid-transformer for ancestral protein sequence reconstruction that needs no MSA or phylogenetic tree and is reported to outperform maximum-likelihood ASR pipelines.

Released: January 2026

BetaReconstruct is a generative deep-learning model for ancestral sequence reconstruction (ASR), the task of inferring the protein sequences of extinct ancestral organisms from their extant descendants. Developed by researchers at Tel Aviv University and released as a bioRxiv preprint in January 2026, it reframes ASR as a sequence-to-sequence generation problem rather than the position-by-position statistical inference used by conventional phylogenetic pipelines.

ASR has long been a cornerstone of molecular evolution and protein engineering: resurrected ancestral proteins are frequently more thermostable, more soluble, and more catalytically promiscuous than their modern counterparts, making them attractive scaffolds for enzyme design and biotechnology. The dominant approach pairs a multiple sequence alignment (MSA) and an inferred phylogenetic tree with a probabilistic substitution model, then reconstructs the maximum-likelihood (ML) sequence at internal tree nodes. This pipeline is powerful but brittle: its output is sensitive to alignment errors, tree-topology uncertainty, and substitution-model assumptions, and it struggles with insertions and deletions.
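To make the three dependencies of the classical pipeline concrete, here is a minimal sketch of position-by-position maximum-likelihood reconstruction. It is a deliberately simplified toy, not any published ASR tool: it assumes a star tree (one ancestor connected directly to every leaf), a gap-free alignment, and a 20-state equal-rates substitution model. The function names `p_same` and `ml_ancestor` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def p_same(t):
    """Probability that a residue is unchanged after branch length t
    (expected substitutions per site) under a 20-state equal-rates model."""
    return 1 / 20 + (19 / 20) * math.exp(-(20 / 19) * t)


def ml_ancestor(aligned, branch_lengths):
    """Position-by-position maximum-likelihood ancestor of a star tree.

    aligned: equal-length, gap-free aligned leaf sequences (toy assumption).
    branch_lengths: distance from the ancestor to each leaf, same order.
    """
    ancestor = []
    for column in zip(*aligned):
        best_state, best_loglik = None, -math.inf
        for state in AMINO_ACIDS:
            loglik = 0.0
            for residue, t in zip(column, branch_lengths):
                same = p_same(t)
                # Equal-rates model: every specific change is equally likely.
                prob = same if residue == state else (1 - same) / 19
                loglik += math.log(prob)
            if loglik > best_loglik:
                best_state, best_loglik = state, loglik
        ancestor.append(best_state)
    return "".join(ancestor)
```

With equal branch lengths, `ml_ancestor(["MKV", "MKV", "MRL"], [0.1, 0.1, 0.1])` returns `"MKV"`, which is a per-column majority vote. With unequal branch lengths, `ml_ancestor(["A", "G", "G"], [0.01, 2.0, 2.0])` returns `"A"`, because the one closely related leaf outweighs two distant ones. Every input here (the column alignment, the branch lengths, and the substitution model) is something the classical pipeline must estimate first, so an error in any of them propagates into the reconstructed sequence.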

BetaReconstruct sidesteps these dependencies at inference time. It is a hybrid-transformer model trained first on large-scale simulated evolutionary datasets, where the ground-truth ancestral sequences are known by construction, and then fine-tuned on real protein families. Because the model learns to map a set of related extant sequences directly to a predicted ancestor, it needs neither an MSA nor an explicit phylogenetic tree when making predictions.

#Key Features

  • MSA- and tree-free reconstruction: Predicts ancestral sequences directly from a set of extant homologs, removing the alignment and phylogeny-building steps that are major sources of error in classical ASR.
  • Simulation-pretrained, real-fine-tuned: Trained on large simulated evolutionary datasets with known ancestral ground truth, then fine-tuned on real proteins, combining abundant labeled supervision with biological realism.
  • Hybrid-transformer architecture: Couples transformer-based sequence modeling with a generative decoder, enabling the model to handle indels and variable-length reconstructions that challenge position-wise ML inference.
  • Outperforms maximum-likelihood pipelines: Reported to exceed the accuracy of established ML-based ASR tools on benchmark reconstructions, providing a learned alternative to model-based inference.
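Because a generative model can emit an ancestor whose length differs from the reference, accuracy cannot be scored by comparing fixed alignment columns. The sketch below shows one plausible way to score a variable-length prediction, using normalized edit distance. This page does not state which metric the authors use, so the choice of metric and the function names `edit_distance` and `reconstruction_similarity` are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-residue
    substitutions, insertions, and deletions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def reconstruction_similarity(predicted, true_ancestor):
    """Similarity in [0, 1]; 1.0 means the prediction is exact."""
    longest = max(len(predicted), len(true_ancestor))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(predicted, true_ancestor) / longest
```

For example, `reconstruction_similarity("MKVL", "MKL")` is `0.75`: one missing residue out of a longest length of four. A score like this penalizes indel errors and substitution errors on the same scale, which matters when comparing a generative model against a position-wise method.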

#Technical Details

BetaReconstruct is built on a hybrid-transformer generative architecture that ingests a collection of related extant protein sequences and emits a predicted ancestral sequence. Training proceeds in two stages. In the first, the model is pretrained on large-scale in-silico evolutionary datasets generated by simulating sequence evolution along trees; because these simulations track the true ancestral state at every internal node, they supply an effectively unlimited stream of perfectly labeled training pairs. In the second stage, the model is fine-tuned on real protein families to close the gap between simulated and natural sequence statistics. At inference, the model does not construct an alignment or estimate a tree, and it does not require a user-specified substitution model. The authors benchmark reconstruction accuracy against standard maximum-likelihood ASR pipelines and report improved performance. As of the preprint, no public code or trained weights have been released, which currently limits independent reproduction.
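The first training stage can be illustrated with a toy simulator. The sketch below evolves a random root sequence down a balanced binary tree with per-site substitutions and single-residue indels, then returns the leaves together with the known root as one labeled training pair. It is not the authors' simulator: the rates, the tree shape, the uniform residue choices, and the names `mutate` and `simulate_family` are all assumptions chosen to keep the example short.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, sub_rate, indel_rate, rng):
    """Evolve a sequence along one branch with per-site substitutions,
    single-residue deletions, and single-residue insertions."""
    out = []
    for residue in seq:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # deletion: drop this residue
        if r < indel_rate:
            out.append(residue)
            out.append(rng.choice(AMINO_ACIDS))  # insertion after this site
        elif r < indel_rate + sub_rate:
            # substitution: pick any residue other than the current one
            out.append(rng.choice(AMINO_ACIDS.replace(residue, "")))
        else:
            out.append(residue)
    return "".join(out)


def simulate_family(root_len=60, depth=3, sub_rate=0.05, indel_rate=0.01, seed=0):
    """Return (leaves, root): 2**depth extant sequences and their true ancestor."""
    rng = random.Random(seed)
    root = "".join(rng.choice(AMINO_ACIDS) for _ in range(root_len))
    generation = [root]
    for _ in range(depth):
        # Each sequence in the current generation leaves two descendants.
        generation = [
            mutate(parent, sub_rate, indel_rate, rng)
            for parent in generation
            for _ in range(2)
        ]
    return generation, root
```

Calling `leaves, root = simulate_family(seed=1)` yields eight unaligned, possibly different-length leaf sequences as the model input and `root` as the target. This toy keeps only the root as a label; a simulator of the kind described above would also record the sequence at every internal node, giving several labeled ancestors per tree.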

#Applications

Ancestral sequence reconstruction is widely used to engineer robust enzymes, to study the evolutionary trajectory of protein families, and to test hypotheses about historical biochemistry. BetaReconstruct is aimed at protein engineers and evolutionary biologists who want fast, alignment-independent ancestor predictions — for example, to generate thermostable enzyme variants for industrial biocatalysis, to probe the emergence of new functions across a gene family, or to design ancestral scaffolds as starting points for directed evolution. By removing the manual, error-prone alignment and tree-building steps, it lowers the barrier to running ASR on large or poorly characterized protein families.

#Impact

BetaReconstruct contributes to a broader shift in computational evolution toward learned, end-to-end models that replace hand-built statistical pipelines, paralleling how protein language models have displaced alignment-based methods in structure and function prediction. If its reported gains over maximum-likelihood ASR hold under independent evaluation, it could make ancestral reconstruction both faster and more accessible, particularly for families where reliable alignments and trees are difficult to obtain. The principal caveats are that the work is a preprint, that performance on simulated data may not fully transfer to deeply divergent real families, and that the absence of released code or weights makes the results difficult to reproduce or deploy at present.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Overall: 4 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

ancestral_sequence_reconstruction, protein_design, transformer, generative, transfer_learning, molecular_evolution, proteomics

Resources

Research Paper