bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

BetaInfer

Technion – Israel Institute of Technology / Tel Aviv University / Kempner Institute

Generative transformer framework that infers phylogenetic trees by transducing sets of unaligned molecular sequences directly into Newick-format trees.

Released: June 2026

Phylogenetic tree inference — reconstructing the evolutionary relationships among a set of related sequences — is a foundational problem in computational biology. Conventional pipelines are multi-stage: sequences are first aligned, and a tree is then inferred under a likelihood-based or distance-based criterion, typically with iterative heuristic search over the enormous space of possible topologies. BetaInfer, introduced in a 2026 bioRxiv preprint by Edo Dotan and colleagues, reframes this entire process as a single sequence transduction task, learning to map an unaligned set of molecular sequences directly to a tree.

BetaInfer treats a tree as a string in Newick notation and trains a hybrid transformer-based architecture to generate that string from the raw input sequences, borrowing the encoder–decoder transduction paradigm from natural language processing. This places it in the same lineage as the authors' earlier BetaAlign work, which applied transformers to multiple sequence alignment, and reflects a broader shift toward learned, end-to-end replacements for classical bioinformatics pipelines.
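
To make the output format concrete, the sketch below parses a small Newick string into nested Python tuples and renders it back. The parser and the example tree are illustrative assumptions, not part of BetaInfer, and the parser handles only leaf names and parentheses (no branch lengths or internal labels).

```python
def parse_newick(text):
    """Parse a simple Newick string (names and parentheses only) into nested tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return tuple(children)
        # Otherwise this is a leaf: read characters up to the next delimiter.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def leaves(tree):
    """List leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def to_newick(tree):
    """Render nested tuples back into a Newick string."""
    def render(node):
        if isinstance(node, str):
            return node
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"


if __name__ == "__main__":
    tree = parse_newick("((A,B),(C,D));")  # hypothetical four-taxon tree
    print(tree, leaves(tree), to_newick(tree))
```

A generative model that emits this format only has to produce a balanced string of names, commas, and parentheses, which is what makes the sequence transduction framing possible.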

The model is trained once on large-scale simulated evolutionary data with known ground-truth trees and then applied, without retraining or fine-tuning, to both held-out simulated datasets and empirical datasets. In that sense it behaves as a pretrained generative model for phylogenetics rather than a per-dataset optimizer.
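
A toy version of this kind of training-data simulation is sketched below: draw a random rooted binary topology, evolve a DNA sequence down it with uniform substitutions (in the spirit of the Jukes-Cantor model), and return the leaf sequences together with the Newick target. The simulator, its parameters (`seq_len`, `mutation_prob`, `seed`), and the absence of indels are all assumptions for illustration; the preprint's actual simulation setup is not described here.

```python
import random

BASES = "ACGT"


def random_topology(names, rng):
    """Join random pairs of subtrees until one rooted binary tree remains."""
    nodes = list(names)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]


def mutate(seq, mutation_prob, rng):
    """Each site changes to a different base with probability mutation_prob."""
    out = []
    for base in seq:
        if rng.random() < mutation_prob:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def evolve(tree, seq, mutation_prob, rng, out):
    """Push a sequence down the tree, mutating it along every branch."""
    if isinstance(tree, str):
        out[tree] = seq
        return
    for child in tree:
        evolve(child, mutate(seq, mutation_prob, rng), mutation_prob, rng, out)


def newick_string(tree):
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(newick_string(child) for child in tree) + ")"


def simulate_example(names, seq_len=60, mutation_prob=0.05, seed=0):
    """Return (leaf sequences, ground-truth Newick) for one simulated history."""
    rng = random.Random(seed)
    tree = random_topology(names, rng)
    root = "".join(rng.choice(BASES) for _ in range(seq_len))
    seqs = {}
    evolve(tree, root, mutation_prob, rng, seqs)
    return seqs, newick_string(tree) + ";"


if __name__ == "__main__":
    seqs, target = simulate_example(["A", "B", "C", "D"])
    print(target)
    for name in sorted(seqs):
        print(name, seqs[name])
```

Because the generating tree is known by construction, every simulated example comes with a free supervision label, which is the property the training scheme relies on.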

#Key Features

  • Sequence-to-tree transduction: Maps a set of unaligned input sequences directly to a Newick-format tree, collapsing alignment, distance estimation, and tree search into one learned generative step.
  • Zero-shot generalization to real data: A single fixed checkpoint trained on simulated evolution is evaluated on simulated and empirical datasets without retraining, demonstrating transfer beyond the training distribution.
  • Ensemble candidate generation: Sampling multiple candidate trees and aggregating them reduces reconstruction error by more than 30% relative to a single greedy prediction.
  • Competitive accuracy: Reconstructions are competitive with established likelihood-based and distance-based phylogenetic methods.
  • Interpretable internal mechanism: Analysis of the trained model indicates it leverages internal pairwise-distance computations, echoing the logic of classical distance-based inference.
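
One simple way to aggregate sampled candidates is to keep the tree that agrees most with the rest of the ensemble. The sketch below does this with a clade-based Robinson-Foulds distance on rooted trees written as nested tuples. This is an illustrative assumption: the source does not state which aggregation rule or error metric BetaInfer uses, and `clades`, `rf_distance`, and `medoid_tree` are hypothetical helpers.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    root = walk(tree)
    found.discard(root)  # the full leaf set appears in every tree, so skip it
    return found


def rf_distance(t1, t2):
    """Count clades that appear in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


def medoid_tree(candidates):
    """Pick the candidate with the smallest total distance to all candidates."""
    return min(
        candidates,
        key=lambda tree: sum(rf_distance(tree, other) for other in candidates),
    )


if __name__ == "__main__":
    # Hypothetical samples from a generative model: two agree, one differs.
    samples = [
        (("A", "B"), ("C", "D")),
        (("B", "A"), ("D", "C")),
        (("A", "C"), ("B", "D")),
    ]
    print(medoid_tree(samples))
```

Sampling more candidates makes the medoid more stable at the cost of more decoding, which is one way to read the compute-for-accuracy trade-off; a majority-rule consensus tree would be another reasonable aggregation choice.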

#Technical Details

BetaInfer uses hybrid transformer-based encoder–decoder architectures that consume a set of unaligned sequences and autoregressively emit a tree as a Newick string. Training relies on large-scale simulation: evolutionary histories with known ground-truth topologies are generated, and the model learns to recover the generating tree from the resulting sequences. Because supervision comes from simulated trees, the approach sidesteps the need for curated empirical training labels, and the same trained model is then applied zero-shot to new data. The reported headline result is that ensemble-based generation of candidate trees lowers reconstruction error by over 30% compared with single predictions, while remaining competitive against standard likelihood- and distance-based baselines. Interpretability analysis suggests the network implicitly computes pairwise distances between sequences as part of its inference. The work is released as a preprint under a CC BY-NC license; at the time of writing no public code or model weights were available.
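
For reference, the pairwise quantity that classical distance-based methods start from can be computed as below: the proportion of differing sites (p-distance) with the Jukes-Cantor correction. The sketch assumes equal-length, already aligned sequences, which is a simplification because BetaInfer takes unaligned input, and it makes no claim about what the network computes internally. The function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Fraction of sites at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """Map every ordered pair of names to its corrected distance."""
    names = sorted(seqs)
    return {(i, j): jc69_distance(seqs[i], seqs[j]) for i in names for j in names}


if __name__ == "__main__":
    # Hypothetical aligned toy sequences.
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACGA"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

A matrix like this is the input to methods such as neighbor joining, so a network that internally tracks pairwise distances would be recovering a familiar intermediate representation.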

#Applications

BetaInfer targets researchers in molecular evolution, comparative genomics, and systematics who need to reconstruct phylogenies from sets of related sequences. By replacing a multi-stage alignment-plus-search pipeline with a single learned generation step, it offers a potentially faster and more scalable route to candidate trees, and the ensemble mechanism provides a built-in way to trade compute for accuracy. Because it applies zero-shot to empirical data, practitioners could, in principle, use a pretrained model without configuring substitution models or tuning search heuristics for each dataset.

#Impact

BetaInfer is part of an emerging body of work demonstrating that generative, NLP-style models can serve as viable and scalable alternatives to classical phylogenetic inference pipelines. By showing competitive accuracy and a substantial ensemble-driven error reduction on both simulated and real data, it strengthens the case for learned end-to-end methods in a domain long dominated by likelihood and distance heuristics. As a preprint without released code or weights, its near-term adoption is limited and its results await independent benchmarking, but it points toward a future in which pretrained foundation models handle core comparative-genomics tasks directly from raw sequences.

Citation

Phylogenetic tree inference using generative models

Dotan, E., et al. (2026) Phylogenetic tree inference using generative models. bioRxiv.

DOI: 10.64898/2026.06.14.732140

Citations

Total citations: 0
Influential: 0
References: 26

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 8 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 10
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

generative, genomics, molecular_evolution, phylogenetic_inference, sequence_to_sequence, transformer, tree_reconstruction
