bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Protein foundation models

PEINT

UC Berkeley

A deep generative model of protein evolution in time that captures indel dynamics and epistasis to simulate realistic evolutionary trajectories yielding functional proteins.

Released: February 2026

Models of molecular evolution underpin phylogenetics, ancestral sequence reconstruction, and our understanding of how protein families diversify. Yet the workhorse substitution models make a strong simplifying assumption: that sites evolve independently, each following its own Markov process. Real proteins violate this assumption in two important ways: positions co-evolve through epistasis, and sequences gain and lose residues through insertions and deletions (indels), which classical site-independent models handle awkwardly or not at all.
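
For reference, the site-independence assumption can be written as a factorized transition probability. The notation is standard textbook notation and an assumption of this sketch, not taken from the PEINT preprint: x and y are aligned ancestral and descendant sequences of equal length L, Q_i is the substitution rate matrix at site i (often shared across sites), and t is the branch length.

```latex
P(y \mid x, t) \;=\; \prod_{i=1}^{L} \left[ e^{Q_i t} \right]_{x_i,\, y_i}
```

Two limitations are visible in the formula itself: the product over sites leaves no room for one position to influence another, and the fixed length L leaves no room for insertions or deletions.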

PEINT, introduced by Koehl and colleagues at the University of California, Berkeley in a February 2026 bioRxiv preprint, is a deep generative framework that models the evolution of entire protein sequences in time while capturing dependencies between sites. Trained on millions of unaligned protein sequences spanning diverse folds, it learns indel dynamics and epistatic interactions directly, rather than imposing site independence. The model can both reproduce hallmark signatures of natural evolution, such as conservation patterns and family-specific behavior, and simulate evolutionary trajectories forward in time.

Critically, when used to simulate evolution along phylogenetic trees, PEINT generates novel sequences that remain functional: the authors experimentally tested simulated carbonic anhydrase variants and found that they preserved enzymatic activity, evidence that the model's trajectories respect structural and functional constraints rather than merely matching surface statistics.

#Key Features

  • Whole-sequence evolution in time: PEINT models the temporal evolution of complete protein sequences, moving beyond site-independent substitution models to capture how sequences change as a whole.
  • Indel dynamics: The framework directly captures insertions and deletions, processes that classical substitution models struggle to represent.
  • Epistasis-aware: By learning dependencies between positions, PEINT reproduces co-evolutionary signal and family-specific behavior rather than assuming independent sites.
  • Functional, experimentally validated trajectories: Simulated carbonic anhydrase variants generated along phylogenetic trees were shown experimentally to retain enzymatic activity.

#Technical Details

PEINT is a deep generative model trained on millions of unaligned protein sequences drawn from diverse folds. It learns to model sequence evolution over time, including both substitutions and indels. Because it operates on unaligned sequences and learns inter-site dependencies, it captures epistasis and indel dynamics that site-independent phylogenetic models omit. The authors show that the model reproduces natural-evolution signatures such as conservation profiles and family-specific behavior, and that simulating evolution along phylogenetic trees yields novel, plausible sequences. Functional validity was assessed experimentally on simulated carbonic anhydrase variants, which retained enzymatic activity. Because the work is a recent preprint, code and trained weights have not yet been released, and architectural specifics such as parameter count and exact training corpus await the full release.
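
To make "simulating evolution along phylogenetic trees" concrete, here is a minimal sketch of the general procedure: start from a root sequence and apply a transition model along each branch, passing each evolved sequence down to its children. Because PEINT's code and weights are not released, nothing here reflects its actual API. The names `simulate_on_tree`, `site_independent_step`, the tuple-based tree format, and the example sequence are all hypothetical. The transition function is a classical equal-rates, site-independent baseline used purely as a stand-in; a learned whole-sequence model would replace it at the same call site.

```python
import math
import random
from typing import Callable, Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical tree format: (name, branch_length_to_parent, children).
Tree = Tuple[str, float, list]
# A transition model maps (parent sequence, branch length, RNG) to a child sequence.
Transition = Callable[[str, float, random.Random], str]


def site_independent_step(seq: str, t: float, rng: random.Random) -> str:
    """Baseline stand-in: equal-rates model over the 20 amino acids.

    Every site mutates independently and the length never changes, which are
    the two simplifications a whole-sequence model is meant to relax.
    """
    k = len(AMINO_ACIDS)
    # Probability that a site shows a different residue after time t,
    # with rates scaled to one expected substitution per site per unit time.
    p_change = (1.0 - 1.0 / k) * (1.0 - math.exp(-k / (k - 1) * t))
    out = []
    for residue in seq:
        if rng.random() < p_change:
            out.append(rng.choice([a for a in AMINO_ACIDS if a != residue]))
        else:
            out.append(residue)
    return "".join(out)


def simulate_on_tree(node: Tree, parent_seq: str, step: Transition,
                     rng: random.Random, out: Dict[str, str]) -> Dict[str, str]:
    """Evolve parent_seq down the branch to `node`, then recurse into children."""
    name, branch_length, children = node
    evolved = step(parent_seq, branch_length, rng)
    out[name] = evolved
    for child in children:
        simulate_on_tree(child, evolved, step, rng, out)
    return out


# Toy tree and root sequence, both invented for illustration.
tree: Tree = ("root", 0.0, [
    ("anc1", 0.1, [("leafA", 0.2, []), ("leafB", 0.3, [])]),
    ("leafC", 0.5, []),
])
sequences = simulate_on_tree(tree, "MKTAYIAKQR", site_independent_step,
                             random.Random(0), {})
for name, s in sequences.items():
    print(name, s)
```

Keeping the transition model behind a single function signature is the design choice that matters here: the tree traversal is identical whether the step is this simple baseline or a model that can also insert and delete residues, in which case descendant sequences would no longer share the root's length.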

#Applications

PEINT is aimed at molecular evolutionary biologists and protein engineers. As a generative model of sequence evolution, it can serve as a richer evolutionary model for phylogenetic inference and ancestral reconstruction, where indel handling and epistasis matter for accuracy. For protein engineering, its ability to simulate functional trajectories offers a principled way to explore new sequence space that still honors structural and functional constraints, for example by proposing diversified yet active homologs of an enzyme of interest, as demonstrated with carbonic anhydrase. The model thus bridges evolutionary analysis and generative design.

#Impact

PEINT challenges the long-standing site-independence assumption at the core of molecular evolution modeling, showing that a deep generative model can learn realistic indel and epistatic dynamics and, crucially, generate sequences that are experimentally functional. This connects two communities that rarely share models (phylogenetics and generative protein design) and suggests that evolutionary realism and functional viability can be pursued together. Because the work is a February 2026 preprint without released code or weights, its results await peer review and independent reproduction, but the experimental validation of simulated enzyme variants is a notable proof point for the approach.

Tags

evolutionary_simulation, phylogenetic_inference, protein_design, transformer, generative, self_supervised, molecular_evolution