EvoFlows

Edit-based flow-matching model that proposes protein variants by learning insertions, deletions, and substitutions on a template sequence.

Released: March 2026

EvoFlows is a generative protein-sequence model that learns mutational trajectories between evolutionarily related proteins and uses them to propose new variants. Rather than generating a sequence from scratch, it operates on a template sequence and applies a controllable number of edits—insertions, deletions, and substitutions—predicting both which edit to make and where to make it. This framing makes the model a natural fit for protein engineering and lead optimization, where the goal is usually to improve an existing protein rather than to design one de novo.

The approach addresses a structural mismatch between most protein language models and the optimization tasks they are applied to. Autoregressive models must regenerate a full sequence; masked language models and discrete diffusion models typically require the mutation locations to be specified in advance; and none of these paradigms naturally support length-changing edits (insertions and deletions) relative to a starting sequence. EvoFlows is built around a variable-length, edit-based formulation specifically to remove those constraints.
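To make the edit-based framing concrete, here is a minimal sketch of what applying insertions, deletions, and substitutions to a template looks like. It is an illustration only: no EvoFlows code is public, so the `Edit` class and `apply_edits` function are hypothetical names introduced here, and the hand-written edit list stands in for whatever the trained model would predict. Positions are assumed to index the original template, and an insertion is assumed to place its residue before that position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record (not the paper's data structure)."""
    op: str                        # "sub", "ins", or "del"
    pos: int                       # 0-based index into the ORIGINAL template
    residue: Optional[str] = None  # new residue for "sub"/"ins"; unused for "del"


def apply_edits(template: str, edits: List[Edit]) -> str:
    """Apply edits to a template and return the (possibly length-changed) variant.

    An "ins" at position p places the residue before template[p];
    p == len(template) appends at the end.
    """
    seq = list(template)
    # Work from right to left so indices into the original template stay valid.
    # At equal positions, handle "sub"/"del" before "ins" so that an insertion
    # never gets overwritten or deleted by an edit aimed at the template residue.
    ordered = sorted(edits, key=lambda e: (e.pos, e.op != "ins"), reverse=True)
    for e in ordered:
        if e.op in ("sub", "ins") and e.residue is None:
            raise ValueError(f"{e.op} at {e.pos} needs a residue")
        if e.op == "sub":
            seq[e.pos] = e.residue
        elif e.op == "ins":
            seq.insert(e.pos, e.residue)
        elif e.op == "del":
            del seq[e.pos]
        else:
            raise ValueError(f"unknown edit type: {e.op}")
    return "".join(seq)


# Toy template and edits, chosen by hand for illustration.
template = "MKTAYIAK"
edits = [Edit("sub", 1, "R"), Edit("del", 3), Edit("ins", 6, "G")]
print(apply_edits(template, edits))  # MRTYIGAK
```

Because one insertion and one deletion are applied, the toy variant happens to keep the template's length; dropping either edit would yield a longer or shorter sequence, which is the behavior fixed-length approaches cannot express.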

EvoFlows was developed by Nicolas Deutschmann, Constance Ferragu, Jonathan D. Ziegler, Shayan Aziznejad, and Eli Bixby at Cradle, an Amsterdam-based protein-engineering company. It was released as an arXiv preprint in March 2026 and presented at the Workshop on Foundation Models for Science at ICLR 2026.

Key Features

Edit-based generation: The model performs insertions, deletions, and substitutions on a template sequence, predicting both the type and the position of each edit rather than rewriting the whole sequence.
Variable-length output: Because insertions and deletions are first-class operations, generated variants can be longer or shorter than the template—something autoregressive, masked-LM, and diffusion baselines do not natively support.
Controllable mutational distance: The number of edits applied is tunable, letting practitioners dial how far a variant strays from its template while staying within a realistic sequence space.
Evolution-aware sampling: Trained on homologous sequence pairs, the model produces variants that remain consistent with natural protein families while exploring farther from the template than leading baselines.
General-purpose across protein families: Pretraining on UniRef and the Observed Antibody Space (OAS) covers both general proteins and antibody repertoires, supporting both broad protein design and antibody-focused work.
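The features above rest on homologous sequence pairs and on counting edits between them. As a rough sketch of that idea, the snippet below reads the edits off a gapped pairwise alignment and uses their count as a simple mutational distance. The function name `edits_from_alignment`, the tuple layout, and the toy alignment are assumptions made for illustration; the paper's actual data pipeline is not reproduced here.

```python
from typing import List, Tuple

GAP = "-"


def edits_from_alignment(aligned_src: str, aligned_tgt: str) -> List[Tuple[str, int, str]]:
    """Return (op, src_pos, residue) tuples that turn the source into the target.

    Both arguments are rows of the same pairwise alignment, with '-' as the gap
    symbol. src_pos is a 0-based index into the ungapped source sequence; an
    "ins" is placed before that index.
    """
    if len(aligned_src) != len(aligned_tgt):
        raise ValueError("aligned rows must have equal length")
    edits = []
    src_pos = 0  # index of the next ungapped source residue
    for s, t in zip(aligned_src, aligned_tgt):
        if s == GAP and t == GAP:
            continue                            # empty column, nothing to do
        if s == GAP:
            edits.append(("ins", src_pos, t))   # residue present only in target
        elif t == GAP:
            edits.append(("del", src_pos, s))   # residue present only in source
            src_pos += 1
        else:
            if s != t:
                edits.append(("sub", src_pos, t))
            src_pos += 1
    return edits


# Toy alignment written by hand, not taken from UniRef or OAS.
src_row = "MKTAYI-AK"
tgt_row = "MRT-YIGAK"
edits = edits_from_alignment(src_row, tgt_row)
print(edits)       # [('sub', 1, 'R'), ('del', 3, 'A'), ('ins', 6, 'G')]
print(len(edits))  # 3
```

The count depends on the alignment supplied, so it is an upper bound on the true edit distance rather than a guaranteed minimum.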

Technical Details

EvoFlows builds on discrete flow matching (DFM) and edit flows, casting protein variant generation as a learned transport between distributions of evolutionarily related sequence pairs. Training data are constructed from pairwise alignments of homologous sequences drawn from UniRef (general proteins) and OAS (antibodies); at inference time the model iteratively samples edits to transform a template into a novel variant. The authors evaluate generated variants with a battery of in-silico metrics (including model-based pseudo-log-likelihood, covariance and mutual-information statistics, BLOSUM-corrected KL divergence, and a spectrum-kernel maximum mean discrepancy), comparing against existing generative baselines. Across diverse protein families, EvoFlows generated variants that stayed consistent with the source family's statistics while reaching greater mutational distance from the template than the baselines. Detailed architecture and hyperparameter settings are provided in the paper's appendices.
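Of the metrics listed above, the spectrum-kernel maximum mean discrepancy (MMD) is the easiest to illustrate in a few lines. The sketch below compares two sets of sequences through their k-mer counts. The choice of k = 3, the cosine-style kernel normalization, and the biased estimator are all assumptions made here for simplicity; the paper's exact configuration may differ, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in a sequence (empty if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: Counter, b: Counter) -> float:
    """Inner product of two k-mer count vectors."""
    if len(a) > len(b):
        a, b = b, a                      # iterate over the smaller profile
    return float(sum(c * b[kmer] for kmer, c in a.items()))


def normalized_kernel(a: Counter, b: Counter) -> float:
    """Spectrum kernel scaled to [0, 1]; an assumption, not the paper's choice."""
    denom = sqrt(spectrum_kernel(a, a) * spectrum_kernel(b, b))
    return spectrum_kernel(a, b) / denom if denom > 0 else 0.0


def mmd_squared(xs: List[str], ys: List[str], k: int = 3) -> float:
    """Biased estimate of squared MMD between two sequence sets."""
    fx = [kmer_counts(s, k) for s in xs]
    fy = [kmer_counts(s, k) for s in ys]

    def mean_kernel(us: List[Counter], vs: List[Counter]) -> float:
        return sum(normalized_kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(fx, fx) + mean_kernel(fy, fy) - 2.0 * mean_kernel(fx, fy)


# Toy sequence sets written by hand for illustration.
family = ["MKTAYIAK", "MKTAYIGK"]
unrelated = ["GGSGGSGG", "GSGGSGGS"]
print(mmd_squared(family, family))     # 0.0 for identical sets
print(mmd_squared(family, unrelated))  # positive: the sets share no 3-mers
```

A value near zero suggests the generated set has a k-mer composition similar to the reference family, while larger values indicate drift away from it.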

Applications

EvoFlows targets protein engineering and lead optimization, where teams iteratively improve a known protein—an enzyme, a binder, or a therapeutic antibody—rather than design one from nothing. Its ability to insert and delete residues, not just substitute them, makes it applicable to tasks such as loop remodeling, length variation in antibody CDRs, and broader sequence diversification, while the tunable edit budget lets users balance conservative refinement against more aggressive exploration. The OAS pretraining makes the antibody-engineering setting a particularly natural use case.

Impact

EvoFlows extends discrete flow matching to a length-variable, edit-based setting for proteins, filling a gap left by autoregressive, masked-language, and diffusion models that either regenerate whole sequences or assume fixed mutation positions. By aligning the generative process with how protein engineers actually work—editing a template under a controllable mutation budget—it offers a more directly applicable tool for optimization campaigns. As a recent preprint from an industry group, its long-term influence is still emerging, and at the time of writing no public code or model weights have been released, which currently limits independent reproduction and reuse.

Citation

EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering

Preprint

Deutschmann, N., et al. (2026) EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering.

DOI: 10.48550/arXiv.2603.11703

Recent citations

Papers that recently cited this model.

Flexible Flows for Biological Sequence Design
Yogesh Verma, Dani Korpela, H. Lahdesmaki, et al.
Jun 2026, 0 citations

Tree-Conditioned Edit Flows for Ancestral Sequence Reconstruction
Emil R. Sharafutdinov, I. André
May 2026, 0 citations

Citations

Total citations: 2
Influential: 0
References: 41

Fields of citing research

Biology: 100%
Computer Science: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 21 (Closed)
Usability (can I run it?): 14
Reproducibility (can I retrain it?): 12

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper

