TriFlow

University of Chicago / UT Southwestern Medical Center

Structure-conditioned protein sequence design, pairing a three-track architecture with discrete flow matching for fast, few-step inverse folding.

Released: December 2025

TriFlow is a structure-conditioned protein sequence design model that predicts amino acid sequences compatible with a given backbone structure, the inverse-folding step at the heart of modern de novo protein design pipelines. It was developed by Harish Srinivasan, Rongqing Yuan, Qian Cong, and Jian Zhou at the University of Chicago and UT Southwestern Medical Center, and posted to bioRxiv in December 2025.

The model addresses a key bottleneck in computational design: most state-of-the-art sequence designers, such as ProteinMPNN, rely on local structural context and autoregressive, residue-by-residue generation. TriFlow instead pairs a RoseTTAFold-like three-track architecture — which reasons over single, pairwise, and 3D rigid-frame representations to capture global structural context — with discrete flow matching, a generative framework that decodes many residues in parallel. The result is a designer that can produce an entire sequence in roughly ten inference steps regardless of protein length, while modeling longer-range structural dependencies than autoregressive methods.
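
The few-step, parallel decoding described above can be illustrated with a toy sampler. This is a minimal sketch, not TriFlow's implementation: it assumes a masking-style discrete flow (every position starts as a mask token and is revealed over time), and `toy_denoiser`, `sample_sequence`, and the unmasking schedule are hypothetical names and choices made for illustration. The stand-in denoiser returns random logits where a real model would condition on the backbone.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = len(AMINO_ACIDS)  # index 20 marks a position that is not yet designed


def toy_denoiser(tokens, t, rng):
    """Hypothetical stand-in for a structure-conditioned network.

    Returns per-residue logits over the 20 amino acids. It ignores its
    inputs and emits random logits; a real model would use the backbone.
    """
    return rng.normal(size=(tokens.shape[0], len(AMINO_ACIDS)))


def sample_sequence(length, num_steps=10, seed=0, denoiser=toy_denoiser):
    """Reveal all positions in `num_steps` parallel updates (assumed schedule)."""
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK, dtype=np.int64)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        logits = denoiser(tokens, t, rng)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        # Draw one proposal per position by inverting the per-row CDF.
        cdf = probs.cumsum(axis=-1)
        u = rng.random((length, 1))
        proposal = np.minimum((u > cdf).sum(axis=-1), len(AMINO_ACIDS) - 1)
        # Each still-masked position is revealed with probability dt / (1 - t);
        # the final step reveals whatever is left.
        p_unmask = 1.0 if step == num_steps - 1 else dt / (1.0 - t)
        reveal = (tokens == MASK) & (rng.random(length) < p_unmask)
        tokens[reveal] = proposal[reveal]
    return "".join(AMINO_ACIDS[i] for i in tokens)


if __name__ == "__main__":
    # The number of network calls is num_steps for both lengths.
    print(sample_sequence(60))
    print(len(sample_sequence(1000)))
```

The loop runs `num_steps` times whether the sequence has 60 or 1000 residues, which is the property the text contrasts with residue-by-residue autoregressive decoding.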

TriFlow's headline contribution is in de novo binder design, where it raises the in silico success rate of leading pipelines such as BindCraft and RFdiffusion-based workflows. It sits in the same landscape as ProteinMPNN, ESM-IF, and PiFold, but is distinguished by its global, flow-matching formulation and its explicit emphasis on protein–protein interfaces.

Key Features

Three-track structural backbone: Inspired by AlphaFold2 and RoseTTAFold2, the network jointly updates single, pair, and 3D rigid-frame representations using gated attention with pair bias, invariant point attention, and triangle multiplicative updates for global context.
Discrete flow matching: Sequences are generated by a few-step discrete flow that decodes multiple residues simultaneously, designing a full sequence in as few as ten inference steps regardless of protein size.
Interface-enriched training: Training data includes interacting protein chains from the PDB plus predicted interacting domain pairs from the AlphaFold Database, enriching the model's knowledge of natural protein and domain interfaces.
Binder-design focus: TriFlow boosts the in silico success rate of BindCraft and RFdiffusion-based pipelines, validated across a benchmark of more than 500 protein targets.
Active-site discovery: Contrasting the structure-conditioned constraints learned by the model against natural evolutionary profiles highlights functional active sites.
Open-source release: Code and three pre-trained checkpoints — an AlphaFold Database variant (afdb_weights.pt), a PDB variant (pdb_weights.pt), and a soluble-dataset variant (soluble_weights.pt) — are released under the MIT license with user-facing tutorials.
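
The active-site idea in the list above can be made concrete with a small per-position comparison. The page does not say which contrast score the authors use, so the sketch below assumes a Kullback-Leibler divergence between an evolutionary profile and the model's structure-conditioned distribution; `contrast_scores` and the toy inputs are hypothetical.

```python
import numpy as np


def contrast_scores(model_probs, evo_probs, eps=1e-9):
    """Per-position KL(evolution || model) over the 20 amino acids.

    Both inputs have shape (length, 20). A high score marks a position
    where evolution is more constrained than the structure alone suggests.
    The choice of KL divergence is an assumption made for this sketch.
    """
    model_probs = np.asarray(model_probs, dtype=float) + eps
    evo_probs = np.asarray(evo_probs, dtype=float) + eps
    model_probs /= model_probs.sum(axis=-1, keepdims=True)
    evo_probs /= evo_probs.sum(axis=-1, keepdims=True)
    return (evo_probs * np.log(evo_probs / model_probs)).sum(axis=-1)


if __name__ == "__main__":
    # Toy example: the model is indifferent at all three positions, while
    # the evolutionary profile conserves a single residue at position 1.
    model = np.full((3, 20), 0.05)
    evolution = model.copy()
    evolution[1] = 0.0
    evolution[1, 6] = 1.0
    scores = contrast_scores(model, evolution)
    print(scores.round(3), "candidate site:", int(scores.argmax()))
```

Positions are then ranked by score; in practice the evolutionary profile would come from a multiple sequence alignment and the model distribution from the designer's per-residue output.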

Technical Details

TriFlow's architecture builds on an OpenFold codebase and incorporates components from MultiFlow, ProteinMPNN, and Protenix. Triangle attention over the pair representation is computed once and cached across flow timesteps for efficiency. The model was trained on PDB structures using the same splits as ProteinMPNN (August 2021 cutoff), extended with representative AlphaFold Database entries, interacting PDB chains, and predicted domain–domain interaction pairs. Training used crop sizes of 256 and then 512 residues, backbone noise of 0.02 Å or 0.2 Å applied with equal probability, and a batch size of 8 per GPU across four A100 GPUs with the AdamW optimizer. The preprint does not state the total parameter count.
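
The cropping and noise settings in this paragraph can be sketched as a small augmentation step. This is an illustration under stated assumptions rather than the released training code: it assumes the noise is zero-mean Gaussian with the quoted value as its standard deviation, that crops are contiguous windows, and that coordinates are stored as a `(length, 4, 3)` array of backbone atoms. The function names are hypothetical.

```python
import numpy as np

NOISE_LEVELS_ANGSTROM = (0.02, 0.2)  # values quoted in the text


def random_crop(coords, crop_size, rng):
    """Return a contiguous window of at most `crop_size` residues (assumed scheme)."""
    length = coords.shape[0]
    if length <= crop_size:
        return coords
    start = rng.integers(0, length - crop_size + 1)
    return coords[start:start + crop_size]


def add_backbone_noise(coords, rng):
    """Perturb coordinates with one of the two noise levels, chosen 50/50.

    Gaussian noise is an assumption; the page only gives the magnitudes.
    """
    sigma = float(rng.choice(NOISE_LEVELS_ANGSTROM))
    noisy = coords + rng.normal(scale=sigma, size=coords.shape)
    return noisy, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    backbone = np.zeros((700, 4, 3))  # placeholder coordinates
    for crop_size in (256, 512):  # the two crop sizes named in the text
        cropped = random_crop(backbone, crop_size, rng)
        noisy, sigma = add_backbone_noise(cropped, rng)
        print(crop_size, noisy.shape, sigma)
```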

On benchmarks, TriFlow achieved higher native sequence recovery and improved refoldability relative to ProteinMPNN, ESM-IF, and PiFold, with gains most pronounced for 600–1000 residue backbones. On the Large-Scale Binder Design benchmark — 500 binary complexes with 16,216 binder backbones and 129,728 designed sequences — TriFlow gave a higher structure success rate on 143 of 500 targets in the RFdiffusion pipeline (versus 68 for ProteinMPNN) and on 200 of 500 targets in the ColabDesign/BindCraft pipeline (versus 93 for SolubleMPNN). Applied to human class I cytokines, it designed binders for 30 of 31 targets.
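
The benchmark counts quoted above can be put on a common scale with a few lines of arithmetic. The sketch only restates the numbers from this paragraph; the variable names are illustrative, and the per-backbone and per-target averages are derived here rather than reported by the authors.

```python
targets = 500
binder_backbones = 16_216
designed_sequences = 129_728

# Derived averages (not stated in the text).
sequences_per_backbone = designed_sequences / binder_backbones  # 8.0
backbones_per_target = binder_backbones / targets  # about 32.4

# Target counts as quoted: (TriFlow, baseline) for each pipeline.
target_counts = {
    "RFdiffusion vs ProteinMPNN": (143, 68),
    "ColabDesign/BindCraft vs SolubleMPNN": (200, 93),
}
fractions = {
    name: (triflow / targets, baseline / targets)
    for name, (triflow, baseline) in target_counts.items()
}

cytokine_fraction = 30 / 31  # class I cytokine targets with designed binders

if __name__ == "__main__":
    print(f"sequences per backbone: {sequences_per_backbone:.0f}")
    print(f"backbones per target:   {backbones_per_target:.1f}")
    for name, (triflow, baseline) in fractions.items():
        print(f"{name}: {triflow:.1%} vs {baseline:.1%} of targets")
    print(f"cytokine targets covered: {cytokine_fraction:.1%}")
```

In these terms the quoted counts correspond to 28.6% versus 13.6% of targets in the RFdiffusion pipeline and 40.0% versus 18.6% in the ColabDesign/BindCraft pipeline.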

Applications

TriFlow serves protein engineers and computational biologists designing de novo binders, enzymes, and other functional proteins. It slots into existing pipelines as a drop-in replacement for ProteinMPNN or SolubleMPNN in the sequence-design stage following backbone generation by RFdiffusion or BindCraft. Its scaling behavior — where binder specificity improves with additional inference-time sampling — makes it well suited to large-scale campaigns, such as the demonstrated systematic design of specific binders against the human class I cytokine family while minimizing off-target interactions.
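
One simple way to picture how extra inference-time sampling could help specificity is best-of-N selection. The sketch below is an assumption-laden illustration, not the authors' protocol: it assumes each sampled design receives an on-target score and several off-target scores from some external predictor, defines specificity as the margin between them, and uses random numbers in place of real scores. All names are hypothetical.

```python
import numpy as np


def specificity_margin(on_target_score, off_target_scores):
    """On-target score minus the strongest off-target score (assumed metric)."""
    return on_target_score - max(off_target_scores)


def best_margin_by_budget(margins):
    """Best margin found within the first k samples, for k = 1..N."""
    return np.maximum.accumulate(np.asarray(margins, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_off_targets = 64, 5
    # Placeholder scores; a real campaign would obtain these by refolding
    # each design against the intended target and its family members.
    on_target = rng.uniform(0.0, 1.0, size=n_samples)
    off_target = rng.uniform(0.0, 1.0, size=(n_samples, n_off_targets))
    margins = [
        specificity_margin(on_target[i], off_target[i]) for i in range(n_samples)
    ]
    curve = best_margin_by_budget(margins)
    for budget in (1, 8, 64):
        print(budget, round(float(curve[budget - 1]), 3))
```

The running maximum can only stay flat or rise as the sampling budget grows, which is why a fast designer makes large specificity screens practical.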

Impact

By coupling a global three-track structural model with parallel, few-step flow-matching generation, TriFlow offers an efficient alternative to autoregressive inverse-folding methods that have dominated the field since ProteinMPNN. Its measured improvements in binder-design success across more than 500 targets, combined with an open MIT-licensed release and tutorials, position it as a practical tool for therapeutic and research protein engineering. As a December 2025 preprint, its broader adoption and independent wet-lab validation remain to be established.

Citation

Srinivasan, H., et al. (2025) Modeling the structure-conditioned sequence landscape for large-scale protein design with TriFlow. bioRxiv.

DOI: 10.64898/2025.11.30.691458

Citations

Total citations: 0
Influential citations: 0
References: 42

GitHub

Stars: 9
Forks: 0
Open issues: 1
Contributors: 2
Last push: 8d ago
Language: Python
License: MIT

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 69 (Partial)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 38

Model Openness Framework: Unclassified (missing required components)
