University of Chicago / UT Southwestern Medical Center
Structure-conditioned protein sequence design model combining a RoseTTAFold-like three-track architecture with discrete flow matching for fast, few-step inverse folding.
TriFlow is a structure-conditioned protein sequence design model that predicts amino acid sequences compatible with a given backbone structure — the inverse-folding step at the heart of modern de novo protein design pipelines. It was developed by Harish Srinivasan, Rongqing Yuan, Qian Cong, and Jian Zhou across the University of Chicago and UT Southwestern Medical Center, and posted to bioRxiv in December 2025.
The model addresses a key bottleneck in computational design: most state-of-the-art sequence designers, such as ProteinMPNN, rely on local structural context and autoregressive, residue-by-residue generation. TriFlow instead pairs a RoseTTAFold-like three-track architecture — which reasons over single, pairwise, and 3D rigid-frame representations to capture global structural context — with discrete flow matching, a generative framework that decodes many residues in parallel. The result is a designer that can produce an entire sequence in roughly ten inference steps regardless of protein length, while modeling longer-range structural dependencies than autoregressive methods.
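The few-step, parallel decoding idea can be illustrated with a toy sampler. This is only a sketch under stated assumptions, not TriFlow's sampler: it assumes a masking-style discrete flow with a linear schedule, and `toy_denoiser` is a hypothetical stand-in that returns uniform probabilities where a real model would return structure-conditioned ones. The point it shows is that the number of network calls equals the number of steps, not the sequence length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "-"


def toy_denoiser(tokens, t):
    """Hypothetical stand-in for a structure-conditioned network.

    A real model would return per-position amino acid probabilities
    conditioned on the backbone; this toy returns a uniform distribution.
    """
    uniform = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return [uniform for _ in tokens]


def sample_sequence(length, num_steps=10, seed=0):
    """Decode a sequence in `num_steps` parallel updates (assumed masking path)."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    network_calls = 0
    for step in range(num_steps):
        t = step / num_steps
        probs = toy_denoiser(tokens, t)  # one call covers every position
        network_calls += 1
        # Euler step for a linear masking schedule: dt / (1 - t),
        # which simplifies to 1 / (steps remaining) and reaches 1.0 at the end.
        unmask_prob = 1.0 / (num_steps - step)
        for i, token in enumerate(tokens):
            if token == MASK and rng.random() < unmask_prob:
                tokens[i] = rng.choices(AMINO_ACIDS, weights=probs[i])[0]
    return "".join(tokens), network_calls


if __name__ == "__main__":
    for n in (50, 500):
        seq, calls = sample_sequence(n)
        print(n, calls, seq[:20])
```

An autoregressive designer would instead need one network call per residue, so the cost of this loop grows with the step count while the cost of residue-by-residue decoding grows with protein length.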
TriFlow's headline contribution is in de novo binder design, where it is shown to raise the in silico success rate of leading pipelines such as BindCraft and RFdiffusion-based workflows. It sits in the same landscape as ProteinMPNN, ESM-IF, and PiFold, but is distinguished by its global, flow-matching formulation and its explicit emphasis on protein–protein interfaces.
Model weights, including an AFDB variant (afdb_weights.pt), a PDB variant (pdb_weights.pt), and a soluble-dataset variant (soluble_weights.pt), are released under the MIT license with user-facing tutorials.
TriFlow's architecture builds on an OpenFold codebase and incorporates components from MultiFlow, ProteinMPNN, and Protenix. Triangle attention over the pair representation is computed once and cached across flow timesteps for efficiency. The model was trained on PDB structures using the same splits as ProteinMPNN (August 2021 cutoff), extended with representative AlphaFold Database entries, interacting PDB chains, and predicted domain–domain interaction pairs. Training used crop sizes of 256 and then 512 residues, backbone noise of 0.02 Å or 0.2 Å chosen with equal probability, and a batch size of 8 per GPU across four A100 GPUs with the AdamW optimizer. The preprint does not state the total parameter count.
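The backbone-noise augmentation mentioned above can be sketched as follows. The source gives only the two noise levels and that they are applied with equal probability; this sketch assumes the noise is zero-mean Gaussian, added independently to each coordinate, with one level drawn per training example. The function name and data layout are hypothetical.

```python
import random

# Noise levels in angstroms, as stated in the text.
NOISE_LEVELS_ANGSTROM = (0.02, 0.2)


def add_backbone_noise(coords, rng):
    """Perturb backbone atom coordinates with one randomly chosen noise level.

    coords: list of (x, y, z) tuples in angstroms.
    Assumption: Gaussian noise, independent per coordinate.
    Returns the noisy coordinates and the noise level that was drawn.
    """
    sigma = rng.choice(NOISE_LEVELS_ANGSTROM)  # each level has probability 0.5
    noisy = [tuple(x + rng.gauss(0.0, sigma) for x in atom) for atom in coords]
    return noisy, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    backbone = [(0.0, 0.0, 0.0), (1.458, 0.0, 0.0), (2.009, 1.420, 0.0)]
    noisy, sigma = add_backbone_noise(backbone, rng)
    print(sigma, noisy[0])
```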
On benchmarks, TriFlow achieved higher native sequence recovery and improved refoldability relative to ProteinMPNN, ESM-IF, and PiFold, with gains most pronounced for 600–1000 residue backbones. On the Large-Scale Binder Design benchmark — 500 binary complexes with 16,216 binder backbones and 129,728 designed sequences — TriFlow gave a higher structure success rate on 143 of 500 targets in the RFdiffusion pipeline (versus 68 for ProteinMPNN) and on 200 of 500 targets in the ColabDesign/BindCraft pipeline (versus 93 for SolubleMPNN). Applied to human class I cytokines, it designed binders for 30 of 31 targets.
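The benchmark figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the text; it assumes the "versus" counts are the targets on which the baseline scored higher, so the remainder is the number of targets where neither method was ahead. The dictionary keys and function name are illustrative.

```python
binder_backbones = 16_216
designed_sequences = 129_728
targets = 500

# Sequences designed per binder backbone.
sequences_per_backbone = designed_sequences / binder_backbones  # 8.0

# Per-pipeline counts of targets on which each method had the higher success rate.
comparisons = {
    "RFdiffusion": {"triflow_higher": 143, "baseline_higher": 68},
    "ColabDesign/BindCraft": {"triflow_higher": 200, "baseline_higher": 93},
}


def summarize(counts, total):
    """Return the fraction of targets won by each method and the remainder."""
    neither = total - counts["triflow_higher"] - counts["baseline_higher"]
    return {
        "triflow_fraction": counts["triflow_higher"] / total,
        "baseline_fraction": counts["baseline_higher"] / total,
        "neither": neither,
    }


cytokine_fraction = 30 / 31  # class I cytokine targets with designed binders

if __name__ == "__main__":
    print(sequences_per_backbone)
    for name, counts in comparisons.items():
        print(name, summarize(counts, targets))
    print(round(cytokine_fraction, 3))
```

Under that reading, TriFlow was ahead on about 29% of targets in the RFdiffusion pipeline and 40% in the ColabDesign/BindCraft pipeline, roughly twice the baseline's count in each case.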
TriFlow serves protein engineers and computational biologists designing de novo binders, enzymes, and other functional proteins. It slots into existing pipelines as a drop-in replacement for ProteinMPNN or SolubleMPNN in the sequence-design stage following backbone generation by RFdiffusion or BindCraft. Its scaling behavior — where binder specificity improves with additional inference-time sampling — makes it well suited to large-scale campaigns, such as the demonstrated systematic design of specific binders against the human class I cytokine family while minimizing off-target interactions.
By coupling a global three-track structural model with parallel, few-step flow-matching generation, TriFlow offers an efficient alternative to autoregressive inverse-folding methods that have dominated the field since ProteinMPNN. Its measured improvements in binder-design success across more than 500 targets, combined with an open MIT-licensed release and tutorials, position it as a practical tool for therapeutic and research protein engineering. As a December 2025 preprint, its broader adoption and independent wet-lab validation remain to be established.