Partially latent flow-matching generative model for de novo atomistic protein binder design against protein and small-molecule targets, with experimental validation at million-design scale.
Proteina-Complexa is a partially latent flow-matching generative model from NVIDIA's GenAIR group for fully atomistic de novo design of protein binders against protein and small-molecule targets. It was released as an arXiv preprint in March 2026 alongside large-scale experimental validation results. The model extends NVIDIA's earlier La-Proteina architecture to multi-molecular conditioning and demonstrates inference-time compute scaling: design quality increases monotonically with the number of samples drawn, reaching a 68% target-level hit rate at million-design scale across 127 targets.
Notably, the validation campaign included the first reported de novo computational design of carbohydrate binders, addressing a long-standing target class that traditional protein-design tools have struggled with due to the geometric and chemical diversity of glycan ligands.
Proteina-Complexa builds on the La-Proteina backbone (an ICLR 2026 paper) by adding target conditioning. The model represents proteins in a partially latent space: backbone frames are kept at high precision, while atomic-level coordinates are captured through a learned encoder. Flow matching then transports samples from a prior distribution to the data distribution. Training uses the Protein Data Bank for protein-target complexes and an in-house protein-carbohydrate dataset for glycan binders.
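The transport step can be illustrated with a toy sketch. The example below is not the Proteina-Complexa network: it assumes a straight-line probability path and a hand-written velocity field that pulls every sample toward a single target point, where the real model would use a learned, target-conditioned network. The names `toy_velocity` and `euler_sample` are hypothetical and exist only for illustration.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Velocity of the straight-line path from x toward one target point.

    Assumption: a real model replaces this closed form with a trained
    neural network conditioned on the binding target.
    """
    return (target - x) / (1.0 - t)


def euler_sample(target, num_samples=4, num_steps=50, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    rng = np.random.default_rng(seed)
    # Draw starting points from a standard Gaussian prior.
    x = rng.standard_normal((num_samples, target.shape[0]))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # t stays strictly below 1, so no division by zero
        x = x + dt * toy_velocity(x, t, target)
    return x


target = np.array([1.0, -2.0, 0.5])
samples = euler_sample(target)
print(samples)
```

Because the toy data distribution is a single point, every prior sample ends at `target`. With a learned velocity field the same loop would instead produce diverse samples from the data distribution.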
The validation experiment used a yeast-display assay multiplexed across 127 targets simultaneously, with sequencing readout to identify enriched designs. Hit rates are reported as the fraction of targets with at least one experimentally confirmed binder among the designs tested.
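The target-level hit rate defined above can be computed as in the following minimal sketch. The function name, the input layout (a mapping from target name to per-design confirmation flags), and the example values are illustrative assumptions, not data from the paper.

```python
def target_level_hit_rate(confirmed_by_target):
    """Fraction of targets with at least one experimentally confirmed binder.

    confirmed_by_target maps a target name to a list of booleans, one per
    tested design (True = confirmed binder). This layout is hypothetical.
    """
    if not confirmed_by_target:
        raise ValueError("at least one target is required")
    targets_with_hit = sum(
        1 for flags in confirmed_by_target.values() if any(flags)
    )
    return targets_with_hit / len(confirmed_by_target)


# Made-up example: two of four targets have at least one confirmed binder.
example = {
    "target_A": [False, True, False],
    "target_B": [False, False],
    "target_C": [True, True],
    "target_D": [],
}
rate = target_level_hit_rate(example)
print(rate)
```

Note that this metric counts targets, not designs: a target with one confirmed binder among many tested designs contributes as much as a target where every design binds.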
Proteina-Complexa is useful for early-stage therapeutic discovery in target classes that are hard to address with antibody or small-molecule modalities, including protein-protein interaction interfaces, carbohydrate binders for glycan-based diagnostics or therapeutics, and de novo binders against undruggable targets. The inference-time scaling property makes it well-suited to compute-rich industrial settings where researchers can trade GPU hours for hit rates.
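One way to build intuition for trading compute for hit rate is a simple back-of-the-envelope calculation. The sketch below assumes that each design binds independently with the same probability `p`. That is a simplifying assumption introduced here, not a claim from the paper, and `prob_at_least_one_hit` is a hypothetical helper.

```python
import math


def prob_at_least_one_hit(p, n):
    """Probability that at least one of n designs binds: 1 - (1 - p)**n.

    Assumes independent designs with a constant per-design success
    probability p (an illustrative simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0 or p == 0.0:
        return 0.0
    if p == 1.0:
        return 1.0
    # expm1/log1p keep precision when p is very small.
    return -math.expm1(n * math.log1p(-p))


for n in (1, 10, 100, 1000):
    print(n, prob_at_least_one_hit(0.001, n))
```

Under these assumptions the probability never decreases as `n` grows, which matches the qualitative picture of drawing more samples to raise the chance of finding a binder for a given target.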
Proteina-Complexa establishes inference-time compute scaling as a useful axis in generative protein design. Its experimental campaign, which covers 127 targets in a single experiment, is presented as the most ambitious validation yet for a de novo binder design model. The carbohydrate-binding result opens a previously inaccessible target class to computational design. Paired with La-Proteina (its open-source backbone), Proteina-Complexa positions NVIDIA as a serious contributor in the generative protein design space alongside the Baker Lab (RFdiffusion) and the developers of Chroma.
Didi, K., et al. (2026) Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute.
DOI: 10.48550/arXiv.2603.27950