Overview

Proteina-Complexa is a partially latent flow-matching generative model from NVIDIA's GenAIR group for fully atomistic de novo design of protein binders against protein and small-molecule targets. Released as an arXiv preprint in March 2026 alongside large-scale experimental validation results, the model extends NVIDIA's earlier La-Proteina architecture to multi-molecular conditioning and demonstrates inference-time compute scaling: design quality increases monotonically with the number of samples drawn. At million-design scale, 68% of the 127 targets tested received at least one validated binder.

Notably, the validation campaign included the first reported de novo computational design of carbohydrate binders, addressing a long-standing target class that traditional protein-design tools have struggled with due to the geometric and chemical diversity of glycan ligands.

Key Features

Partially latent flow matching: Generates sequence and full atomistic structure jointly through a flow-matching objective in a partially latent representation, balancing geometric fidelity against computational tractability.
Multi-modal conditioning: Conditions on arbitrary target molecules — proteins, peptides, small molecules, and carbohydrates — within a unified framework.
Inference-time compute scaling: Sampling more candidates monotonically improves hit rates, enabling tunable trade-offs between compute and design quality.
Million-design experimental validation: Tested 1 million designs against 127 targets in a single multiplexed yeast-display experiment; 68% of targets received at least one validated binder.
First de novo carbohydrate binders: First reported computational design of de novo proteins binding carbohydrate ligands, expanding the addressable target space.
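
The inference-time scaling feature can be illustrated with a toy calculation. The sketch below is not the paper's analysis: it assumes every design for a given target binds independently with the same per-design probability p (a hypothetical value chosen for illustration), in which case the chance of at least one binder among N designs is 1 - (1 - p)^N, which can only rise as N grows.

```python
def prob_at_least_one_hit(p: float, n: int) -> float:
    """Chance that at least one of n independent designs binds.

    Assumes every design succeeds independently with the same
    probability p. Real designs are correlated, so this is a toy model.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-design success probability
    for n in (100, 1_000, 10_000, 100_000):
        chance = prob_at_least_one_hit(p, n)
        print(f"N={n:>7}: P(at least one hit) = {chance:.3f}")
```

Under these assumptions, drawing more samples trades compute for a higher chance of a hit on each target, with diminishing returns once the probability approaches 1.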

Technical Details

Proteina-Complexa builds on the La-Proteina architecture (an ICLR 2026 paper) by adding target conditioning. The model represents proteins in a partially latent space: backbone frames are kept explicit at high precision, while a learned encoder compresses the remaining atomic-level coordinates into latent variables. Flow matching then learns to transport samples from a prior distribution to the data distribution. Training uses the Protein Data Bank for protein-target complexes and an in-house protein-carbohydrate dataset for glycan binders.
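
To make the prior-to-data transport concrete, here is a one-dimensional toy sketch of flow matching. It is not the Proteina-Complexa implementation: the real model learns a neural velocity field over backbone and latent variables with target conditioning, whereas this example assumes a straight-line interpolation path and a data distribution concentrated at a single value mu, for which the velocity field has a closed form. All function names are hypothetical.

```python
import random


def interpolate(x0: float, x1: float, t: float) -> float:
    """Point at time t on the straight path from prior sample x0 to data sample x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0: float, x1: float) -> float:
    """Regression target a velocity network would be trained on for that path."""
    return x1 - x0


def point_mass_velocity(x: float, t: float, mu: float) -> float:
    """Exact marginal velocity when all data sits at the single value mu."""
    return (mu - x) / (1.0 - t)


def sample(mu: float, steps: int = 50, seed: int = 0) -> float:
    """Draw from the prior, then Euler-integrate the velocity field up to t = 1."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # standard normal prior (an assumption of this toy)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        x += dt * point_mass_velocity(x, t, mu)
    return x


if __name__ == "__main__":
    print(sample(mu=3.0))  # lands close to 3.0
```

In a trained model, point_mass_velocity would be replaced by a learned network fitted to target_velocity at points produced by interpolate; the sampling loop keeps the same shape.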

The validation experiment used a yeast-display assay multiplexed across 127 targets simultaneously, with sequencing readout to identify enriched designs. Hit rates are reported as the fraction of targets with at least one experimentally confirmed binder among the designs tested.
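
The target-level hit rate defined above can be computed as in the following sketch. The record layout (target identifier, design identifier, confirmed flag) and the function name are assumptions made for illustration, not the format used in the actual experiment.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# (target_id, design_id, confirmed_binder) - hypothetical record layout
Record = Tuple[str, str, bool]


def target_level_hit_rate(records: Iterable[Record]) -> float:
    """Fraction of tested targets with at least one confirmed binder."""
    has_hit = defaultdict(bool)
    for target_id, _design_id, confirmed in records:
        has_hit[target_id] = has_hit[target_id] or confirmed
    if not has_hit:
        raise ValueError("no records supplied")
    return sum(has_hit.values()) / len(has_hit)


example = [
    ("T1", "d1", False),
    ("T1", "d2", True),
    ("T2", "d3", False),
    ("T3", "d4", True),
    ("T4", "d5", False),
]
print(target_level_hit_rate(example))  # 2 of 4 targets have a hit -> 0.5
```

Note that this metric counts targets, not designs: a target with one confirmed binder among thousands of tested designs counts the same as a target with many.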

Applications

Proteina-Complexa is useful for early-stage therapeutic discovery in target classes that are hard to address with antibody or small-molecule modalities. Use cases include binders against protein-protein interaction interfaces, carbohydrate binders for glycan-based diagnostics or therapeutics, and de novo binders against targets considered undruggable. The inference-time scaling property makes it well suited to compute-rich industrial settings where researchers can trade GPU hours for hit rates.

Impact

Proteina-Complexa establishes inference-time compute scaling as a useful axis in generative protein design and reports the most ambitious experimental validation campaign yet for a de novo binder design model, covering 127 targets in a single experiment. The carbohydrate-binding result opens a previously inaccessible target class to computational design. Together with La-Proteina, the open-source model it builds on, Proteina-Complexa positions NVIDIA as a serious contributor to generative protein design alongside the Baker Lab (RFdiffusion) and the developers of Chroma.
