Protein

La-Proteina

NVIDIA

Partially latent flow-matching model for joint generation of protein amino-acid sequence and full atomistic structure (backbone plus side chains) for proteins up to 800 residues.

Released: 2026

Overview

La-Proteina is a partially latent flow-matching generative model from NVIDIA's GenAIR group for joint generation of protein amino-acid sequence and full atomistic structure (backbone plus side chains). It was announced as an ICLR 2026 paper and released publicly on GitHub in early 2026. La-Proteina also provides the architecture underlying NVIDIA's later Proteina-Complexa target-conditioned binder design system.

The model generates proteins of up to 800 residues, producing the amino-acid sequence and the complete atomic coordinates jointly. This extends unconditional all-atom protein generation into a length regime where most prior baselines fail to produce designable, foldable proteins.

Key Features

  • Joint sequence-structure generation: Produces amino-acid sequence and full atomistic structure (backbone plus side chains) jointly within a single generative pass.
  • Up to 800 residues: Handles substantially longer proteins than prior all-atom generative baselines that typically saturate around 200 to 300 residues.
  • Partially latent flow matching: Combines explicit and latent representations to balance geometric fidelity against computational tractability.
  • All-atom output: Side chains generated alongside backbone, removing the need for a separate sequence-design pipeline.
  • Architectural backbone for Proteina-Complexa: Same architecture, with target conditioning, underlies NVIDIA's binder design system.
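To make the "partially latent" idea in the list above concrete, the following is a minimal sketch of how such a sample could be stored and drawn from a Gaussian prior. It is an illustration only: the class name PartiallyLatentSample, the function sample_prior, the choice of one explicit 3D point per residue, and the latent width of 8 are all hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PartiallyLatentSample:
    """Hypothetical container for one protein in a partially latent space."""

    backbone_coords: np.ndarray  # shape (L, 3): explicit Cartesian part
    latents: np.ndarray          # shape (L, latent_dim): learned latent part

    @property
    def length(self) -> int:
        # Number of residues, read from the explicit coordinates.
        return self.backbone_coords.shape[0]


def sample_prior(length: int, latent_dim: int = 8, seed: int = 0) -> PartiallyLatentSample:
    """Draw Gaussian noise for both parts (assumed prior, for illustration)."""
    rng = np.random.default_rng(seed)
    coords = rng.standard_normal((length, 3))
    # Remove the centroid so the explicit part has no global translation.
    coords -= coords.mean(axis=0, keepdims=True)
    latents = rng.standard_normal((length, latent_dim))
    return PartiallyLatentSample(backbone_coords=coords, latents=latents)


noise = sample_prior(length=800)
print(noise.length, noise.backbone_coords.shape, noise.latents.shape)
```

Because each residue carries a fixed number of values in this sketch, the size of a sample grows linearly with protein length, which is one way to read the trade-off between geometric fidelity and tractability mentioned above.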

Technical Details

La-Proteina represents proteins in a partially latent space: the coarse backbone structure is modeled explicitly in Cartesian coordinates, while sequence and the remaining atomic-level detail are encoded in a learned per-residue latent space. Flow matching transports samples from a Gaussian prior to the joint sequence-structure data distribution. The ICLR 2026 paper describes the architecture, the PDB-derived training corpus, and ablations on representation choices.
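The transport from a Gaussian prior to data can be illustrated with a generic flow-matching sketch. The example below assumes a linear interpolation path between noise and data and a fixed-step Euler integrator; both are common choices in the flow-matching literature and are used here as assumptions, not as a description of La-Proteina's actual schedule, network, or sampler. All function names are hypothetical.

```python
import numpy as np


def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """Point on the assumed linear path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Regression target for a velocity network along the linear path."""
    return x1 - x0


def euler_sample(velocity_fn, x0: np.ndarray, num_steps: int = 100) -> np.ndarray:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # largest t used is 1 - dt, so t never reaches 1
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check with a single known data point: the exact conditional velocity
# toward x1 is (x1 - x) / (1 - t). A trained network would replace this.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 3))  # Gaussian prior sample
x1 = rng.standard_normal((5, 3))  # stand-in for a data sample


def toy_velocity(x: np.ndarray, t: float) -> np.ndarray:
    return (x1 - x) / (1.0 - t)


generated = euler_sample(toy_velocity, x0, num_steps=50)
print(np.allclose(generated, x1))
```

In a real model the velocity function would be a neural network trained to regress target_velocity at random times t, applied to the explicit and latent parts of the representation together; the toy velocity here only demonstrates that the integrator carries a prior sample to the intended endpoint.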

The released code and weights are available through NVIDIA's research GitHub. Inference runs at moderate compute cost on a single high-end GPU.

Applications

La-Proteina is suited to unconditional generative protein design at lengths where prior all-atom baselines fail. Researchers can use it to explore the space of designable proteins, as a basis for downstream conditioning or scaffolding, or as a starting point for target-conditioned design through extensions such as Proteina-Complexa.

Impact

La-Proteina advances the state of the art in unconditional all-atom protein generation by handling longer proteins than prior baselines and by producing designable sequence-structure pairs jointly. As the architectural foundation for Proteina-Complexa, it is a key piece of NVIDIA's protein-design ecosystem and a useful open-source reference implementation for future flow-matching-based generative protein models.