Partially latent flow-matching model for joint generation of protein amino-acid sequence and full atomistic structure (backbone plus side chains) for proteins up to 800 residues.
La-Proteina is a partially latent flow-matching generative model from NVIDIA's GenAIR group for joint generation of protein amino-acid sequence and full atomistic structure (backbone plus side chains). Announced as an ICLR 2026 paper and released publicly on GitHub in early 2026, La-Proteina is the architectural foundation of NVIDIA's later Proteina-Complexa system for target-conditioned binder design.
The model generates proteins of up to 800 residues, producing the amino-acid sequence and the complete atomic coordinates jointly. At these lengths most prior all-atom baselines fail to produce designable, foldable proteins, and La-Proteina extends unconditional all-atom generation into this regime.
La-Proteina represents proteins in a partially latent space: backbone frames are modeled explicitly in Cartesian coordinates, while atomic-level coordinates are encoded in a learned latent space. Flow matching then transports samples from a Gaussian prior to the joint sequence-structure data distribution. The ICLR 2026 paper describes the architecture, the PDB-derived training corpus, and ablations on representation choices.
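To make the flow-matching idea concrete, the sketch below shows a generic conditional flow-matching objective with a linear (rectified-flow-style) interpolation path, applied to a two-part state of explicit per-residue coordinates plus per-residue latent vectors. This is not La-Proteina's training code. The state layout, the uniform time sampling, the unweighted sum of the two losses, and the names `sample_state`, `interpolate`, `flow_matching_loss`, and the `model(x_t, z_t, t)` signature are all assumptions for illustration; the paper's actual parameterization, time schedules, and loss weighting may differ.

```python
import numpy as np

def sample_state(num_residues, latent_dim, rng):
    """Gaussian prior sample for a hypothetical partially latent state."""
    coords = rng.standard_normal((num_residues, 3))            # explicit 3D coordinates per residue
    latents = rng.standard_normal((num_residues, latent_dim))  # learned-latent part per residue
    return coords, latents

def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(model, data_coords, data_latents, rng):
    """Conditional flow-matching loss over both parts of the state (assumed equal weighting)."""
    num_residues, latent_dim = data_latents.shape
    noise_coords, noise_latents = sample_state(num_residues, latent_dim, rng)
    t = rng.uniform()  # assumption: a single t in [0, 1) shared by both parts
    xt = interpolate(noise_coords, data_coords, t)
    zt = interpolate(noise_latents, data_latents, t)
    # For a linear path the target velocity is constant along the path: data - noise.
    target_v_coords = data_coords - noise_coords
    target_v_latents = data_latents - noise_latents
    pred_v_coords, pred_v_latents = model(xt, zt, t)
    loss_coords = np.mean((pred_v_coords - target_v_coords) ** 2)
    loss_latents = np.mean((pred_v_latents - target_v_latents) ** 2)
    return loss_coords + loss_latents

# Toy usage with a placeholder "model" that always predicts zero velocity.
def zero_model(xt, zt, t):
    return np.zeros_like(xt), np.zeros_like(zt)

rng = np.random.default_rng(0)
data_x = rng.standard_normal((6, 3))
data_z = rng.standard_normal((6, 8))
print(flow_matching_loss(zero_model, data_x, data_z, np.random.default_rng(1)))
```

In practice `model` would be a neural network over all residues; the zero-velocity placeholder only exercises the plumbing.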
The code and weights are released on NVIDIA's research GitHub. Inference runs at moderate compute cost on a single high-end GPU.
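Sampling from a flow-matching model amounts to integrating the learned velocity field from a prior sample at t=0 to t=1. The sketch below uses a plain fixed-step Euler integrator over the same hypothetical two-part state (coordinates plus latents). The function name `euler_sample`, the step count, and the `model(x, z, t)` signature are assumptions, not La-Proteina's API; the released sampler may use a different integrator, schedule, or noise injection. In a model of this kind a learned decoder would then map the final latents (with the coordinates) back to sequence and atomic detail; that step is omitted here.

```python
import numpy as np

def euler_sample(model, num_residues, latent_dim, num_steps=100, seed=0):
    """Integrate dx/dt = v(x, z, t) from t=0 (Gaussian noise) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_residues, 3))           # explicit coordinates, prior sample
    z = rng.standard_normal((num_residues, latent_dim))  # per-residue latents, prior sample
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        vx, vz = model(x, z, t)  # predicted velocities for both parts of the state
        x = x + dt * vx
        z = z + dt * vz
    return x, z
```

Because each Euler step costs one network evaluation, the number of steps is the main knob trading sample quality against inference time.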
La-Proteina is suited to unconditional generative protein design at lengths where prior atomic-level baselines fail. Researchers can use it to explore the space of designable proteins, either as input to downstream conditioning and scaffolding tasks or as a starting point for target-conditioned design through extensions such as Proteina-Complexa.
La-Proteina advances the state of the art in unconditional all-atom protein generation by handling longer proteins than prior baselines and producing designable sequence-structure pairs jointly. As the architectural foundation for Proteina-Complexa, La-Proteina is a key piece of NVIDIA's growing protein-design ecosystem and a useful open-source reference implementation for future flow-matching-based generative protein models.