University of Illinois Urbana-Champaign / ByteDance Seed
A fully atomic protein co-design model using unified multimodal diffusion to jointly refine atom types and coordinates in a single stage, with support for non-canonical amino acids.
A-CODE (Atomic CO-DEsign) is a generative foundation model for protein design that operates entirely at atomic granularity. Rather than treating protein generation as a multi-stage problem—first laying out a backbone, then designing a sequence, then packing side chains—A-CODE casts the full task as a single unified process in which discrete atom types and continuous atom coordinates are refined simultaneously. Amino acid identities emerge from atom-level predictions rather than being assigned in a separate sequence-design step. The work was introduced in a May 2026 preprint by researchers at the University of Illinois Urbana-Champaign and ByteDance Seed.
The central problem A-CODE addresses is the error accumulation and modeling mismatch inherent in cascaded protein co-design pipelines, where decisions made early (such as backbone geometry) constrain later stages and small inconsistencies compound. By unifying everything into one all-atom diffusion process, A-CODE lets sequence and structure inform one another throughout generation. The authors report that this formulation is particularly effective on difficult binder-design problems, where prior one-stage approaches have struggled.
A notable consequence of the fully atomic formulation is that A-CODE is, according to the authors, the first protein co-design model to support non-canonical amino acids (ncAAs). Because the model reasons over atoms rather than a fixed alphabet of 20 residues, it can in principle accommodate chemistries outside the canonical set—an area of growing interest for designing proteins with novel functions.
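The text above does not specify how residue identity is read out from atom-level predictions. One way such a readout could work is sketched below. It assumes, hypothetically, that each predicted atom carries a PDB-style atom name, and that a residue's identity is recovered by matching its set of atoms against a template table. Matching element counts alone would not be enough, because leucine and isoleucine share the same heavy-atom composition. The table, the helper `read_out_residue`, and the `ABA` (2-aminobutyric acid) entry are illustrative assumptions. Under a scheme like this, admitting a non-canonical residue is a matter of adding a template rather than changing a fixed 20-letter alphabet.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Backbone heavy atoms shared by every residue template.
BACKBONE: FrozenSet[str] = frozenset({"N", "CA", "C", "O"})

# Hypothetical template table: residue code -> full set of heavy-atom names.
# Canonical entries follow PDB atom-naming conventions; "ABA" is an illustrative
# non-canonical residue included to show how the vocabulary could be extended.
TEMPLATES: Dict[str, FrozenSet[str]] = {
    "GLY": BACKBONE,
    "ALA": BACKBONE | {"CB"},
    "SER": BACKBONE | {"CB", "OG"},
    "CYS": BACKBONE | {"CB", "SG"},
    "VAL": BACKBONE | {"CB", "CG1", "CG2"},
    "THR": BACKBONE | {"CB", "OG1", "CG2"},
    "LEU": BACKBONE | {"CB", "CG", "CD1", "CD2"},
    "ILE": BACKBONE | {"CB", "CG1", "CG2", "CD1"},
    "ABA": BACKBONE | {"CB", "CG"},  # non-canonical example (assumption)
}


def read_out_residue(atom_names: Iterable[str]) -> Optional[str]:
    """Return the residue code whose template exactly matches the predicted atoms, else None."""
    observed = frozenset(atom_names)
    for code, template in TEMPLATES.items():
        if template == observed:
            return code
    return None  # no template matches: ambiguous or malformed prediction


if __name__ == "__main__":
    print(read_out_residue(["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]))  # LEU
    print(read_out_residue(["N", "CA", "C", "O", "CB", "CG"]))  # ABA
```

An exact-match lookup is only one possible readout. A real model might instead score partial matches or predict residue identity jointly, but the sketch shows why an atom-level vocabulary does not have to be tied to the canonical set.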
A-CODE is built on a multimodal diffusion framework that operates over all atoms in a protein rather than over residue-level tokens or backbone frames. During generation, the model iteratively denoises both the discrete atom-type assignments and the continuous 3D coordinates, and amino acid identity is read out from the predicted atomic composition. The model is trained on protein structures from the Protein Data Bank (PDB). The authors evaluate unconditional generation by designability and report that A-CODE outperforms existing approaches on this metric. On binder design, they report a roughly tenfold improvement in success rate on hard tasks relative to prior one-stage co-design methods. The preprint does not state the model's parameter count or the detailed composition of its training set.
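The summary does not spell out the noise processes. The sketch below assumes a common pairing for multimodal diffusion: Gaussian diffusion on coordinates, updated with a deterministic DDIM step, and absorbing-state (masked) diffusion on atom types. Under a linear masking schedule, each still-masked atom is revealed with probability (t - s)/t when stepping from time t to s. `toy_denoiser` is a hypothetical placeholder for the trained network. `NUM_ATOM_TYPES`, the cosine schedule, and the step count are illustrative assumptions, not A-CODE's actual settings.

```python
import numpy as np

NUM_ATOM_TYPES = 8        # hypothetical atom-type vocabulary size
MASK = NUM_ATOM_TYPES     # extra absorbing "mask" token index


def alpha_bar(t: float) -> float:
    """Cosine schedule for coordinates: t=0 is clean data, t=1 is (almost) pure noise."""
    return float(np.cos(0.5 * np.pi * t) ** 2)


def toy_denoiser(types, coords, t, rng):
    """Hypothetical stand-in for the network: atom-type logits and a clean-coordinate estimate."""
    logits = rng.normal(size=(types.shape[0], NUM_ATOM_TYPES))
    x0_hat = 0.9 * coords  # placeholder; a real model would condition on types, coords and t
    return logits, x0_hat


def reverse_step(types, coords, t, s, rng):
    """One joint update from time t to s < t for discrete types and continuous coordinates."""
    logits, x0_hat = toy_denoiser(types, coords, t, rng)

    # Continuous part: deterministic DDIM update toward the predicted clean coordinates.
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    eps_hat = (coords - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
    new_coords = np.sqrt(a_s) * x0_hat + np.sqrt(1.0 - a_s) * eps_hat

    # Discrete part: softmax over types, then reveal each masked atom with prob (t - s) / t.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    masked = types == MASK
    reveal = masked & (rng.random(types.shape[0]) < (t - s) / t)
    sampled = np.array([rng.choice(NUM_ATOM_TYPES, p=p) for p in probs])
    new_types = np.where(reveal, sampled, types)  # already-revealed atoms stay fixed
    return new_types, new_coords


def sample(num_atoms=16, steps=10, seed=0):
    """Start from all-masked types and Gaussian coordinates, then denoise both jointly."""
    rng = np.random.default_rng(seed)
    types = np.full(num_atoms, MASK)
    coords = rng.normal(size=(num_atoms, 3))
    times = np.linspace(1.0, 0.0, steps + 1)
    for t, s in zip(times[:-1], times[1:]):
        types, coords = reverse_step(types, coords, t, s, rng)
    return types, coords


if __name__ == "__main__":
    types, coords = sample()
    print(types, coords.shape)
```

Because the final step has s = 0, the reveal probability there is 1, so every atom ends with a concrete type. The point of the joint loop is that both modalities are updated at every step from one network call. In a trained model, partially revealed atom types would therefore inform coordinates, and vice versa, rather than being fixed by an earlier pipeline stage.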
A-CODE targets de novo protein design (new proteins that fold to desired structures) and protein binder design (proteins that bind specified targets). The fully atomic, single-stage formulation is aimed especially at hard binder-design tasks where staged pipelines tend to fail. Its support for non-canonical amino acids opens potential applications in designing proteins with chemistries outside the natural alphabet, which may interest protein engineers and synthetic biologists exploring expanded functional repertoires. As with all computational design tools, generated candidates would require experimental validation.
A-CODE contributes to a broader shift in protein design from frame- or residue-based generative models toward fully atomic representations, joining other all-atom efforts in generative protein design. Its claimed first-in-class support for non-canonical amino acid co-design and its reported gains on hard binder tasks position it as a notable step for one-stage co-design. At the time of the preprint, the authors had not released code or weights, citing an intent for responsible release; the paper is distributed under a CC BY-NC-SA 4.0 license. Because the work is a preprint and its training-data composition and parameter count are unstated, its real-world impact and reproducibility remain to be established through community evaluation.