PLAID

Latent diffusion model for controllable all-atom protein generation that co-designs sequence and structure while training on sequences alone.

Released: December 2024

Parameters: 2 Billion

PLAID (Protein Latent Induced Diffusion) is a generative model for designing proteins that produces both amino-acid sequence and all-atom 3D structure in a single sampling process. It was developed by Amy X. Lu and collaborators at UC Berkeley and Genentech (with co-authors including Nathan Frey, Kyunghyun Cho, and Pieter Abbeel) and released as a preprint in December 2024. The work tackles a long-standing tension in protein generative modeling: structure-based diffusion models require expensive experimentally solved structures for training, while sequence-only language models do not directly yield 3D coordinates or side-chain placements.

PLAID's central idea is to run latent diffusion over the internal representation space of a pretrained structure predictor rather than over raw coordinates or tokens. Because that latent space already encodes sequence and structure jointly, PLAID can be trained using only protein sequences, which are orders of magnitude more abundant than solved structures, yet still generate full all-atom outputs by decoding samples back through the structure predictor. This sidesteps the data bottleneck that constrains many backbone-generation methods.
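The data flow described above can be illustrated with a toy sketch. Nothing here is PLAID's actual code: `toy_denoise_step` and `toy_decode` are hypothetical stand-ins for the learned denoiser and the frozen structure-predictor decoder, and the latent width is arbitrary. The only point is that a single per-residue latent is denoised once and then decoded into both a sequence and coordinates.

```python
import random
from typing import List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
LATENT_DIM = 4                     # toy width, chosen only for this sketch

Latent = List[List[float]]         # one latent vector per residue


def toy_denoise_step(latent: Latent) -> Latent:
    """Hypothetical stand-in for one learned reverse-diffusion step (halves every value)."""
    return [[0.5 * v for v in row] for row in latent]


def toy_decode(latent: Latent) -> Tuple[str, List[Tuple[float, float, float]]]:
    """Hypothetical stand-in for the decoder: one latent yields sequence AND coordinates."""
    sequence = "".join(
        ALPHABET[int(abs(row[0]) * 1e6) % len(ALPHABET)] for row in latent
    )
    coords = [(row[1], row[2], row[3]) for row in latent]
    return sequence, coords


def generate(length: int, steps: int, seed: int):
    """Sample Gaussian noise in latent space, denoise it, then decode both modalities."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(length)]
    for _ in range(steps):
        latent = toy_denoise_step(latent)
    return toy_decode(latent)


sequence, coords = generate(length=8, steps=10, seed=0)
print(len(sequence), len(coords))
```

Because sequence and coordinates come from the same latent, they are paired by construction; there is no separate inverse-folding or structure-prediction pass in this sketch.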

The model supports conditional generation guided by biological function and taxonomy, making it a controllable design tool rather than only an unconditional sampler. It sits alongside all-atom design approaches such as Protpardelle and RFdiffusionAA, but is distinguished by its sequence-only training signal and its function/organism conditioning.

Key Features

Sequence-only training, all-atom output: PLAID learns from sequence databases alone but generates complete all-atom structures by sampling in the latent space of the ESMFold structure predictor, avoiding the need for solved structures during training.
Compressed latent diffusion: Diffusion is performed over the compact CHEAP autoencoder latent (a separate component) rather than over high-dimensional coordinates, making training and sampling tractable.
Function and organism conditioning: Classifier-free guidance on Gene Ontology (GO) function terms and organism taxonomy lets users steer generation toward desired biological properties.
Experimental validation: The authors report wet-lab characterization of generated heme-binding proteins, providing evidence that conditioning produces functionally relevant designs rather than only plausible structures.
Open weights and code: Both a 2B-parameter and a 100M-parameter checkpoint are released under an MIT license, with the diffusion weights hosted on HuggingFace.

Technical Details

PLAID is a latent diffusion model operating over the shared sequence-structure representation derived from ESMFold, compressed by the CHEAP autoencoder. Two model sizes are released: a 2-billion-parameter variant and a 100-million-parameter variant. Training uses only protein sequences, with the structural decoder providing the bridge to all-atom coordinates at inference time. Conditional generation is implemented via classifier-free guidance over GO function indices and organism taxonomy indices, and the pipeline can determine protein length automatically. The released code requires a custom OpenFold fork and the companion CHEAP latent autoencoder, with model caches handled automatically.
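As a rough illustration of classifier-free guidance over function and organism indices, the sketch below uses one common formulation that blends a conditional and an unconditional noise prediction. The denoiser signature, the use of `None` for a dropped condition, and the `guidance_scale` name are assumptions made for this example, not the interface of the released code.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical signature: (noisy latent, function index, organism index) -> predicted noise.
Denoiser = Callable[[Vector, Optional[int], Optional[int]], Vector]


def guided_noise(
    denoiser: Denoiser,
    x: Vector,
    function_idx: int,
    organism_idx: int,
    guidance_scale: float,
) -> Vector:
    """Blend predictions as eps_uncond + w * (eps_cond - eps_uncond)."""
    eps_uncond = denoiser(x, None, None)                # both conditions dropped
    eps_cond = denoiser(x, function_idx, organism_idx)  # both conditions supplied
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(x: Vector, function_idx: Optional[int], organism_idx: Optional[int]) -> Vector:
    """Toy model: conditioning shifts every predicted value by 1.0."""
    shift = 0.0 if function_idx is None else 1.0
    return [v + shift for v in x]


x = [0.0, 2.0]
print(guided_noise(toy_denoiser, x, 3, 7, 0.0))  # scale 0 recovers the unconditional prediction
print(guided_noise(toy_denoiser, x, 3, 7, 1.0))  # scale 1 recovers the conditional prediction
print(guided_noise(toy_denoiser, x, 3, 7, 3.0))  # scale > 1 extrapolates past the conditional one
```

Larger scales push samples harder toward the conditioning signal, usually at some cost in diversity.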

Applications

PLAID is aimed at protein engineers and computational biologists who need to generate novel candidate proteins with targeted function or taxonomic context, for example proposing enzymes or binding proteins associated with a particular GO annotation. Because it emits all-atom structures alongside sequences, downstream users can immediately inspect side-chain geometry, dock cofactors, or filter candidates structurally before ordering genes for wet-lab testing, as the authors demonstrate for heme-binding designs.
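A structural filtering pass of the kind mentioned above might look like the following sketch. The `Candidate` record, the per-design mean confidence score (`mean_plddt`), and both thresholds are assumptions for illustration; the source does not specify which metrics or cutoffs to use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    sequence: str
    mean_plddt: float  # hypothetical structure-confidence score on a 0-100 scale


def filter_candidates(
    candidates: List[Candidate],
    min_plddt: float = 70.0,  # illustrative threshold, not a recommended value
    max_length: int = 300,    # illustrative synthesis-length limit
) -> List[Candidate]:
    """Keep confident, synthesizable designs, ordered from most to least confident."""
    kept = [
        c for c in candidates
        if c.mean_plddt >= min_plddt and len(c.sequence) <= max_length
    ]
    return sorted(kept, key=lambda c: c.mean_plddt, reverse=True)


designs = [
    Candidate("a", "MKT", 82.5),
    Candidate("b", "MKTAY", 55.0),    # dropped: low confidence
    Candidate("c", "M" * 400, 91.0),  # dropped: too long
    Candidate("d", "MKV", 90.0),
]
shortlist = filter_candidates(designs)
print([c.name for c in shortlist])
```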

Impact

PLAID demonstrates that controllable all-atom protein generation can be driven primarily by abundant sequence data, lowering the structural-data barrier that limits many diffusion approaches. Its function- and organism-conditioned sampling, paired with open 2B and 100M checkpoints and experimental validation, makes it a practical reference point for the growing class of all-atom generative protein models. As a preprint with released weights, its long-term influence will depend on independent benchmarking and peer review, but it contributes a notable design pattern: diffusing in a learned sequence-structure latent rather than over explicit coordinates.

Citation

Controllable All-Atom Protein Generation with Latent Diffusion

Preprint

Lu, A. X., et al. (2024) Controllable All-Atom Protein Generation with Latent Diffusion. bioRxiv.

DOI: 10.1101/2024.12.02.626353

Recent citations

Papers that recently cited this model.

From virtual experiments to biomedical insight with synthetic data
M. Victoriano, Milena Pavlović, G. K. Sandve, et al.
Nature Machine Intelligence · Jun 2026
Citations: 0

The convergence of AI-driven engineering biology and emerging technologies advancing globally networked autonomous biofoundries.
Ryan R Cochrane, L. V. dos Santos, Yizhi Cai
Current Opinion in Biotechnology · Jun 2026
Citations: 0

PTM-dCN: Latent Space Control for Post-Translational Modification–Aware Protein Design
Sitao Zhang, Rui Qing, Tianming Huang, et al.
bioRxiv · May 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

P(all-atom) Is Unlocking New Path For Protein Design
Wei Qu, Jiawei Guan, Rui Ma, et al.
bioRxiv · May 2025
Citations: 28

Latent-X: An Atom-level Frontier Model for De Novo Protein Binder Design
Latent Labs Team, Alex Bridgland, Jonathan Crabbé, Henry Kenlay, et al.
Jul 2025
Citations: 12

Illuminating the universe of enzyme catalysis in the era of artificial intelligence.
Jason Yang, Francesca-Zhoufan Li, Yueming Long, et al.
Cell Systems · Aug 2025
Citations: 9

ProxelGen: Generating Proteins as 3D Densities
Felix Faltings, Hannes Stärk, R. Barzilay, et al.
arXiv.org · Jun 2025
Citations: 3

Consistent Synthetic Sequences Unlock Structural Diversity in Fully Atomistic De Novo Protein Design
Danny Reidenbach, Zhonglin Cao, Zuobai Zhang, et al.
Dec 2025
Citations: 2

Citations

Total Citations: 14

Influential: 0

References: 0

GitHub

Stars: 127

Forks: 14

Open Issues: 0

Contributors: 2

Last Push: 1y ago

Language: Python

License: MIT

HuggingFace

Downloads: 0

Likes: 4

Last Modified: 1y ago

Fields of citing research

Computer Science: 92%
Biology: 85%
Medicine: 46%
Engineering: 15%
Environmental Science: 8%
Chemistry: 8%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open (usable and reproducible)

Openness score: 77 (Open)

Usability (can I run it?): 90

Reproducibility (can I retrain it?): 62

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository
Research Paper
HuggingFace Model


