All-atom diffusion model for de novo protein design conditioned on ligands, nucleic acids, and arbitrary non-protein atoms, enabling enzyme and DNA binder design.
RFdiffusion3 (RFD3) is the third-generation protein design diffusion model from the Baker Lab at the University of Washington Institute for Protein Design, released in December 2025. Its fundamental architectural innovation is all-atom modeling. Rather than diffusing over protein backbone frames alone, RFD3 treats every atom in a biomolecular system as a first-class participant in the generative process. This allows the model to design proteins conditioned on ligands, nucleic acids, and arbitrary non-protein atoms simultaneously, a capability that prior backbone-only diffusion models could not provide natively.
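The idea of diffusing over every atom can be made concrete with a generic forward-noising step. In the sketch below, protein atoms are perturbed while conditioning atoms (for example a ligand or a DNA strand) are held fixed, and the noise levels follow a Karras/EDM-style schedule. The variance-exploding form x_t = x_0 + σ·ε, the "hold conditioning atoms fixed" mechanism, the default schedule values, and the names `karras_sigmas` and `noise_atoms` are all assumptions borrowed from common all-atom diffusion practice. RFD3's actual noise schedule and conditioning mechanism may differ.

```python
import random

def karras_sigmas(n_steps, sigma_min=0.01, sigma_max=100.0, rho=7.0):
    """Noise levels decreasing from sigma_max to sigma_min (EDM-style spacing).
    Requires n_steps >= 2. Defaults are illustrative placeholders, not RFD3 settings."""
    hi, lo = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    return [(hi + (i / (n_steps - 1)) * (lo - hi)) ** rho for i in range(n_steps)]

def noise_atoms(coords, is_fixed, sigma, rng):
    """Forward noising x_t = x_0 + sigma * eps, applied atom by atom.
    Atoms flagged in is_fixed (e.g. ligand or nucleic-acid conditioning atoms)
    keep their input coordinates; every other atom receives Gaussian noise."""
    noised = []
    for xyz, fixed in zip(coords, is_fixed):
        if fixed:
            noised.append(list(xyz))
        else:
            noised.append([x + sigma * rng.gauss(0.0, 1.0) for x in xyz])
    return noised

# Usage: two protein atoms and one fixed ligand atom, noised at the largest level.
rng = random.Random(0)
system = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 2.0, 1.0)]
is_ligand = [False, False, True]
sigmas = karras_sigmas(5)
x_t = noise_atoms(system, is_ligand, sigmas[0], rng)
```

A trained denoiser would then run this process in reverse, step by step through `sigmas`, predicting clean coordinates for the free atoms while the fixed atoms anchor the context.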
The preprint "De novo Design of All-atom Biomolecular Interactions with RFdiffusion3" was posted to bioRxiv in September 2025, and the code and weights were made publicly available through the RosettaCommons Foundry repository in December 2025. RFD3 shares no code with its predecessors RFdiffusion and RFdiffusion2; it is a complete architectural rebuild designed around atom-level representations of multi-molecular systems.
The model achieves these capabilities at approximately one-tenth the inference cost of predecessor models, lowering the barrier to applying diffusion-based design to challenging multi-constraint problems involving enzyme active sites, DNA recognition interfaces, and small-molecule binding pockets.
RFdiffusion3 is a 168-million-parameter, transformer-based U-Net that operates directly on atomic coordinates. Each residue is represented by 4 backbone atoms and up to 10 side-chain atoms (14 in total). Shorter side chains are padded with virtual atoms placed at the Cβ position, so every residue has the same uniform representation. Attention is restricted to geometrically adjacent atoms rather than computed over all pairs, which concentrates computation where interactions are physically meaningful. The Pairformer module from AlphaFold 3 is reduced from 48 layers to 2, and triangle multiplicative updates and triangle attention are omitted entirely. Together, these simplifications yield the order-of-magnitude speed improvement.
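The per-residue atom layout and the local attention described above can be sketched in plain Python. This is a minimal illustration, not RFD3's code, and several details are assumptions. The slot ordering is assumed. For glycine, which has no real Cβ, a virtual Cβ is placed using the ideal-geometry constants used in ProteinMPNN. k-nearest-neighbour selection stands in for "geometrically adjacent". All function names (`pad_residue`, `ideal_cb`, `knn_neighbors`, `local_attention`) are hypothetical.

```python
import math

N_BACKBONE = 4                                # N, CA, C, O
N_SIDECHAIN = 10                              # Trp, the largest side chain, has 10 heavy atoms
ATOMS_PER_RESIDUE = N_BACKBONE + N_SIDECHAIN  # 14

def _sub(u, v):
    return [a - b for a, b in zip(u, v)]

def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def ideal_cb(n, ca, c):
    """Virtual C-beta from backbone N, CA, C (ideal-geometry constants as in ProteinMPNN)."""
    b, cc = _sub(ca, n), _sub(c, ca)
    a = _cross(b, cc)
    return [-0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
            for i in range(3)]

def pad_residue(backbone, sidechain):
    """Return (atoms, mask) with exactly 14 atoms per residue.
    backbone: 4 coordinates (N, CA, C, O); sidechain: 0..10 coordinates, C-beta first.
    Empty side-chain slots are filled with virtual atoms at the C-beta position;
    mask is True for real atoms and False for virtual ones."""
    if len(backbone) != N_BACKBONE or len(sidechain) > N_SIDECHAIN:
        raise ValueError("expected 4 backbone atoms and at most 10 side-chain atoms")
    # Glycine has no real C-beta, so this sketch assumes an ideal virtual one.
    cb = list(sidechain[0]) if sidechain else ideal_cb(*backbone[:3])
    n_virtual = N_SIDECHAIN - len(sidechain)
    atoms = [list(p) for p in backbone] + [list(p) for p in sidechain]
    atoms += [list(cb) for _ in range(n_virtual)]
    mask = [True] * (N_BACKBONE + len(sidechain)) + [False] * n_virtual
    return atoms, mask

def knn_neighbors(coords, k):
    """Indices of the k atoms nearest to each atom (the atom itself included)."""
    return [sorted(range(len(coords)), key=lambda j: math.dist(ci, coords[j]))[:k]
            for ci in coords]

def local_attention(queries, feats, coords, k):
    """Scaled dot-product attention where atom i attends only to its k nearest atoms.
    feats serve as both keys and values to keep the sketch short."""
    dim = len(queries[0])
    out = []
    for i, nbrs in enumerate(knn_neighbors(coords, k)):
        scores = [sum(q * f for q, f in zip(queries[i], feats[j])) / math.sqrt(dim)
                  for j in nbrs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * feats[j][c] for w, j in zip(weights, nbrs)) / total
                    for c in range(dim)])
    return out

# Usage: pad an alanine (C-beta only) and a glycine (no side chain) on the same backbone.
bb = [(0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0), (3.2, 1.6, 0.0)]
ala_atoms, ala_mask = pad_residue(bb, [(1.9, -0.8, 1.2)])
gly_atoms, gly_mask = pad_residue(bb, [])
```

Restricting each atom to k neighbours makes the attention cost grow roughly linearly with atom count instead of quadratically. The brute-force neighbour search here is quadratic and only meant for small examples; a real implementation would use a spatial index.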
The model was trained on a hierarchical schedule using two data sources: all available Protein Data Bank complexes spanning protein-protein, protein-small molecule, protein-DNA, and protein-RNA interactions, supplemented by AlphaFold 2 self-distillation structures to broaden sequence space coverage. Training ran on 16 NVIDIA H200 GPUs for approximately seven days. Benchmarks show RFD3 outperforming RFdiffusion (v1) on 4 of 5 protein-protein binder targets. Experimentally, 18% of designed cysteine hydrolase scaffolds showed multi-turnover catalytic activity, with the best design achieving a kcat/Km of 3,557 ± 624 M⁻¹ s⁻¹.
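The reported kcat/Km is a second-order rate constant, so its practical meaning depends on assay concentrations. The back-of-envelope sketch below assumes that the substrate concentration is well below Km, which the source does not state. In that regime the Michaelis-Menten equation v = kcat[E][S]/(Km + [S]) reduces to v ≈ (kcat/Km)[E][S], and substrate depletes with pseudo-first-order kinetics. The 1 µM enzyme and 10 µM substrate values are hypothetical, as are the function names.

```python
import math

KCAT_OVER_KM = 3557.0     # M^-1 s^-1, best reported design
KCAT_OVER_KM_SD = 624.0   # reported +/- uncertainty

def low_substrate_rate(kcat_over_km, enzyme_m, substrate_m):
    """Initial rate v (M/s) assuming [S] << Km, where v ~= (kcat/Km)[E][S]."""
    return kcat_over_km * enzyme_m * substrate_m

def substrate_half_life(kcat_over_km, enzyme_m):
    """Half-life (s) of substrate depletion in the same regime:
    pseudo-first-order decay with k_obs = (kcat/Km)[E], so t_1/2 = ln 2 / k_obs."""
    return math.log(2) / (kcat_over_km * enzyme_m)

# Hypothetical assay: 1 uM enzyme, 10 uM substrate (assumed to be well below Km).
enzyme, substrate = 1e-6, 10e-6
v = low_substrate_rate(KCAT_OVER_KM, enzyme, substrate)  # ~3.6e-8 M/s
t_half = substrate_half_life(KCAT_OVER_KM, enzyme)       # ~195 s
rel_sd = KCAT_OVER_KM_SD / KCAT_OVER_KM                  # ~0.18
```

If the true Km turns out to be comparable to or below the substrate concentration, the linear approximation overestimates the rate. In that case the full Michaelis-Menten expression, with separately measured kcat and Km, would be needed.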
RFdiffusion3 substantially expands the range of problems addressable by diffusion-based protein design. Researchers can scaffold catalytic triads and other active-site geometries into stable protein frameworks for de novo enzyme design without natural enzyme templates. Designed DNA-binding proteins that target defined sequences are relevant to gene regulation, epigenetic editing, and synthetic biology. Small-molecule binding proteins can serve as biosensors or as starting points for drug development. The model's unified treatment of molecular heterogeneity makes it particularly well suited to multi-constraint problems, such as designing a protein that simultaneously scaffolds a catalytic residue and binds a cofactor. With backbone-only approaches, such tasks require separate, sequential design stages. Sequence design remains a separate downstream step, using tools such as ProteinMPNN or LigandMPNN.
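Scaffolding an active site is only useful if the generated structure actually reproduces the input geometry. A common sanity check is to compare the motif atoms in the design against the target motif. The sketch below uses distance-matrix RMSD (dRMSD), a superposition-free metric, in place of the Kabsch-superposed motif RMSD more often reported. The 1.0 Å cutoff and the names `drmsd` and `motif_preserved` are hypothetical choices, not RFD3 evaluation settings.

```python
import math
from itertools import combinations

def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equally sized atom sets.
    Compares all intra-set pairwise distances, so no superposition is needed."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally sized sets of at least 2 atoms")
    sq = [(math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
          for i, j in combinations(range(len(coords_a)), 2)]
    return math.sqrt(sum(sq) / len(sq))

def motif_preserved(target_motif, designed_motif, cutoff=1.0):
    """Hypothetical filter: keep a design if its motif dRMSD is below cutoff (Angstrom)."""
    return drmsd(target_motif, designed_motif) < cutoff

# Usage: a 3-atom motif, the same motif rotated 90 degrees about z and shifted,
# and a distorted version of it.
triad = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
moved = [(5.0, 5.0, 5.0), (5.0, 8.0, 5.0), (1.0, 5.0, 5.0)]
distorted = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
```

Because dRMSD depends only on internal distances, it is unaffected by rigid motion. The trade-off is that it cannot distinguish a motif from its mirror image, which is one reason production pipelines usually superpose with the Kabsch algorithm instead.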
RFdiffusion3 represents a meaningful advance in computational protein design by bringing all-atom awareness to the generative diffusion framework. It extends the RFdiffusion lineage to handle multi-molecular systems natively at atomic resolution, pushing diffusion-based design beyond backbone scaffolding into functional site engineering. The model is open-source under a permissive license and is distributed through RosettaCommons Foundry alongside its training code, supporting community extension. As of its December 2025 release, the underlying paper is a bioRxiv preprint that has not yet undergone formal peer review. Experimental validation covers two design challenges (DNA binders and cysteine hydrolases), so performance on other target classes requires independent characterization. The work is part of a broader trend toward all-atom generative models in structural biology, complementing AlphaFold 3 and Boltz-1 in the prediction space.
Butcher, J., et al. (2025). De novo Design of All-atom Biomolecular Interactions with RFdiffusion3. bioRxiv.
DOI: 10.1101/2025.09.18.676967