Diffusion model for de novo protein design that generates novel backbone structures conditioned on binding targets, symmetry constraints, and functional motifs.
RFdiffusion is a generative diffusion model for de novo protein design, developed by the Baker Lab at the University of Washington Institute for Protein Design. Published in Nature in July 2023, it generates novel protein backbone structures by iteratively denoising random noise with a network fine-tuned from RoseTTAFold. Unlike earlier protein design methods that relied on explicit physical energy functions, RFdiffusion learns the distribution of protein structures directly from the Protein Data Bank and produces diverse, designable backbones on demand, ranging from small monomers to symmetric assemblies to precisely scaffolded functional sites.
What distinguishes RFdiffusion from prior generative approaches is the breadth of conditioning it supports. A single model family handles unconditional monomer generation, target-conditioned binder design, symmetric oligomer design, and functional motif scaffolding, all within a unified denoising diffusion framework. This versatility, combined with experimental success rates that substantially exceed those of prior computational methods, made RFdiffusion the most-cited protein design paper of 2023.
RFdiffusion is used in conjunction with ProteinMPNN for sequence design and AlphaFold2 or ESMFold for computational validation, forming a widely adopted three-stage pipeline for end-to-end protein design.
RFdiffusion adapts the denoising diffusion probabilistic model (DDPM) framework to protein structure space. The core denoising network is derived from RoseTTAFold and fine-tuned to reverse a forward diffusion process that perturbs each residue's translation with 3D Gaussian noise and its orientation with Brownian motion on the rotation group SO(3). Protein conformations are represented as rigid-body frames (one rotation matrix and one translation vector per residue), in line with the frame-based formalism also used by FrameDiff. Because the network is SE(3)-equivariant, the model is geometrically consistent under global rotation and translation.
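To make the forward process on rigid-body frames concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not RFdiffusion's implementation: the function names, the linear beta schedule, and its values are hypothetical, and the rotation noise is a simplified stand-in (a random-axis rotation with a Gaussian-distributed angle) rather than a proper diffusion kernel on SO(3).

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.05):
    """Cumulative product of (1 - beta_t) for a linear schedule (illustrative values)."""
    alpha_bar, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def noise_translation(x0, alpha_bar_t, rng):
    """Closed-form DDPM forward sample: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    keep, spread = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [keep * c + spread * rng.gauss(0.0, 1.0) for c in x0]


def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula for a unit axis and an angle in radians."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def noise_rotation(r0, sigma_t, rng):
    """Simplified rotation noise: compose r0 with a random-axis rotation.

    Assumption: this only mimics noise that grows with sigma_t; it does not
    sample from the true SO(3) diffusion kernel.
    """
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    axis = [a / norm for a in axis]
    return matmul3(axis_angle_to_matrix(axis, rng.gauss(0.0, sigma_t)), r0)


def noise_frame(rotation, translation, t, alpha_bar, rng):
    """Noise one residue frame (rotation matrix, translation vector) to step t."""
    ab = alpha_bar[t]
    sigma = math.sqrt(1.0 - ab)  # assumption: reuse the translation noise scale for angles
    return noise_rotation(rotation, sigma, rng), noise_translation(translation, ab, rng)
```

Calling `noise_frame` on the same frame at increasing `t` gives progressively more scrambled frames, while the rotation part remains a valid rotation matrix at every step.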
The reverse diffusion process runs for up to approximately 200 timesteps. Conditioning information (binding site coordinates, symmetry operators, or motif residue frames) is supplied to the denoising network as additional input at each step, guiding the generative trajectory within a single shared architecture. Self-conditioning, in which the model receives its own structure prediction from the previous step as input during denoising, further improves coherence and designability. Task-specific fine-tuning runs on protein-protein complexes, homomeric assemblies, and motif-containing structures were used to optimize performance on each design task. Designability is assessed by the self-consistency TM-score (scTM): ProteinMPNN designs sequences for a generated backbone, AlphaFold2 predicts their structures, and each prediction is compared with the original backbone. A high scTM indicates that a designed sequence is predicted to fold back into the generated backbone.
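The data flow of conditioned reverse diffusion with self-conditioning can be sketched on a toy problem. Everything below is an assumption made for illustration: residues are reduced to single scalars, `toy_denoiser` is a hypothetical stand-in for the RoseTTAFold-derived network, the schedule values are arbitrary, and the update is a deterministic DDIM-style step swapped in for simplicity, not RFdiffusion's actual update rule.

```python
import math
import random


def make_alpha_bar(num_steps, beta_start=1e-3, beta_end=0.2):
    """Cumulative product of (1 - beta_t) for a linear schedule (illustrative values)."""
    alpha_bar, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def toy_denoiser(x_t, alpha_bar_t, motif, x0_prev):
    """Hypothetical stand-in for the denoising network; returns a clean-data guess.

    Conditioning: positions listed in `motif` are pinned to their given values.
    Self-conditioning: free positions blend a fresh estimate with the
    prediction made at the previous step, when one exists.
    """
    prediction = []
    for i, x in enumerate(x_t):
        if i in motif:
            prediction.append(motif[i])
            continue
        estimate = math.sqrt(alpha_bar_t) * x  # posterior mean if clean data were N(0, 1)
        if x0_prev is not None:
            estimate = 0.5 * (estimate + x0_prev[i])
        prediction.append(estimate)
    return prediction


def sample(num_positions, motif, num_steps=50, seed=0):
    """Run the reverse process from pure noise down to step 0."""
    rng = random.Random(seed)
    alpha_bar = make_alpha_bar(num_steps)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(num_positions)]
    x0_prev = None
    for t in reversed(range(num_steps)):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        x0_hat = toy_denoiser(x_t, ab_t, motif, x0_prev)
        # Noise implied by the current sample and the clean-data prediction.
        eps_hat = [(x - math.sqrt(ab_t) * x0) / math.sqrt(1.0 - ab_t)
                   for x, x0 in zip(x_t, x0_hat)]
        # Deterministic DDIM-style move to the previous noise level.
        x_t = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
               for x0, e in zip(x0_hat, eps_hat)]
        x0_prev = x0_hat  # fed back at the next step (self-conditioning)
    return x_t
```

For example, `sample(8, {2: 1.5, 5: -0.75})` returns eight values in which positions 2 and 5 carry the motif values, mirroring how a scaffolded motif is held in place while the rest of the structure is generated around it.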
RFdiffusion addresses a broad range of protein engineering challenges across therapeutic, industrial, and basic research contexts. Drug discovery groups use it to engineer high-affinity binders to cytokine receptors, growth factor receptors, and other drug targets. Vaccinologists use the motif-scaffolding capability to display neutralizing epitopes from viruses such as RSV and influenza on stable, immunogenic platforms. Enzyme engineers place catalytic residues into stable, expressible scaffolds for industrial and research biocatalysis. Structural biologists and protein nanotechnologists design symmetric protein cages and filaments for drug delivery vehicles and biomaterials. The model integrates naturally into automated design pipelines: RFdiffusion generates backbones, ProteinMPNN designs sequences, and AlphaFold2 validates the predicted fold before wet-lab synthesis.
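The generate, design, predict, and filter hand-off can be sketched as a small orchestration loop. This is a hedged outline, not a real integration: the four callables are hypothetical placeholders for RFdiffusion, ProteinMPNN, a structure predictor, and a superposition step, and none of them corresponds to an actual API of those tools. The `tm_score` helper uses the standard TM-score formula but assumes the two structures are already superimposed (the true score maximizes over superpositions). The small-length guard and the 0.5 acceptance cutoff are also assumptions.

```python
def tm_score(distances, target_length):
    """TM-score from per-residue C-alpha distances (in angstroms).

    Assumes the two structures are already superimposed; the real TM-score
    searches for the superposition that maximizes this sum.
    """
    if target_length <= 21:
        d0 = 0.5  # assumption: small-protein guard, avoids a negative cube root
    else:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def design_pipeline(generate_backbone, design_sequence, predict_structure,
                    aligned_distances, num_designs, sctm_cutoff=0.5):
    """Run the three-stage loop and keep designs whose self-consistency passes.

    All four callables are hypothetical placeholders:
      generate_backbone(index) -> backbone   (stands in for RFdiffusion)
      design_sequence(backbone) -> sequence  (stands in for ProteinMPNN)
      predict_structure(sequence) -> model   (stands in for AlphaFold2 or ESMFold)
      aligned_distances(backbone, model) -> per-residue distances after superposition
    """
    accepted = []
    for index in range(num_designs):
        backbone = generate_backbone(index)
        sequence = design_sequence(backbone)
        predicted = predict_structure(sequence)
        distances = aligned_distances(backbone, predicted)
        score = tm_score(distances, len(backbone))
        if score >= sctm_cutoff:
            accepted.append((index, sequence, score))
    return accepted
```

Passing the tools in as callables keeps the loop independent of how each one is actually invoked (command line, Python package, or a remote service), which is usually the part that differs between installations.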
RFdiffusion established diffusion models as the leading paradigm for generative protein design. The model is fully open-source under a permissive license, and the GitHub repository has seen widespread adoption in both academic and industrial settings. Its success directly inspired a subsequent generation of structure-conditioned generative models and extensions, including RFdiffusion All-Atom (which incorporates small molecules and nucleic acids) and FrameDiff-based approaches from other groups. A key limitation is that RFdiffusion generates backbone geometry only: sequence design remains a separate step requiring ProteinMPNN or a similar tool. Experimental validation is always needed, since computational designability metrics (scTM, pLDDT) are predictive but not definitive. Very large receptor inputs increase inference time considerably, and scaffolding structurally complex or very large motifs remains challenging.
Watson, J. L., et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620, 1089-1100.
DOI: 10.1038/s41586-023-06415-8