Atom-level generative diffusion model for de novo enzyme design. Scaffolds arbitrary functional group geometries, solving all 41 benchmark active sites vs. 16/41 for prior methods.
RFdiffusion2 is a deep generative model for de novo enzyme design developed by the Baker Lab at the University of Washington Institute for Protein Design, published in Nature Methods in December 2025. It addresses two fundamental bottlenecks of its predecessor, the original RFdiffusion: the need to specify exact sequence indices for each catalytic residue, and the combinatorially expensive enumeration of inverse rotamer solutions for each candidate backbone. RFdiffusion2 eliminates both steps by working directly at the level of heavy atoms rather than residue backbones.
Users provide only the atomic coordinates of key functional groups (for example, the nitrogen of a zinc-coordinating histidine or the oxygen of a catalytic serine), and the model simultaneously infers optimal rotamer conformations and sequence positions for each catalytic residue during generation. This sequence-agnostic, atom-level approach substantially expands the range of active sites the system can design around. In a benchmark of 41 structurally diverse enzyme active sites, RFdiffusion2 generated viable scaffolds for all 41, compared to 16 out of 41 for the prior state of the art, a greater than 2.5-fold improvement.
RFdiffusion2 is built on the RoseTTAFold All-Atom (RFAA) architecture, which represents protein residues, small molecules, and individual heavy atoms in a unified framework. This base network was retrained using Riemannian flow matching: rotational components follow the FrameFlow formulation for SO(3), while translational components use standard Gaussian flow matching. This replaces the approximate rotational loss used in the DDPM-based original RFdiffusion and eliminates the need for self-conditioning or auxiliary loss terms. Training data consisted of experimentally determined protein structures from the Protein Data Bank, with the RFAA base network pretrained on PDB entries containing both protein chains and small-molecule ligands.
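The interpolation paths behind flow matching can be illustrated with a minimal Python sketch. It assumes the common construction in which translations follow a straight line between a noise sample and a data sample, and rotations follow the geodesic on SO(3) between a starting frame and a target frame. All function names are illustrative, and this is not the RFdiffusion2 training code.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_log(r):
    """Rotation matrix -> rotation vector (axis * angle); assumes angle < pi."""
    cos_theta = max(-1.0, min(1.0, (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    if theta < 1e-8:
        return [0.0, 0.0, 0.0]
    s = theta / (2.0 * math.sin(theta))
    return [s * (r[2][1] - r[1][2]),
            s * (r[0][2] - r[2][0]),
            s * (r[1][0] - r[0][1])]

def rot_exp(v):
    """Rotation vector -> rotation matrix via Rodrigues' formula."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-8:
        return identity
    x, y, z = (c / theta for c in v)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew matrix of the unit axis
    k2 = matmul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]

def interpolate_translation(x0, x1, t):
    """Straight-line path from x0 (t = 0) to x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def interpolate_rotation(r0, r1, t):
    """Geodesic on SO(3): R_t = R0 * exp(t * log(R0^T R1))."""
    relative = rot_log(matmul(transpose(r0), r1))
    return matmul(r0, rot_exp([t * c for c in relative]))

# Example: halfway between the identity and a 90-degree turn about z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
halfway_rotation = interpolate_rotation(identity, quarter_turn_z, 0.5)
halfway_position = interpolate_translation([0.0, 0.0, 0.0], [2.0, 4.0, 6.0], 0.5)
```

In this toy case the halfway rotation is a 45-degree turn about the z axis. During training, a flow-matching model would be asked to predict the direction along such paths at a sampled time t; the sketch covers only the path construction.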
In benchmarking, RFdiffusion2 scaffolded all 41 test active sites. Experimentally, designed zinc-dependent metallohydrolases reached kcat/KM of 16,000 M⁻¹ s⁻¹ in initial screening of 96 sequences, improving to 53,000 M⁻¹ s⁻¹ after one optimization round; these values are orders of magnitude above previously reported computationally designed metallohydrolases. Active catalysts were confirmed for three distinct reaction mechanisms, each identified within a single 96-well plate of tested sequences.
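The headline ratios follow directly from the figures quoted in this description. The short Python calculation below makes the arithmetic explicit; the variable names are illustrative, and the numbers are only those stated in the text.

```python
# Figures quoted in this description.
benchmark_total = 41
solved_rfdiffusion2 = 41
solved_prior = 16
kcat_km_initial = 16_000     # M^-1 s^-1, initial screen of 96 sequences
kcat_km_optimized = 53_000   # M^-1 s^-1, after one optimization round

benchmark_fold = solved_rfdiffusion2 / solved_prior
prior_success_rate = solved_prior / benchmark_total
kinetics_fold = kcat_km_optimized / kcat_km_initial

print(f"benchmark fold improvement: {benchmark_fold:.2f}")
print(f"prior-method success rate: {prior_success_rate:.0%}")
print(f"kcat/KM gain from one optimization round: {kinetics_fold:.1f}x")
```

The benchmark ratio is 41/16, or about 2.56, which is consistent with the "greater than 2.5-fold" figure. The single optimization round corresponds to a roughly 3.3-fold gain in kcat/KM.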
RFdiffusion2 is intended for researchers aiming to create functional enzymes from scratch without relying on natural enzyme templates. Primary use cases include de novo metalloenzyme design for reactions requiring precise metal-coordination geometry, design of enzymes for reactions with no natural precedent, and active site transplantation — grafting a defined catalytic geometry into a more stable or expressible protein scaffold. The model fits into a standard Baker Lab design pipeline: RFdiffusion2 generates backbone scaffolds, ProteinMPNN designs sequences onto those backbones, and AlphaFold 2 or RoseTTAFold filters designs before experimental characterization.
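The three-stage pipeline can be sketched as a simple composition of callables. The Python example below is a hypothetical outline only: `run_design_pipeline`, the `Candidate` class, the motif labels, and the toy stand-in functions are assumptions made for illustration and do not correspond to the actual command-line interfaces of RFdiffusion2, ProteinMPNN, AlphaFold 2, or RoseTTAFold.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Candidate:
    backbone: str   # placeholder for a generated scaffold (e.g. a file path)
    sequence: str   # placeholder for a designed amino-acid sequence

def run_design_pipeline(
    motif: Sequence[str],
    generate_backbones: Callable[[Sequence[str]], List[str]],
    design_sequences: Callable[[str], List[str]],
    passes_filter: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Chain scaffold generation, sequence design, and structure-based filtering."""
    kept = []
    for backbone in generate_backbones(motif):        # scaffold generation stage
        for sequence in design_sequences(backbone):   # sequence design stage
            candidate = Candidate(backbone, sequence)
            if passes_filter(candidate):              # structure-prediction filter stage
                kept.append(candidate)
    return kept

# Toy stand-ins so the sketch runs end to end; they do no real modeling.
def toy_generate(motif):
    return [f"scaffold_{i}" for i in range(2)]

def toy_design(backbone):
    return ["MKHLE", "MAAAA"]

def toy_filter(candidate):
    return "H" in candidate.sequence   # arbitrary rule, for illustration only

designs = run_design_pipeline(
    ["HIS:NE2", "SER:OG"],   # hypothetical labels for functional-group atoms
    toy_generate,
    toy_design,
    toy_filter,
)
print(len(designs))
```

Passing each stage in as a callable keeps the outline independent of any particular tool version. In practice each stand-in would be replaced by a wrapper around the corresponding program, and only the designs that survive the filter would go on to experimental characterization.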
RFdiffusion2 represents a significant advance in the computational design of functional proteins, extending automated protein design beyond structural targets into the more demanding domain of catalysis. By reducing the experimental screening burden to a single 96-well plate of sequences per target, it brings de novo enzyme design to a scale accessible to groups without high-throughput robotic infrastructure. The model is open-source under RosettaCommons and was released alongside detailed documentation to support broad adoption. Key limitations include its focus on enzyme active site scaffolding (other design tasks such as binder design or symmetric assemblies remain better served by the original RFdiffusion) and the requirement that input catalytic geometry, typically derived from quantum mechanical transition-state calculations, be available before design begins.
Ahern, W., et al. (2025) Atom-level enzyme active site scaffolding using RFdiffusion2. Nature Methods.
DOI: 10.1038/s41592-025-02975-x