RadDiff addresses protein inverse folding: the task of designing an amino acid sequence that will fold into a given target backbone structure. Inverse folding is a cornerstone of computational protein engineering, underpinning enzyme design, antibody optimization, and the realization of de novo backbones produced by structure-generation tools. While modern methods such as ProteinMPNN, PiFold, and protein-language-model designers have pushed native sequence recovery steadily upward, they typically generate sequences purely from the query structure, leaving the rich evolutionary signal contained in known homologs untapped.

RadDiff, introduced in late 2025 by Jin Han, Tianfan Fu, and Wu-Jun Li at Nanjing University's National Key Laboratory for Novel Software Technology, reframes inverse folding as a retrieval-augmented generation problem. For each target backbone it retrieves structurally similar proteins from large databases, aligns them residue by residue to build a position-specific amino acid profile, and uses that profile as an evolutionarily informed prior to condition a discrete denoising diffusion process. This mirrors the broader retrieval-augmented trend in language modeling and carries it over to structure-based protein design, so the generator can draw on observed sequence diversity rather than memorizing it in its weights.
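To make the profile idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes the retrieved hits have already been aligned to the query, so each hit is a string of the query's length with `-` marking positions that have no aligned residue. The function name `build_profile` and the pseudocount smoothing are illustrative assumptions, not details from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def build_profile(aligned_hits, query_length, pseudocount=1.0):
    """Return one amino acid frequency distribution per query position.

    aligned_hits: strings of length query_length; '-' marks a position
    where the hit has no residue aligned to the query (assumed format).
    pseudocount: smoothing term so unseen residues keep nonzero mass.
    """
    profile = []
    for pos in range(query_length):
        # Count only real residues; gaps and unknown symbols are skipped.
        counts = Counter(
            hit[pos] for hit in aligned_hits if hit[pos] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile

# Toy aligned hits (made up for illustration).
hits = ["MKV-", "MRVL", "MKIL"]
profile = build_profile(hits, query_length=4)

# Most frequent residue at each position, read off the profile.
best = "".join(max(col, key=col.get) for col in profile)
```

Each column is a probability distribution over the 20 residues, which is the kind of per-position signal a generator can be conditioned on.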

The authors report that RadDiff improves native sequence recovery by up to 19% in relative terms over prior approaches across standard benchmarks, while producing highly foldable sequences and scaling gracefully as the retrieval database grows.
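For readers unfamiliar with the metric, the sketch below shows how native sequence recovery and a relative gain are commonly computed. It is an illustration only: the helper names are hypothetical, and the numbers in the usage lines are placeholders, not results from the paper.

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline` (0.2 means 20%)."""
    return (new - baseline) / baseline

# Placeholder inputs chosen only to show the arithmetic.
rec = sequence_recovery("MKVLA", "MKILA")  # 4 of 5 positions match
gain = relative_gain(0.60, 0.50)           # 10 points absolute, 20% relative
```

The distinction matters when reading reported numbers: a relative gain is measured against the baseline's recovery, so it is larger than the absolute difference in percentage points.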

Key Features

Retrieval-augmented design: For every query backbone, RadDiff pulls structurally similar proteins from a database and converts them into a position-specific amino acid profile that acts as an evolutionary prior for generation, instead of relying solely on the single input structure.
Hierarchical structure search: A two-stage pipeline first coarse-filters candidates with Foldseek's fast 3D-alphabet search and then refines matches with coordinate-based US-align (TM-score), balancing retrieval speed against alignment precision.
Discrete denoising diffusion: Sequences are generated by an iterative discrete diffusion process over amino acid tokens, conditioned on both the target structure and the retrieved profile.
Lightweight integration module: A compact module injects the retrieved profile into the diffusion backbone, adding the evolutionary prior without retraining or substantially enlarging the core network.
Database scalability: Because knowledge lives in the retrieval corpus rather than the parameters, performance improves as more structures become available, letting the model adapt to expanding protein databases.
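The discrete diffusion feature can be illustrated with a toy example. The source does not specify the noise schedule or how the profile enters the denoiser, so this sketch assumes a masking-style (absorbing-state) corruption and uses a per-position probability table as a stand-in for a learned network conditioned on structure and profile. The names `corrupt`, `denoise`, and `MASK` are hypothetical.

```python
import random

MASK = "#"  # assumed placeholder token for a noised position

def corrupt(sequence, mask_fraction, rng):
    """Forward step: replace a random fraction of tokens with MASK."""
    n_mask = round(mask_fraction * len(sequence))
    masked = set(rng.sample(range(len(sequence)), n_mask))
    return "".join(MASK if i in masked else aa for i, aa in enumerate(sequence))

def denoise(noisy, position_probs, steps, rng):
    """Reverse loop: reveal masked positions over at most `steps` iterations.

    position_probs[i] maps amino acid -> probability for position i. Here it
    stands in for a denoiser conditioned on structure and retrieved profile.
    """
    tokens = list(noisy)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Reveal an even share of the remaining masked positions per step.
        n_reveal = -(-len(masked) // (steps - step))  # ceiling division
        for i in rng.sample(masked, n_reveal):
            residues, weights = zip(*position_probs[i].items())
            tokens[i] = rng.choices(residues, weights=weights, k=1)[0]
    return "".join(tokens)

rng = random.Random(0)
probs = [{"A": 0.9, "G": 0.1}] * 6  # toy per-position distributions
noisy = corrupt("AAAAAA", mask_fraction=0.5, rng=rng)
designed = denoise(noisy, probs, steps=3, rng=rng)
```

In a real model the probability table would be recomputed by the network at every step from the partially revealed sequence; keeping it fixed here only keeps the example short.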

Technical Details

RadDiff couples its diffusion generator to a structure encoder built on an equivariant graph neural network (a 6-layer EGNN with hidden dimension 128 and global context vectors), alongside an invariant point attention module for the masked sequence designer. The retrieval stage filters candidates with Foldseek (sequence-identity threshold) and US-align (TM-score > 0.5), then aligns retained hits residue by residue to form the conditioning profile. On standard inverse folding benchmarks the authors report native sequence recovery of roughly 67% on CATH v4.2, about 72% on CATH v4.3, 75.6% on TS50, and 76.2% on PDB2022. These figures are consistently ahead of GNN-based baselines (ProteinMPNN, PiFold, GVP, AlphaDesign), protein-language-model designers (LM-Design, KW-Design), and prior diffusion methods (GraDe-IF, MapDiff), with relative gains of up to 19%.
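The two-stage retrieval described above can be sketched as a cheap filter followed by an expensive one. This is an assumption-laden illustration, not the authors' pipeline: it does not call Foldseek or US-align, the `coarse_ok` predicate and the `coarse` and `tm` fields are hypothetical stand-ins for their outputs, and only the TM-score cutoff of 0.5 comes from the text.

```python
def hierarchical_retrieve(candidates, coarse_ok, tm_score_fn,
                          tm_cutoff=0.5, top_k=None):
    """Two-stage filter: cheap coarse test first, costly alignment second.

    coarse_ok:   fast predicate standing in for the coarse search stage.
    tm_score_fn: slower scorer standing in for a structural alignment;
                 it is only called on candidates that pass coarse_ok.
    """
    shortlist = [c for c in candidates if coarse_ok(c)]
    scored = [(tm_score_fn(c), c) for c in shortlist]
    kept = [(score, c) for score, c in scored if score > tm_cutoff]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best matches first
    return [c for _, c in kept[:top_k]]

# Toy database with precomputed, made-up scores.
toy_db = [
    {"id": "hitA", "coarse": 0.9, "tm": 0.82},
    {"id": "hitB", "coarse": 0.2, "tm": 0.90},  # dropped by the coarse stage
    {"id": "hitC", "coarse": 0.7, "tm": 0.41},  # dropped by the TM-score cutoff
    {"id": "hitD", "coarse": 0.8, "tm": 0.63},
]

calls = []  # records which candidates reached the expensive stage

def toy_tm(candidate):
    calls.append(candidate["id"])
    return candidate["tm"]

hits = hierarchical_retrieve(toy_db, lambda c: c["coarse"] >= 0.5, toy_tm)
ids = [c["id"] for c in hits]
```

The `calls` list shows the point of the hierarchy: the expensive scorer never sees candidates that the coarse stage rejected, which is what keeps retrieval tractable as the database grows.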

Applications

RadDiff targets practitioners who need to design sequences for a fixed target fold: enzyme engineers seeking thermostable or activity-tuned variants, antibody and binder designers, and researchers redesigning sequences for de novo backbones generated by structure-design pipelines. Its retrieval-augmented formulation is particularly attractive when close structural homologs exist in the PDB, since the model can directly exploit that evolutionary context to propose foldable, higher-recovery sequences.

Impact

RadDiff demonstrates that retrieval augmentation, already transformative in language modeling, translates effectively to structure-based protein design. By externalizing knowledge into a searchable database, it offers a complementary path to ever-larger end-to-end models. The reported recovery gains across CATH, TS50, and PDB2022 are substantial for a maturing benchmark suite. At the preprint stage, however, no model weights or license had been released and only partial code was provided, with the authors stating that a full open-source implementation would follow upon publication; until then, independent reproduction and downstream adoption remain limited.

Citation

RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding
