bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance

RadDiff

Nanjing University

A retrieval-augmented discrete denoising diffusion model for protein inverse folding that conditions sequence generation on profiles built from structurally similar proteins.

Released: November 2025

RadDiff addresses protein inverse folding: the task of designing an amino acid sequence that will fold into a given target backbone structure. Inverse folding is a cornerstone of computational protein engineering, underpinning enzyme design, antibody optimization, and the realization of de novo backbones produced by structure-generation tools. While modern methods such as ProteinMPNN, PiFold, and protein-language-model designers have pushed native sequence recovery steadily upward, they typically generate sequences purely from the query structure, leaving the rich evolutionary signal contained in known homologs untapped.

RadDiff, introduced in late 2025 by Jin Han, Tianfan Fu, and Wu-Jun Li at Nanjing University's National Key Laboratory for Novel Software Technology, reframes inverse folding as a retrieval-augmented generation problem. For each target backbone it retrieves structurally similar proteins from large databases, aligns them residue by residue to build a position-specific amino acid profile, and uses that profile as an evolutionarily informed prior to condition a discrete denoising diffusion process. This mirrors the broader retrieval-augmented generation trend in language modeling and transplants it into structure-based protein design, so the generator can lean on observed sequence diversity rather than memorizing it in its weights.

The authors report that the method improves native sequence recovery by up to 19% (relative) over prior approaches across standard benchmarks, while producing highly foldable sequences and scaling gracefully as the retrieval database grows.
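
Since the headline numbers are stated as native sequence recovery and relative gains, a minimal sketch of how those two quantities are commonly computed may help. The function names and the toy sequences are illustrative assumptions, not taken from the RadDiff paper or code.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline` (0.19 means a 19% gain)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Toy example (hypothetical sequences): 8 of 10 positions agree.
native = "MKTAYIAKQR"
designed = "MKSAYIAKQL"
rec = sequence_recovery(designed, native)

# Illustrative numbers only: moving from 50% to 60% recovery is a
# 10-point absolute gain but a 20% relative gain.
gain = relative_gain(0.60, 0.50)
```

The distinction matters when reading benchmark tables: a relative gain of 19% is a smaller change in absolute percentage points whenever the baseline recovery is below 100%.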

#Key Features

  • Retrieval-augmented design: For every query backbone, RadDiff pulls structurally similar proteins from a database and converts them into a position-specific amino acid profile that acts as an evolutionary prior for generation, instead of relying solely on the single input structure.
  • Hierarchical structure search: A two-stage pipeline first coarse-filters candidates with Foldseek's fast 3D-alphabet search and then refines matches with coordinate-based US-align (TM-score), balancing retrieval speed against alignment precision.
  • Discrete denoising diffusion: Sequences are generated by an iterative discrete diffusion process over amino acid tokens, conditioned on both the target structure and the retrieved profile.
  • Lightweight integration module: A compact module injects the retrieved profile into the diffusion backbone, adding the evolutionary prior without retraining or substantially enlarging the core network.
  • Database scalability: Because knowledge lives in the retrieval corpus rather than the parameters, performance improves as more structures become available, letting the model adapt to expanding protein databases.
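
The position-specific profile mentioned in the first feature can be sketched as a per-column frequency table over already-aligned hits. This is a minimal illustration under stated assumptions: the hits are given as strings pre-aligned to the query (gaps as `-`), and the `build_profile` name, the 20-letter alphabet ordering, and the pseudocount smoothing are choices made here, not details confirmed by the paper.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def build_profile(aligned_hits, query_len, pseudocount=1.0):
    """Return a (query_len, 20) matrix of per-position amino acid frequencies.

    Each hit must already be aligned to the query, one character per query
    position; gaps and unknown residues are skipped. The pseudocount is an
    assumption that keeps positions with no coverage well-defined (uniform).
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    counts = np.full((query_len, len(AMINO_ACIDS)), pseudocount, dtype=float)
    for hit in aligned_hits:
        if len(hit) != query_len:
            raise ValueError("every aligned hit must span the full query length")
        for pos, aa in enumerate(hit):
            idx = AA_INDEX.get(aa)
            if idx is not None:  # skip gaps ('-') and non-standard residues
                counts[pos, idx] += 1.0
    # Normalize each row so it is a probability distribution over residues.
    return counts / counts.sum(axis=1, keepdims=True)


# Toy alignment of three hypothetical hits to a 4-residue query.
hits = ["MKT-", "MRTA", "MKSA"]
profile = build_profile(hits, query_len=4)
consensus = "".join(AMINO_ACIDS[i] for i in profile.argmax(axis=1))
```

Each row of `profile` is a distribution that a generator could consume as a prior for that position; how RadDiff's integration module actually injects it is not specified here.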

#Technical Details

RadDiff couples its diffusion generator to a structure encoder built on an equivariant graph neural network (a 6-layer EGNN with hidden dimension 128 and global context vectors), alongside an invariant point attention module for the masked sequence designer. The retrieval stage filters candidates with Foldseek (sequence-identity threshold) and US-align (TM-score > 0.5), then aligns retained hits residue by residue to form the conditioning profile. On standard inverse folding benchmarks the authors report native sequence recovery of roughly 67% on CATH v4.2, about 72% on CATH v4.3, 75.6% on TS50, and 76.2% on PDB2022. These figures are consistently ahead of GNN-based baselines (ProteinMPNN, PiFold, GVP, AlphaDesign), protein-language-model designers (LM-Design, KW-Design), and prior diffusion methods (GraDe-IF, MapDiff), with relative gains of up to 19%.
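
The coarse-then-fine retrieval described above can be sketched as a two-stage filter. This is an illustration only: it does not call Foldseek or US-align, the `Candidate` fields hold made-up precomputed scores, and the `top_k` shortlist size is an assumption introduced here. Only the TM-score > 0.5 cutoff comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    coarse_score: float  # hypothetical stage-1 score from a fast structural search
    tm_score: float      # hypothetical stage-2 TM-score from a coordinate-based aligner


def hierarchical_filter(candidates, top_k=3, tm_threshold=0.5):
    """Shortlist by the cheap score, then keep hits whose TM-score exceeds the cutoff."""
    # Stage 1: cheap coarse ranking; keep only the top_k candidates.
    shortlist = sorted(candidates, key=lambda c: c.coarse_score, reverse=True)[:top_k]
    # Stage 2: precise filter. In a real pipeline the expensive alignment
    # would be run only on this shortlist; here the scores are precomputed.
    refined = [c for c in shortlist if c.tm_score > tm_threshold]
    return sorted(refined, key=lambda c: c.tm_score, reverse=True)


# Toy candidate pool with made-up scores.
pool = [
    Candidate("hit_a", coarse_score=0.91, tm_score=0.72),
    Candidate("hit_b", coarse_score=0.85, tm_score=0.44),
    Candidate("hit_c", coarse_score=0.80, tm_score=0.63),
    Candidate("hit_d", coarse_score=0.30, tm_score=0.88),
]
selected = [c.name for c in hierarchical_filter(pool, top_k=3)]
```

In this toy pool `hit_d` has the best TM-score but never reaches the second stage because its coarse score is low, which illustrates the speed-versus-precision trade-off a two-stage design accepts.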

#Applications

RadDiff targets practitioners who need to design sequences for a fixed target fold: enzyme engineers seeking thermostable or activity-tuned variants, antibody and binder designers, and researchers redesigning sequences for de novo backbones generated by structure-design pipelines. Its retrieval-augmented formulation is particularly attractive when close structural homologs exist in the PDB, since the model can directly exploit that evolutionary context to propose foldable, higher-recovery sequences.

#Impact

RadDiff demonstrates that retrieval augmentation, already transformative in language modeling, translates effectively to structure-based protein design, offering a complementary path to ever-larger end-to-end models by externalizing knowledge into a searchable database. The reported recovery gains across CATH, TS50, and PDB2022 are substantial for a maturing benchmark suite. At the preprint stage, however, no model weights or license had been released and only partial code was provided, with the authors stating that a full open-source implementation would follow upon publication; until then, independent reproduction and downstream adoption remain limited.

Citation

RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding

Preprint

Han, J., et al. (2025). RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding. arXiv preprint.

DOI: 10.48550/arXiv.2512.00126

Citations

Total citations: 0
Influential: 0
References: 48

Openness

bio.rodeo openness: 27 (Closed · low usability and reproducibility)
Usability (can I run it?): 25
Reproducibility (can I retrain it?): 14
Model Openness Framework: Unclassified (missing required components)

Tags

diffusion, generative, graph_neural_network, inverse_folding, protein_design, retrieval_augmented
