A protein backbone generator that combines SaProt structural tokenization with a diffusion transformer and an IPA token cache for faster de novo backbone design.
SaDiT (SaProt-tokenized Diffusion Transformer) is a generative framework for de novo protein backbone design that aims to make backbone diffusion both faster and more structurally reliable. Many leading backbone generators, such as RFdiffusion and Proteina, operate directly in continuous 3D coordinate space, which couples each denoising step to relatively expensive geometric computation. SaDiT instead represents protein geometry in a discrete latent space using structural tokenization derived from SaProt, then applies a diffusion transformer (DiT) over those tokens. This design is intended to reduce the cost of the generation process while preserving SE(3) equivariance.
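To make the idea of denoising over discrete structural tokens concrete, the following is a minimal sketch of an iterative sampling loop. It is not SaDiT's algorithm: it assumes an absorbing-state ("mask") style of discrete diffusion, which is swapped in here purely for illustration. The names `MASK`, `VOCAB_SIZE`, `sample_tokens`, and `toy_denoiser` are hypothetical, and the toy denoiser is a stand-in for a trained transformer.

```python
import random
from typing import Callable, List

MASK = -1        # hypothetical id for a position that is still noised
VOCAB_SIZE = 20  # placeholder vocabulary size, not taken from the preprint


def sample_tokens(
    length: int,
    steps: int,
    denoise: Callable[[List[int]], List[int]],
    rng: random.Random,
) -> List[int]:
    """Start fully masked and commit a share of positions at every step."""
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # The denoiser proposes a token for every position given the current state.
        proposal = denoise(tokens)
        # Ceil division spreads the remaining positions over the remaining steps,
        # so the final step always commits whatever is left.
        remaining_steps = steps - step
        n_reveal = -(-len(masked) // remaining_steps)
        for i in rng.sample(masked, n_reveal):
            tokens[i] = proposal[i]
    return tokens


def toy_denoiser(tokens: List[int]) -> List[int]:
    # Assumption: a trained diffusion transformer would sit here. This toy
    # version ignores its input and predicts a fixed token per position.
    return [i % VOCAB_SIZE for i in range(len(tokens))]


backbone_tokens = sample_tokens(12, 4, toy_denoiser, random.Random(0))
print(backbone_tokens)
```

In a real system the committed tokens would still have to be decoded back into backbone coordinates; that step is outside the scope of this sketch.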
The method was introduced in February 2026 by Shentong Mo and Lanqing Li as an arXiv preprint. Its central engineering contribution is an IPA Token Cache mechanism that optimizes the Invariant Point Attention (IPA) layers by reusing computed token states across iterative sampling steps, cutting redundant computation during generation. Together, the discrete tokenization and cached IPA are intended to deliver state-of-the-art speed without sacrificing the structural viability of generated backbones.
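The general idea of reusing per-token states across sampling steps can be sketched as follows. This is a toy illustration under stated assumptions, not the IPA Token Cache itself: the refresh rule (recompute a position only when its token changed since the previous step) is an assumption, `TokenStateCache` and `expensive_state` are hypothetical names, and the sketch ignores the fact that real attention states depend on every other token as well.

```python
from typing import Callable, Dict, List, Tuple


class TokenStateCache:
    """Reuse a per-position state when that position's token is unchanged."""

    def __init__(self, compute: Callable[[int], float]) -> None:
        self.compute = compute  # stand-in for an expensive per-token layer
        self.store: Dict[int, Tuple[int, float]] = {}  # position -> (token, state)
        self.hits = 0
        self.misses = 0

    def step(self, tokens: List[int]) -> List[float]:
        states: List[float] = []
        for pos, tok in enumerate(tokens):
            cached = self.store.get(pos)
            if cached is not None and cached[0] == tok:
                # Token unchanged since the last step: reuse the stored state.
                self.hits += 1
                states.append(cached[1])
            else:
                # Token is new or changed: recompute and refresh the cache.
                self.misses += 1
                state = self.compute(tok)
                self.store[pos] = (tok, state)
                states.append(state)
        return states


def expensive_state(token: int) -> float:
    # Placeholder for a costly geometric attention computation.
    return float(token) ** 0.5


cache = TokenStateCache(expensive_state)
# Hypothetical sampling trajectory in which one position changes per step.
trajectory = [
    [0, 0, 0, 0],
    [5, 0, 0, 0],
    [5, 0, 9, 0],
    [5, 2, 9, 0],
]
for tokens in trajectory:
    cache.step(tokens)
print(cache.hits, cache.misses)
```

On this four-step trajectory, 9 of the 16 per-token computations are served from the cache and 7 are recomputed. The saving in any real sampler depends on how many token states actually stay stable between steps, which the source does not quantify.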
The preprint reports comparisons against established baselines on standard backbone-generation tasks, but no public weights or code have been released. Its results should be read as preprint-stage claims pending independent reproduction.
SaDiT couples a SaProt-derived structural tokenizer with a diffusion transformer that denoises in the discrete token space rather than over raw 3D coordinates. The reported headline contribution is efficiency: the IPA Token Cache reuses Invariant Point Attention states during iterative sampling, and the discrete formulation reduces per-step cost. The authors report that SaDiT outperforms state-of-the-art models including RFdiffusion and Proteina in both computational speed and structural viability across unconditional and fold-class conditional generation, with particular strength in capturing complex topological features. The preprint does not disclose a parameter count and, at the time of writing, no public weights or code accompany it, so benchmark numbers reflect the authors' own evaluation.
SaDiT targets de novo protein backbone design, the first stage of many computational protein-engineering pipelines, where a generated backbone is subsequently sequence-designed (e.g., with an inverse-folding model) and validated. Faster, fold-conditioned backbone sampling is useful for scaffold generation, exploring topological space, and producing diverse candidate structures for downstream design campaigns in research settings.
SaDiT contributes to a growing line of work that moves protein backbone diffusion from continuous coordinate space into learned discrete structural-token spaces, trading some geometric directness for transformer scalability and sampling speed. If its reported speed and viability gains over RFdiffusion and Proteina hold up under independent evaluation, the IPA Token Cache and tokenized diffusion design could inform future efficient generators. As a February 2026 preprint without released weights, its real-world adoption and reproducibility remain open questions.