bio.rodeo


SaDiT

Independent Researcher

A protein backbone generator that combines SaProt structural tokenization with a diffusion transformer and an IPA token cache for faster de novo backbone design.

Released: February 2026

SaDiT (SaProt-tokenized Diffusion Transformer) is a generative framework for de novo protein backbone design that aims to make backbone diffusion both faster and more structurally reliable. Many leading backbone generators, such as RFdiffusion and Proteina, operate directly in continuous 3D coordinate space, which couples each denoising step to relatively expensive geometric computation. SaDiT instead represents protein geometry in a discrete latent space using structural tokenization derived from SaProt, then applies a diffusion transformer (DiT) over those tokens. The stated goal is to lower the cost of generation while preserving SE(3) equivariance.
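To make the idea of a discrete structural latent space concrete, the following is a minimal sketch of nearest-codebook quantization, in which each residue's continuous geometric descriptor is replaced by the index of the closest codebook vector. It is not SaProt's actual tokenizer: the `quantize` function, the 2-D codebook, and the descriptor values are all hypothetical and chosen only for illustration.

```python
import math


def quantize(descriptors, codebook):
    """Map each continuous descriptor to the index of its nearest codebook vector."""
    tokens = []
    for d in descriptors:
        dists = [math.dist(d, c) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


# Hypothetical 2-D codebook with three structural "states" (illustration only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical per-residue descriptors, e.g. two local geometry features each.
descriptors = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.95, 0.05)]

tokens = quantize(descriptors, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Once a backbone is expressed as a token sequence like this, a generative model can operate over a finite vocabulary rather than over raw coordinates.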

The method was introduced in February 2026 by Shentong Mo and Lanqing Li as an arXiv preprint. Its central engineering contribution is an IPA Token Cache mechanism that optimizes the Invariant Point Attention (IPA) layers by reusing computed token states across iterative sampling steps, cutting redundant computation during generation. Together, the discrete tokenization and cached IPA are intended to deliver state-of-the-art speed without sacrificing the structural viability of generated backbones.
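The general idea of reusing token states across sampling steps can be sketched as follows. This is an assumption-laden toy, not the authors' IPA Token Cache: the `TokenStateCache` class and the `expensive_state` stand-in are hypothetical, and the sketch treats each position's state as depending only on its own token, whereas a real attention layer's output depends on all tokens.

```python
class TokenStateCache:
    """Reuse a position's cached state when its token is unchanged between steps.

    Hypothetical simplification: the state depends only on the token at that position.
    """

    def __init__(self, compute_state):
        self.compute_state = compute_state  # expensive function: token -> state
        self.tokens = {}                    # position -> token seen at the last step
        self.states = {}                    # position -> cached state
        self.recomputed = 0                 # number of expensive calls made so far

    def step(self, tokens):
        out = []
        for pos, tok in enumerate(tokens):
            if self.tokens.get(pos) != tok:
                # Token changed (or first visit): recompute and refresh the cache.
                self.states[pos] = self.compute_state(tok)
                self.tokens[pos] = tok
                self.recomputed += 1
            out.append(self.states[pos])
        return out


def expensive_state(tok):
    # Stand-in for an expensive layer computation.
    return tok * tok


cache = TokenStateCache(expensive_state)
first = cache.step([3, 1, 4, 1])   # all four positions are computed
second = cache.step([3, 1, 5, 1])  # only position 2 changed
print(cache.recomputed)  # 5
```

In this toy, the second step costs one expensive call instead of four, which is the kind of saving a cache across iterative sampling steps is meant to provide.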

As a recent preprint, SaDiT reports comparisons against established baselines on standard backbone-generation tasks, but does not yet release public weights or code. Its results should be read as preprint-stage claims pending independent reproduction.

#Key Features

  • Discrete structural tokenization: Protein geometry is encoded into a discrete latent space via SaProt-style structural tokens, simplifying the generative target relative to continuous coordinate diffusion.
  • Diffusion transformer backbone: A DiT architecture denoises over structural tokens, bringing scalable transformer-based diffusion to backbone generation.
  • SE(3) equivariance: The framework is designed to maintain theoretical SE(3) equivariance, so that generation does not depend on the arbitrary global orientation or position of a structure.
  • IPA Token Cache: A caching mechanism reuses computed Invariant Point Attention token states across sampling steps to accelerate iterative generation.
  • Conditional generation: The model supports both unconditional and fold-class conditional backbone generation.
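The SE(3) property listed above can be illustrated numerically. The sketch below, which uses made-up coordinates and a single z-axis rotation (only one element of the rotation group), checks that pairwise distances are invariant under a rigid motion and that the centroid is equivariant, meaning it moves with the structure. All function names are hypothetical, and nothing here is taken from SaDiT itself.

```python
import math


def transform(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def pairwise_distances(points):
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Made-up coordinates for four backbone atoms (illustration only).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.2, 1.5)]
angle, shift = 0.7, (10.0, -2.0, 5.0)
moved = transform(coords, angle, shift)

# Invariance: internal distances do not change under the rigid motion.
invariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)

# Equivariance: the centroid of the moved points equals the moved centroid.
equivariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(centroid(moved), transform([centroid(coords)], angle, shift)[0])
)

print(invariant_ok, equivariant_ok)  # True True
```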

#Technical Details

SaDiT couples a SaProt-derived structural tokenizer with a diffusion transformer that denoises in the discrete token space rather than over raw 3D coordinates. The reported headline contribution is efficiency: the IPA Token Cache reuses Invariant Point Attention states during iterative sampling, and the discrete formulation reduces per-step cost. The authors report that SaDiT outperforms state-of-the-art models, including RFdiffusion and Proteina, in both computational speed and structural viability across unconditional and fold-class conditional generation, with particular strength in capturing complex topological features. The preprint does not disclose a parameter count, and at the time of writing no public weights or code accompany it, so benchmark numbers reflect the authors' own evaluation.
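For readers unfamiliar with diffusion over discrete tokens, the sketch below shows one common forward (noising) process, in which each token is independently resampled uniformly from the vocabulary with some probability. This is an assumption: the source does not say which discrete corruption scheme SaDiT uses, and the `corrupt` function, the vocabulary size, and the rate schedule are hypothetical. A denoising model would be trained to reverse such corruption step by step.

```python
import random


def corrupt(tokens, rate, vocab_size, rng):
    """Independently resample each token uniformly from the vocabulary with probability `rate`."""
    return [rng.randrange(vocab_size) if rng.random() < rate else t for t in tokens]


rng = random.Random(0)          # fixed seed so the run is repeatable
VOCAB = 20                      # hypothetical structural vocabulary size
clean = [0, 1, 2, 1, 0, 2, 2, 1]

# Apply increasing noise rates (a hypothetical schedule).
# At rate 1.0 every token is resampled, so the clean signal is fully destroyed.
trajectory = [clean]
for rate in (0.1, 0.3, 0.6, 1.0):
    trajectory.append(corrupt(trajectory[-1], rate, VOCAB, rng))

for step, toks in enumerate(trajectory):
    print(step, toks)
```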

#Applications

SaDiT targets de novo protein backbone design, the first stage of many computational protein-engineering pipelines, where a generated backbone is subsequently sequence-designed (e.g., with an inverse-folding model) and validated. Faster, fold-conditioned backbone sampling is useful for scaffold generation, exploring topological space, and producing diverse candidate structures for downstream design campaigns in research settings.

#Impact

SaDiT contributes to a growing line of work that moves protein backbone diffusion from continuous coordinate space into learned discrete structural-token spaces, trading some geometric directness for transformer scalability and sampling speed. If its reported speed and viability gains over RFdiffusion and Proteina hold up under independent evaluation, the IPA Token Cache and tokenized diffusion design could inform future efficient generators. As a February 2026 preprint without released weights, its real-world adoption and reproducibility remain open questions.

Tags

protein_design, de_novo_design, diffusion, transformer, generative, proteomics