Chinese University of Hong Kong
Rigidity-aware geometric pretraining framework that front-loads SE(3) geometry learning to improve protein backbone generation, motif scaffolding, and conformational ensemble modeling.
RigidSSL is a geometric pretraining framework for protein structure generation that front-loads geometry learning before generative finetuning. Presented at ICLR 2026 and released as a March 2026 bioRxiv preprint by researchers at the Chinese University of Hong Kong and collaborators, it addresses a gap in protein backbone generators: most diffusion and flow-matching models learn geometry implicitly during generative training, which can limit designability and physical realism.
The framework learns from residue-level rigid-body representations in SE(3) space using a two-phase, self-supervised strategy. Phase I (RigidSSL-Perturb) learns geometric priors by applying simulated perturbations to 432K structures from the AlphaFold Protein Structure Database. Phase II (RigidSSL-MD) then refines those representations on 1.3K molecular dynamics trajectories to capture physically realistic structural transitions. The learned representations serve as a better starting point for downstream generative protein design.
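Residue-level rigid-body frames are conventionally built from each residue's backbone N, Cα and C atoms by Gram-Schmidt orthogonalization, as in AlphaFold2 and the FrameDiff/OpenFold code RigidSSL builds on. The sketch below shows that construction together with one plausible Phase I-style perturbation: a small random rotation plus Gaussian translation noise applied to a frame. The source does not describe RigidSSL-Perturb's actual noise model, so `perturb_frame`, its `rot_sigma`/`trans_sigma` defaults, and every other helper name here are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 matrix


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))


def frame_from_backbone(n: Vec3, ca: Vec3, c: Vec3) -> Tuple[Mat3, Vec3]:
    """Gram-Schmidt residue frame: rotation columns e1, e2, e3; translation = CA."""
    e1 = normalize(sub(c, ca))                        # first axis along CA -> C
    v2 = sub(n, ca)
    e2 = normalize(sub(v2, scale(e1, dot(e1, v2))))   # CA -> N with e1 component removed
    e3 = cross(e1, e2)                                # right-handed third axis
    rot = ((e1[0], e2[0], e3[0]),
           (e1[1], e2[1], e3[1]),
           (e1[2], e2[2], e3[2]))
    return rot, ca


def axis_angle_to_matrix(axis: Vec3, angle: float) -> Mat3:
    """Rodrigues' formula for a rotation of `angle` radians about `axis`."""
    x, y, z = normalize(axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return ((c + x * x * C, x * y * C - z * s, x * z * C + y * s),
            (y * x * C + z * s, c + y * y * C, y * z * C - x * s),
            (z * x * C - y * s, z * y * C + x * s, c + z * z * C))


def perturb_frame(rot: Mat3, trans: Vec3, rng: random.Random,
                  rot_sigma: float = 0.1, trans_sigma: float = 0.5) -> Tuple[Mat3, Vec3]:
    """HYPOTHETICAL noise model: random-axis rotation with a Gaussian angle
    (radians) plus isotropic Gaussian translation noise (same units as coordinates)."""
    axis = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    noise_rot = axis_angle_to_matrix(axis, rng.gauss(0.0, rot_sigma))
    noisy_trans = tuple(x + rng.gauss(0.0, trans_sigma) for x in trans)
    return matmul(noise_rot, rot), noisy_trans


def is_rotation(rot: Mat3, tol: float = 1e-9) -> bool:
    """True if R R^T = I and det(R) = +1 up to `tol`."""
    for i in range(3):
        for j in range(3):
            if abs(dot(rot[i], rot[j]) - (1.0 if i == j else 0.0)) > tol:
                return False
    return abs(dot(rot[0], cross(rot[1], rot[2])) - 1.0) < tol


if __name__ == "__main__":
    rng = random.Random(0)
    rot, trans = frame_from_backbone((-0.5, 1.36, 0.0), (0.0, 0.0, 0.0), (1.53, 0.0, 0.0))
    noisy_rot, noisy_trans = perturb_frame(rot, trans, rng)
    print(is_rotation(rot), is_rotation(noisy_rot))  # → True True
```

Left-multiplying by the noise rotation perturbs each frame in the global coordinate system; right-multiplying would perturb it in the residue's local frame instead. Either is a valid SE(3) perturbation, and which one RigidSSL uses is not stated in the source.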
By treating geometry as an explicit pretraining objective rather than a byproduct of generation, RigidSSL complements frame-based backbone generators such as FrameDiff, whose codebase it builds on alongside OpenFold, and extends them toward better designability and conformational modeling.
RigidSSL represents each residue as a rigid-body frame in SE(3) and learns from these frames with a bi-directional, rigidity-aware flow-matching objective that jointly models translation and rotation to maximize mutual information between conformations. Pretraining proceeds in two phases, RigidSSL-Perturb followed by RigidSSL-MD. Empirically, RigidSSL variants improve designability by up to 43%, raise zero-shot motif scaffolding success by 5.8%, and increase novelty and diversity in unconditional generation, while also improving the biophysical realism of GPCR conformational ensembles. Pretrained checkpoints and processed datasets are available from an MIT-licensed repository on HuggingFace.
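A flow-matching objective that jointly models translation and rotation can be illustrated with the geodesic SE(3) interpolant used by frame-based flow-matching models such as FrameFlow: translations move along a straight line in R³, and rotations move along the SO(3) geodesic with constant angular velocity. This is a minimal sketch under stated assumptions. The source does not give RigidSSL's exact parameterization, "bi-directional" is read here as supervising both the A→B and B→A paths between two conformations of the same residue, and the mutual-information aspect is not modeled at all. Function names such as `interpolate_frame` and `bidirectional_targets` are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix
Frame = Tuple[Mat3, Vec3]       # (rotation, translation)


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))


def transpose(a: Mat3) -> Mat3:
    return tuple(tuple(a[j][i] for j in range(3)) for i in range(3))


def so3_exp(w: Vec3) -> Mat3:
    """Rotation vector (axis * angle) -> rotation matrix via Rodrigues' formula."""
    angle = math.sqrt(w[0] ** 2 + w[1] ** 2 + w[2] ** 2)
    if angle < 1e-12:
        return ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    x, y, z = w[0] / angle, w[1] / angle, w[2] / angle
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return ((c + x * x * C, x * y * C - z * s, x * z * C + y * s),
            (y * x * C + z * s, c + y * y * C, y * z * C - x * s),
            (z * x * C - y * s, z * y * C + x * s, c + z * z * C))


def so3_log(r: Mat3) -> Vec3:
    """Rotation matrix -> rotation vector; valid for angles in [0, pi), unstable near pi."""
    cos_a = max(-1.0, min(1.0, (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0))
    angle = math.acos(cos_a)
    if angle < 1e-12:
        return (0.0, 0.0, 0.0)
    k = angle / (2.0 * math.sin(angle))
    return (k * (r[2][1] - r[1][2]), k * (r[0][2] - r[2][0]), k * (r[1][0] - r[0][1]))


def interpolate_frame(start: Frame, end: Frame, t: float):
    """Point at time t on the SE(3) path start -> end and its target velocities."""
    (rot0, trans0), (rot1, trans1) = start, end
    # Translation: straight line in R^3 with constant velocity trans1 - trans0.
    trans_t = tuple((1.0 - t) * a + t * b for a, b in zip(trans0, trans1))
    trans_vel = tuple(b - a for a, b in zip(trans0, trans1))
    # Rotation: SO(3) geodesic rot0 @ exp(t * omega) with omega = log(rot0^T rot1);
    # omega is the constant angular velocity expressed in the moving (body) frame.
    omega = so3_log(matmul(transpose(rot0), rot1))
    rot_t = matmul(rot0, so3_exp(tuple(t * w for w in omega)))
    return rot_t, trans_t, omega, trans_vel


def bidirectional_targets(frame_a: Frame, frame_b: Frame, t: float):
    """HYPOTHETICAL reading of 'bi-directional': supervise A -> B and B -> A at the same t."""
    return interpolate_frame(frame_a, frame_b, t), interpolate_frame(frame_b, frame_a, t)


if __name__ == "__main__":
    identity = so3_exp((0.0, 0.0, 0.0))
    quarter_turn_z = so3_exp((0.0, 0.0, math.pi / 2))
    fwd, bwd = bidirectional_targets((identity, (0.0, 0.0, 0.0)),
                                     (quarter_turn_z, (4.0, 0.0, 0.0)), t=0.5)
    print(fwd[1])  # → (2.0, 0.0, 0.0)
    print(fwd[2], bwd[2])  # opposite angular velocities about the z axis
```

In a training loop, a network would see the interpolated frames `(rot_t, trans_t)` and regress the velocity targets `(omega, trans_vel)` for each direction. Because the forward path at time t coincides with the backward path at time 1 − t, the two directions supervise the same geodesic with opposite velocities.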
RigidSSL benefits protein designers working on de novo backbone generation, motif scaffolding for functional-site grafting, and conformational ensemble modeling. As a pretraining framework, it can supply a better geometric initialization for downstream generative pipelines, helping them produce more designable and diverse backbones. Its GPCR results are particularly relevant to researchers studying flexible or multi-state proteins, for which a single static structure is insufficient.
RigidSSL contributes a self-supervised, geometry-first perspective on protein backbone generation, showing that explicit pretraining on rigid-body geometry and MD-derived dynamics can yield substantial gains in designability and motif scaffolding. Its acceptance at ICLR 2026 and the release of code, checkpoints, and datasets support reproducibility and downstream adoption. A noted limitation is that the dynamics phase relies on a relatively small set of 1.3K MD trajectories, which future work may expand.