RigidSSL

Self-supervised SE(3) geometric pretraining for protein backbone generators, improving designability, motif scaffolding, and conformational ensembles.

Released: March 2026

RigidSSL is a geometric pretraining framework for protein structure generation that front-loads geometry learning before generative finetuning. Presented at ICLR 2026 and released as a March 2026 bioRxiv preprint by researchers at the Chinese University of Hong Kong and collaborators, it addresses a gap in protein backbone generators: most diffusion and flow-matching models learn geometry implicitly during generative training, which can limit designability and physical realism.

The framework learns from residue-level rigid-body representations in SE(3) space using a two-phase, self-supervised strategy. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures in the AlphaFold Protein Structure Database with simulated perturbations, and Phase II (RigidSSL-MD) refines those representations on 1.3K molecular dynamics trajectories to capture physically realistic structural transitions. The learned representations then serve as a better starting point for downstream generative protein design.
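To make "residue-level rigid-body representations" concrete, the sketch below builds one frame (a rotation plus a translation) from a residue's backbone N, CA, and C coordinates using Gram-Schmidt orthogonalization, the construction popularized by AlphaFold2-style codebases. That RigidSSL uses exactly this axis convention is an assumption; the function name `residue_frame` and the helpers are hypothetical and not taken from the RigidSSL repository.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _scale(a, s):
    return [x * s for x in a]

def _normalize(a):
    norm = math.sqrt(_dot(a, a))
    if norm < 1e-8:
        raise ValueError("degenerate backbone geometry: zero-length vector")
    return _scale(a, 1.0 / norm)

def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def residue_frame(n_xyz, ca_xyz, c_xyz):
    """Return (rotation, translation) for one residue.

    rotation is a 3x3 matrix whose columns are the orthonormal frame axes;
    translation is the CA position. Axis convention is an assumption.
    """
    # First axis: unit vector from CA to C.
    e1 = _normalize(_sub(c_xyz, ca_xyz))
    # Second axis: CA->N with its component along e1 removed (Gram-Schmidt).
    v2 = _sub(n_xyz, ca_xyz)
    e2 = _normalize(_sub(v2, _scale(e1, _dot(e1, v2))))
    # Third axis completes a right-handed frame.
    e3 = _cross(e1, e2)
    rotation = [[e1[i], e2[i], e3[i]] for i in range(3)]
    return rotation, list(ca_xyz)
```

With this convention, a residue whose CA sits at the origin, whose C lies on the +x axis, and whose N lies in the xy-plane with positive y yields the identity rotation, so every other residue's frame can be read as a rigid motion away from that reference pose.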

By treating geometry as an explicit pretraining objective rather than a byproduct of generation, RigidSSL complements backbone generators such as FrameDiff, on whose codebase (together with OpenFold) it builds, and aims to improve their designability and conformational modeling.

Key Features

Rigidity-aware flow matching: A bi-directional flow-matching objective jointly optimizes the translational and rotational dynamics of residue-level rigid bodies in SE(3) space to maximize mutual information between conformations.
Two-phase pretraining: Phase I learns geometric priors from 432K AFDB structures with simulated perturbations; Phase II refines them on 1.3K molecular dynamics trajectories for physically realistic transitions.
Improved designability: Reports up to 43% improvement in designability and a 5.8% gain in zero-shot motif scaffolding success rate over baselines.
Conformational ensembles: Captures more biophysically realistic conformational ensembles, demonstrated on GPCR conformational states.
Open implementation: Code is released under the MIT license with pretrained checkpoints and processed datasets on HuggingFace.
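For readers unfamiliar with the designability metric cited in the list above: in FrameDiff-style evaluations it is commonly the fraction of generated backbones whose self-consistency RMSD (scRMSD, between the generated backbone and the structure predicted from a sequence designed for it) falls below 2 Å. The sketch below computes that fraction from precomputed scRMSD values. Whether RigidSSL uses this exact threshold is an assumption, and the function name is hypothetical.

```python
def designability(sc_rmsds, threshold=2.0):
    """Fraction of samples with scRMSD strictly below `threshold` (in angstroms).

    The 2.0 A default follows a common convention; it is an assumption here,
    not a value confirmed by the RigidSSL source.
    """
    if not sc_rmsds:
        raise ValueError("need at least one scRMSD value")
    designable = sum(1 for rmsd in sc_rmsds if rmsd < threshold)
    return designable / len(sc_rmsds)
```

The scRMSD values themselves come from an external design-and-refold pipeline that is outside the scope of this sketch.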

Technical Details

RigidSSL operates on residue-level rigid-body frames in SE(3). Its bi-directional, rigidity-aware flow-matching objective jointly models translation and rotation to maximize mutual information between conformations. Pretraining proceeds in two phases: RigidSSL-Perturb learns geometric priors from 432K AlphaFold Protein Structure Database (AFDB) structures with simulated perturbations, and RigidSSL-MD refines those representations on 1.3K molecular dynamics trajectories. The implementation builds on the OpenFold and FrameDiff codebases. The authors report that RigidSSL variants improve designability by up to 43%, raise zero-shot motif scaffolding success by 5.8%, increase novelty and diversity in unconditional generation, and yield more biophysically realistic GPCR conformational ensembles. The code is MIT-licensed, and pretrained checkpoints and processed datasets are available on HuggingFace.
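As a rough illustration of what "jointly models translation and rotation" can mean in a flow-matching setting, the sketch below interpolates between two rigid frames: translations move along a straight line, and rotations move along the geodesic on SO(3). This is a standard interpolant used by SE(3) flow-matching models in general. It is not the RigidSSL objective itself, whose bi-directional formulation is not specified in enough detail here to reproduce, and all function names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_from_axis_angle(axis, angle):
    """Rodrigues' formula: rotation matrix for a unit axis and angle (radians)."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [
        [c + x * x * k, x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]

def axis_angle_from_rot(r):
    """Inverse of Rodrigues' formula; not numerically robust near angle = pi."""
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    angle = math.acos(max(-1.0, min(1.0, cos_theta)))
    if angle < 1e-8:
        return (1.0, 0.0, 0.0), 0.0  # axis is arbitrary for a zero rotation
    s = 2.0 * math.sin(angle)
    axis = ((r[2][1] - r[1][2]) / s,
            (r[0][2] - r[2][0]) / s,
            (r[1][0] - r[0][1]) / s)
    return axis, angle

def interpolate_frame(frame0, frame1, t):
    """Point at time t in [0, 1] on the path from frame0 to frame1.

    Each frame is (rotation, translation) with a 3x3 rotation and a 3-vector.
    """
    r0, x0 = frame0
    r1, x1 = frame1
    # Relative rotation such that r1 = r0 @ rel.
    rel = matmul(transpose(r0), r1)
    axis, angle = axis_angle_from_rot(rel)
    # Geodesic on SO(3): apply a fraction t of the relative rotation.
    r_t = matmul(r0, rot_from_axis_angle(axis, t * angle))
    # Straight line in R^3 for the translation.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    return r_t, x_t
```

For example, interpolating halfway between the identity frame at the origin and a frame rotated 90 degrees about the z-axis and shifted 2 units along x gives a 45-degree rotation about z at x = 1. A flow-matching model would be trained to predict the velocity along such paths for every residue frame.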

Applications

RigidSSL benefits protein designers working on de novo backbone generation, motif scaffolding for functional-site grafting, and modeling of conformational ensembles. As a pretraining framework, it can supply improved geometric initialization for downstream generative pipelines, helping produce more designable and diverse backbones. Its demonstrated GPCR conformational modeling is particularly relevant for researchers studying flexible or multi-state proteins where single static structures are insufficient.

Impact

RigidSSL contributes a self-supervised, geometry-first perspective to protein backbone generation, showing that explicitly pretraining on rigid-body geometry and MD-derived dynamics can yield substantial designability and scaffolding gains. Its acceptance at ICLR 2026 and release of code, checkpoints, and datasets support reproducibility and downstream adoption. The reliance on a relatively small set of 1.3K MD trajectories for the dynamics phase is a noted scope limitation that future work may expand.

Citation

Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

Ni, Z., et al. (2026) Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles. bioRxiv.

DOI: 10.64898/2026.03.02.708991

Citations

Total Citations: 182
Influential: 29
References: 41

GitHub

Stars: 19
Forks: 0
Open Issues: 0
Contributors: 1
Last Push: 4mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 0
Likes: 0
Last Modified: 4mo ago

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 73 (Open)
Usability (can I run it?): 77
Reproducibility (can I retrain it?): 76

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository · Research Paper · HuggingFace Model · Dataset
