MoMPNN

BioGeometry / Peking University / Mila / Université de Montréal / HEC Montréal

Protein inverse folding model that aligns ProteinMPNN through multi-objective preference optimization, improving developability without losing fold fidelity.

Released: March 2026

MoMPNN is a property-driven protein inverse folding model that designs amino-acid sequences to fold into a target backbone while simultaneously optimizing for developability properties such as solubility, thermostability, and expression. It is produced by ProtAlign, a fine-tuning framework introduced in "Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment" (Hou, Liu, Shi, Liu, Yang, and Tang; ICLR 2026). MoMPNN is released as fixed checkpoints: the "property-driven" behavior is baked in by the training recipe and requires no conditioning at inference time.

Standard inverse-folding models like ProteinMPNN maximize native sequence recovery, which optimizes structural fidelity but ignores the biophysical properties that determine whether a designed protein can actually be expressed, purified, and used. ProtAlign addresses this gap by treating design as a multi-objective alignment problem: it fine-tunes a pretrained inverse-folding model to satisfy diverse developability objectives while preserving sequence–structure agreement. Applied to ProteinMPNN, the result is MoMPNN.

Developed by BioGeometry in collaboration with Peking University, Mila – Québec AI Institute, Université de Montréal, and HEC Montréal (Jian Tang's group), the work targets a practical pain point in computational protein engineering: bridging the gap between sequences that score well on recovery benchmarks and sequences that behave well in the wet lab.

Key Features

Multi-objective preference alignment: ProtAlign uses a semi-online Direct Preference Optimization (DPO) strategy with a flexible preference margin to steer the model toward multiple developability objectives at once without collapsing sequence diversity.
In silico preference pairs: Training preferences are constructed automatically from computational property predictors — Protein-Sol (solubility), TemBERTure (thermostability), and ESM-2 pseudo-likelihood (evolutionary plausibility) — rather than from costly experimental labels.
Structural fidelity preserved: Across CATH crystal structures, de novo backbones, and binder design, MoMPNN holds or improves TM-score and RMSD relative to ProteinMPNN while raising developability metrics.
Drop-in compatibility: MoMPNN checkpoints retain the original ProteinMPNN format and load directly into the LigandMPNN inference pipeline, so existing design workflows need no code changes.
Released checkpoints: Multiple property-targeted variants (e.g., solubility, thermostability, evolutionary-plausibility combinations) are provided in the repository under mompnn_paper_checkpoints/.
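
The in silico preference-pair idea listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ProtAlign implementation: `toy_solubility` is a hypothetical stand-in for a real predictor such as Protein-Sol, and both the pairing rule (every pair of candidates whose scores differ by at least `min_gap`) and the function names are assumptions made for the example.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def toy_solubility(seq: str) -> float:
    """Hypothetical stand-in for a real predictor: fraction of charged residues."""
    return sum(aa in "DEKR" for aa in seq) / len(seq)


def build_preference_pairs(
    candidates: List[str],
    score_fn: Callable[[str], float],
    min_gap: float = 0.05,
) -> List[Tuple[str, str, float]]:
    """Return (preferred, dispreferred, score_gap) triples.

    Pairs whose scores differ by less than min_gap are skipped, since the
    predictor gives no clear preference between them (an assumed rule).
    """
    scored = [(seq, score_fn(seq)) for seq in candidates]
    pairs = []
    for (seq_a, s_a), (seq_b, s_b) in combinations(scored, 2):
        gap = abs(s_a - s_b)
        if gap < min_gap:
            continue
        winner, loser = (seq_a, seq_b) if s_a > s_b else (seq_b, seq_a)
        pairs.append((winner, loser, gap))
    return pairs


# Three made-up candidate sequences for the same (imaginary) backbone.
candidates = ["MKDEKRLLAA", "MLLAAVVLLA", "MKDELLAAVV"]
pairs = build_preference_pairs(candidates, toy_solubility)
for winner, loser, gap in pairs:
    print(f"{winner} > {loser} (gap {gap:.2f})")
```

In a multi-objective setting, one pair set could be built per predictor (solubility, thermostability, evolutionary plausibility) and the sets combined; how ProtAlign actually merges objectives is not described here.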

Technical Details

MoMPNN inherits ProteinMPNN's message-passing graph neural network architecture, which encodes a protein backbone as a graph and autoregressively decodes a sequence. ProtAlign fine-tunes this base model with semi-online DPO: candidate sequences are sampled from the model, scored by the property predictors, paired into preferred and dispreferred examples, and used to update the model with a flexible margin that scales the preference signal.

On CATH 4.3 crystal structures, the [Sol+TM+EP] variant reaches a solubility score of 0.790 (vs. 0.769 for ProteinMPNN) and a TM-score of 79.5 (vs. 79.1; these two values are on a 0–100 scale, unlike the 0–1 values quoted for de novo design) while keeping RMSD near 0.74 Å. On de novo backbone design, the [Sol+IG+EP] variant improves self-consistency TM-score to 0.751 (vs. 0.718), reduces RMSD to 6.17 Å (vs. 6.86 Å), raises pLDDT to 72.0 (vs. 70.0), and lifts solubility to 0.843 (vs. 0.731). On binder design it shows gains in solubility and evolutionary plausibility with comparable or slightly higher success rates.
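
The DPO update described above can be made concrete with a small sketch of the loss for a single preference pair. It uses the standard DPO objective with an additive margin; the exact form of ProtAlign's flexible margin is not given here, so `flexible_margin` (a margin proportional to the predictor score gap) and the parameter names `beta` and `alpha` are assumptions for illustration only.

```python
import math


def flexible_margin(score_gap: float, alpha: float = 1.0) -> float:
    """Assumed margin rule: a larger predictor score gap demands a larger margin."""
    return alpha * score_gap


def dpo_margin_loss(
    logp_w: float,      # policy log-likelihood of the preferred sequence
    logp_l: float,      # policy log-likelihood of the dispreferred sequence
    ref_logp_w: float,  # same quantities under the frozen reference model
    ref_logp_l: float,
    beta: float = 0.1,
    margin: float = 0.0,
) -> float:
    """-log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)] - margin)."""
    logits = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)) - margin
    # Numerically stable softplus(-logits), which equals -log(sigmoid(logits)).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference and the margin is zero, the loss is log(2).
baseline = dpo_margin_loss(-10.0, -12.0, -10.0, -12.0)

# A positive margin raises the loss until the policy separates the pair further.
with_margin = dpo_margin_loss(
    -10.0, -12.0, -10.0, -12.0, margin=flexible_margin(score_gap=0.5)
)
print(baseline, with_margin)
```

In the semi-online setting, the log-likelihoods would come from the inverse-folding model conditioned on the backbone, with new candidates sampled from the current policy at intervals rather than at every step.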

Applications

MoMPNN is aimed at protein engineers and computational designers who use inverse folding to generate sequences for crystal-structure redesign, de novo proteins from generative backbone models (e.g., RFdiffusion), and de novo binder design. By improving solubility, thermostability, and evolutionary plausibility without sacrificing designability, it is intended to raise the fraction of computational designs that survive expression and purification, reducing wet-lab attrition. Because it slots into the LigandMPNN pipeline, teams already using ProteinMPNN can adopt it with minimal friction.
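
As a rough sketch of what adoption might look like, the snippet below assembles a LigandMPNN-style command line that points the ProteinMPNN checkpoint option at a MoMPNN file. The flag names follow the LigandMPNN `run.py` interface and should be checked against that repository's README; the checkpoint filename `EXAMPLE_VARIANT.pt` and the input and output paths are hypothetical placeholders, not real files.

```python
import shlex
from typing import List


def build_design_command(
    checkpoint: str, pdb_path: str, out_folder: str, seed: int = 111
) -> List[str]:
    """Assemble the argument list for a LigandMPNN run using a ProteinMPNN-format checkpoint."""
    return [
        "python", "run.py",
        "--model_type", "protein_mpnn",
        "--checkpoint_protein_mpnn", checkpoint,
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--seed", str(seed),
    ]


# Hypothetical paths: substitute a real checkpoint from mompnn_paper_checkpoints/.
cmd = build_design_command(
    checkpoint="mompnn_paper_checkpoints/EXAMPLE_VARIANT.pt",
    pdb_path="inputs/backbone.pdb",
    out_folder="outputs/mompnn_designs",
)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run inside a LigandMPNN checkout
```

Only the checkpoint path differs from a stock ProteinMPNN run, which is the sense in which the model is drop-in.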

Impact

MoMPNN demonstrates that preference-alignment techniques from large language models — specifically DPO — transfer effectively to structure-conditioned protein sequence design, offering a general recipe (ProtAlign) for injecting multiple, competing developability objectives into existing inverse-folding models. Released checkpoints and ProteinMPNN/LigandMPNN compatibility lower the barrier to adoption. As a 2026 contribution it is early-stage, and the reported developability gains are measured by in silico predictors rather than experimental assays, so wet-lab validation remains an important next step.

Citation

Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment

Preprint

Hou, X., et al. (2026) Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment.

DOI: 10.48550/arXiv.2603.06748

Recent citations

Papers that recently cited this model.

ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design
Yulin Zhang, He Cao, Zihao Jiang, et al.
May 2026
0 citations · Influential

Symmetric Self-play Online Preference Optimization for Protein Inverse Folding
Wenwu Zeng, Xiaoyu Li, Haitao Zou, et al.
bioRxiv · Mar 2026
0 citations

Generative Modeling in Protein Design: Neural Representations, Conditional Generation, and Evaluation Standards
S. Wanasekara, M. Nguyen, Xiaochen Liu, et al.
Mar 2026
0 citations


Citations

Total Citations: 3

Influential: 1

References: 66

GitHub

Stars: 6

Forks: 0

Open Issues: 1

Contributors: 1

Last Push: 4 months ago

Fields of citing research

Biology: 100%
Computer Science: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 34 (Closed)

Usability (can I run it?): 39

Reproducibility (can I retrain it?): 14

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository
Research Paper
Official Website


