bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

MoMPNN

BioGeometry / Peking University / Mila / Université de Montréal / HEC Montréal

Property-driven protein inverse folding: a ProteinMPNN checkpoint aligned via multi-objective preference optimization to improve developability while preserving structural fidelity.

Released: March 2026

MoMPNN is a property-driven protein inverse folding model that designs amino-acid sequences to fold into a target backbone while simultaneously optimizing for developability properties such as solubility, thermostability, and expression. It is produced by ProtAlign, a fine-tuning framework introduced in "Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment" (Hou, Liu, Shi, Liu, Yang, and Tang; ICLR 2026). Each released MoMPNN checkpoint is fixed: the "property-driven" behavior is baked in by the training recipe and requires no conditioning at inference time.

Standard inverse-folding models like ProteinMPNN maximize native sequence recovery, which optimizes structural fidelity but ignores the biophysical properties that determine whether a designed protein can actually be expressed, purified, and used. ProtAlign addresses this gap by treating design as a multi-objective alignment problem: it fine-tunes a pretrained inverse-folding model to satisfy diverse developability objectives while preserving sequence–structure agreement. Applied to ProteinMPNN, the result is MoMPNN.
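As a point of reference, native sequence recovery is the fraction of positions at which a designed sequence matches the native one. The following is a minimal sketch of that metric; the function name and the example sequences are illustrative and do not come from the MoMPNN repository.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Illustrative toy sequences differing at a single position.
recovery = sequence_recovery("MKTAYIAK", "MKTAYLAK")
```

A design can score highly on this metric and still be insoluble or unstable, which is the gap ProtAlign targets.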

Developed by BioGeometry in collaboration with Peking University, Mila – Québec AI Institute, Université de Montréal, and HEC Montréal (Jian Tang's group), the work targets a practical pain point in computational protein engineering: bridging the gap between sequences that score well on recovery benchmarks and sequences that behave well in the wet lab.

#Key Features

  • Multi-objective preference alignment: ProtAlign uses a semi-online Direct Preference Optimization (DPO) strategy with a flexible preference margin to steer the model toward multiple developability objectives at once without collapsing sequence diversity.
  • In silico preference pairs: Training preferences are constructed automatically from computational property predictors — Protein-Sol (solubility), TemBERTure (thermostability), and ESM-2 pseudo-likelihood (evolutionary plausibility) — rather than from costly experimental labels.
  • Structural fidelity preserved: Across CATH crystal structures, de novo backbones, and binder design, MoMPNN holds or improves TM-score and RMSD relative to ProteinMPNN while raising developability metrics.
  • Drop-in compatibility: MoMPNN checkpoints retain the original ProteinMPNN format and load directly into the LigandMPNN inference pipeline, so existing design workflows need no code changes.
  • Released checkpoints: Multiple property-targeted variants (e.g., solubility, thermostability, evolutionary-plausibility combinations) are provided in the repository under mompnn_paper_checkpoints/.
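To make the preference-alignment bullet concrete, here is a minimal sketch of a DPO loss for a single preference pair with an added margin term. The standard DPO form is used; treating the "flexible preference margin" as a value subtracted inside the sigmoid is an assumption, and the exact formulation in ProtAlign may differ. All names and default values are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin_loss(
    logp_preferred: float,
    logp_dispreferred: float,
    ref_logp_preferred: float,
    ref_logp_dispreferred: float,
    beta: float = 0.1,    # assumed temperature, not taken from the paper
    margin: float = 0.0,  # assumed additive margin
) -> float:
    """DPO loss for one (preferred, dispreferred) sequence pair on the same backbone.

    Inputs are sequence log-likelihoods under the fine-tuned model and under the
    frozen reference model (here, the original ProteinMPNN).
    """
    reward_gap = beta * (
        (logp_preferred - ref_logp_preferred)
        - (logp_dispreferred - ref_logp_dispreferred)
    )
    # A positive margin demands a larger reward gap before the loss becomes small.
    return -log_sigmoid(reward_gap - margin)
```

When the fine-tuned model equals the reference model and the margin is zero, the loss is log 2; a larger margin raises the loss for the same reward gap, so pairs with a larger margin exert a stronger pull.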

#Technical Details

MoMPNN inherits ProteinMPNN's message-passing graph neural network architecture, which encodes a protein backbone as a graph and autoregressively decodes a sequence. ProtAlign fine-tunes this base model with semi-online DPO: candidate sequences are sampled, scored by the property predictors, paired into preferred and dispreferred examples, and used to update the model with a flexible margin that scales the preference signal.

Reported results, each compared against ProteinMPNN (bracketed names denote checkpoint variants):

  • CATH 4.3 crystal structures, [Sol+TM+EP] variant: solubility score 0.790 (vs. 0.769) and TM-score 79.5 (vs. 79.1, on a 0-100 scale), with RMSD near 0.74 Å.
  • De novo backbone design, [Sol+IG+EP] variant: self-consistency TM-score 0.751 (vs. 0.718, on a 0-1 scale), RMSD 6.17 Å (vs. 6.86 Å), pLDDT 72.0 (vs. 70.0), and solubility 0.843 (vs. 0.731).
  • Binder design: gains in solubility and evolutionary plausibility, with comparable or slightly higher success rates.
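The score-and-pair step of the training loop can be sketched as follows. This is an illustration under stated assumptions, not ProtAlign's procedure: the source does not say how multiple objectives are combined into a preference, so this sketch prefers one candidate over another only when it is at least as good on every objective and strictly better on one (Pareto dominance), and it sets the margin proportional to the mean score gap. The scorers below are toy stand-ins, not Protein-Sol, TemBERTure, or ESM-2. Sampling from the model and the gradient update are omitted.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], float]


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def build_preference_pairs(
    candidates: List[str],
    scorers: Dict[str, Scorer],
    margin_scale: float = 1.0,  # hypothetical knob for the flexible margin
) -> List[Tuple[str, str, float]]:
    """Return (preferred, dispreferred, margin) triples for one backbone."""
    scores = {seq: {name: f(seq) for name, f in scorers.items()} for seq in candidates}
    pairs = []
    for s1, s2 in combinations(candidates, 2):
        for win, lose in ((s1, s2), (s2, s1)):
            if dominates(scores[win], scores[lose]):
                gap = sum(scores[win][k] - scores[lose][k] for k in scorers) / len(scorers)
                pairs.append((win, lose, margin_scale * gap))
    return pairs


# Toy scorers standing in for real property predictors.
toy_scorers = {
    "charged_fraction": lambda s: sum(c in "DEKR" for c in s) / len(s),
    "non_hydrophobic_fraction": lambda s: 1.0 - sum(c in "AILMFVW" for c in s) / len(s),
}
pairs = build_preference_pairs(["DEKRDEKR", "AILMAILM", "DEKRAILM"], toy_scorers)
```

Pairs in which neither candidate dominates are dropped in this sketch, which is one simple way to avoid trading one objective off against another; a weighted sum of scores would be an alternative.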

#Applications

MoMPNN is aimed at protein engineers and computational designers who use inverse folding to generate sequences for crystal-structure redesign, de novo proteins from generative backbone models (e.g., RFdiffusion), and de novo binder design. By improving solubility, thermostability, and evolutionary plausibility without sacrificing designability, it is intended to raise the fraction of computational designs that survive expression and purification, reducing wet-lab attrition. Because it slots into the LigandMPNN pipeline, teams already using ProteinMPNN can adopt it with minimal friction.
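For teams scripting their design runs, the following sketch assembles a LigandMPNN invocation that points at a MoMPNN checkpoint. The flag names are those of LigandMPNN's `run.py` as best recalled and should be verified against that repository's README; the helper name, the file paths, and the seed value are hypothetical, and the checkpoint path must be replaced with an actual file from mompnn_paper_checkpoints/.

```python
from typing import List


def ligandmpnn_command(
    checkpoint: str, pdb_path: str, out_folder: str, seed: int = 111
) -> List[str]:
    """Build an argv list for LigandMPNN's run.py using a ProteinMPNN-format checkpoint."""
    return [
        "python", "run.py",
        "--model_type", "protein_mpnn",           # MoMPNN keeps the ProteinMPNN format
        "--checkpoint_protein_mpnn", checkpoint,  # path to the chosen MoMPNN variant
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--seed", str(seed),
    ]


# Placeholder paths; substitute real files before running.
cmd = ligandmpnn_command("path/to/mompnn_checkpoint.pt", "inputs/target.pdb", "outputs/mompnn")
```

The list can be passed to `subprocess.run(cmd, check=True)` from inside a LigandMPNN checkout; only the checkpoint argument changes relative to a stock ProteinMPNN run.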

#Impact

MoMPNN demonstrates that preference-alignment techniques from large language models — specifically DPO — transfer effectively to structure-conditioned protein sequence design, offering a general recipe (ProtAlign) for injecting multiple, competing developability objectives into existing inverse-folding models. Released checkpoints and ProteinMPNN/LigandMPNN compatibility lower the barrier to adoption. As a 2026 contribution it is early-stage, and the reported developability gains are measured by in silico predictors rather than experimental assays, so wet-lab validation remains an important next step.

Tags

inverse_folding, protein_design, protein_sequence_design, graph_neural_network, message_passing_neural_network, preference_alignment, direct_preference_optimization, multi_objective, developability, binder_design