bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.


PPLM (Protein-Protein Language Model)

National University of Singapore

A paired-sequence protein language model that jointly encodes interacting proteins to predict interactions, binding affinity, and interface contacts.

Released: March 2026
Parameters: 650 Million

Most protein language models encode one sequence at a time, learning rich representations of individual proteins but treating interaction partners in isolation. PPLM (Protein-Protein Language Model) instead encodes a pair of sequences jointly, so that the representation of one protein is conditioned on its partner. This lets the model learn interaction-aware features—how two chains recognize and bind one another—directly from sequence, rather than inferring them after the fact from two separately computed embeddings.

PPLM was developed by Jun Liu, Hungyu Chen, and Yang Zhang at the Cancer Science Institute of Singapore (CSI Singapore), National University of Singapore. The work first appeared as a bioRxiv preprint in July 2025 ("A Corporative Language Model for Protein-Protein Interaction, Binding Affinity, and Interface Contact Prediction") and was published in Nature Communications on 10 March 2026 as "A paired sequence language model for protein-protein interaction modeling."

Rather than a single black-box predictor, PPLM is a pretrained backbone that ships with three task-specific heads: PPLM-PPI for binary interaction prediction, PPLM-Affinity for binding-strength estimation, and PPLM-Contact for inter-protein interface residue contacts. This positions PPLM as a general-purpose foundation for protein-pair modeling, complementing structure-prediction tools like AlphaFold-Multimer and single-chain models like ESM2.

Key Features

  • Joint paired-sequence encoding: PPLM processes two protein sequences together, capturing both individual protein features and partner-dependent interaction patterns within a unified framework.
  • Inter-protein attention: For an input pair, PPLM exposes per-sequence embeddings, intra-protein attention matrices for each chain, and an inter-protein attention matrix between the two chains—features that downstream heads consume directly.
  • Three specialized variants: PPLM-PPI predicts whether two proteins interact, PPLM-Affinity estimates binding affinity, and PPLM-Contact identifies interface residue contacts.
  • Strong on hard cases: PPLM-Affinity is reported to outperform both ESM2-based and structure-based methods on challenging targets, including antibody-antigen and TCR-pMHC complexes.
  • Open weights: All checkpoints (the PPLM backbone plus PPI, Affinity, and Contact heads) are released under the PolyForm Noncommercial License.
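
The inter-protein attention feature listed above can be pictured with a small sketch. It assumes, purely for illustration, that the two chains are concatenated into one token sequence so that a single square attention matrix covers the pair. The source does not describe PPLM's actual input layout, and `split_pair_attention` is a hypothetical helper, not part of the released code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_pair_attention(
    attn: Matrix, len_a: int, len_b: int
) -> Tuple[Matrix, Matrix, Matrix]:
    """Split a square pair-level attention matrix into three blocks.

    Assumption (not from the PPLM paper): chain A occupies the first
    `len_a` positions and chain B the next `len_b`, with no special tokens.
    Returns (intra_a, intra_b, inter_ab), where inter_ab[i][j] is the
    attention from residue i of chain A to residue j of chain B.
    """
    total = len_a + len_b
    if len(attn) != total or any(len(row) != total for row in attn):
        raise ValueError("attention matrix must be (len_a + len_b) square")
    intra_a = [row[:len_a] for row in attn[:len_a]]   # A attends to A
    intra_b = [row[len_a:] for row in attn[len_a:]]   # B attends to B
    inter_ab = [row[len_a:] for row in attn[:len_a]]  # A attends to B
    return intra_a, intra_b, inter_ab


# Toy matrix: chain A has 2 residues, chain B has 3.
toy = [[float(5 * i + j) for j in range(5)] for i in range(5)]
intra_a, intra_b, inter_ab = split_pair_attention(toy, 2, 3)
print(len(inter_ab), len(inter_ab[0]))  # 2 3
```

Under this layout the inter-protein block has one row per residue of the first chain and one column per residue of the second, which is the natural shape for an interface contact map.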

Technical Details

PPLM is initialized from the 650M-parameter ESM2 transformer (the 33-layer esm2_t33_650M model) and then further pretrained on a corpus of over three million protein pairs, which adapts the single-chain backbone into a paired-sequence encoder with inter-protein attention. Pretraining ran on four NVIDIA A100 GPUs for 50,000 steps with a gradient-accumulation factor of 32, using the AdamW optimizer with exponential decay rates β1 = 0.9 and β2 = 0.98. PPLM-Contact additionally incorporates ESM-MSA-derived features.

Across benchmark datasets, PPLM-PPI is reported to improve interaction-prediction accuracy by up to roughly 17% over leading methods, with consistent gains across multiple species, while PPLM-Affinity surpassed both sequence-based (ESM2) and structure-based baselines on binding-affinity estimation.
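
To make the reported pretraining settings concrete, here is a minimal sketch of one AdamW update using the stated β1 and β2, plus the arithmetic for how many pairs feed one optimizer update. The per-GPU batch size, learning rate, epsilon, and weight decay are not given in the source and are placeholder assumptions; `PretrainConfig` and `adamw_step` are hypothetical names. Whether the 50,000 steps count optimizer updates or micro-batches is also not stated.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PretrainConfig:
    # Values stated in the text above.
    num_gpus: int = 4
    steps: int = 50_000
    grad_accum: int = 32
    beta1: float = 0.9
    beta2: float = 0.98
    # Assumed values, NOT reported in the source.
    per_gpu_batch: int = 1
    lr: float = 1e-4
    eps: float = 1e-8
    weight_decay: float = 0.01

    def pairs_per_update(self) -> int:
        """Protein pairs contributing to one optimizer update."""
        return self.num_gpus * self.grad_accum * self.per_gpu_batch


def adamw_step(
    theta: float, grad: float, m: float, v: float, t: int, cfg: PretrainConfig
) -> Tuple[float, float, float]:
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad          # first-moment estimate
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - cfg.beta1 ** t)                    # bias correction
    v_hat = v / (1 - cfg.beta2 ** t)
    theta = theta - cfg.lr * cfg.weight_decay * theta   # decoupled weight decay
    theta = theta - cfg.lr * m_hat / (math.sqrt(v_hat) + cfg.eps)
    return theta, m, v


cfg = PretrainConfig()
print(cfg.pairs_per_update())  # 4 GPUs * 32 accumulation steps * assumed batch of 1
theta, m, v = adamw_step(theta=1.0, grad=1.0, m=0.0, v=0.0, t=1, cfg=cfg)
print(theta)
```

With β2 = 0.98 the second-moment average forgets old gradients faster than with the common default of 0.999, a choice often made for transformer pretraining.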

Applications

PPLM targets problems where the unit of interest is a protein pair rather than a single chain. Researchers can use PPLM-PPI for proteome-scale interaction screening and drug-target identification, PPLM-Affinity to rank and prioritize binders during therapeutic and antibody engineering, and PPLM-Contact to map interface residues for guiding mutagenesis or interpreting complex structures. Because the model operates from sequence alone, it is applicable to systems where experimental or predicted structures are unavailable, and the authors note its potential extension to host-pathogen interaction modeling.
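
The screening and ranking workflows described above share one pattern: score every candidate pair, then sort. The sketch below illustrates that pattern only. `toy_score` is a deliberately trivial placeholder, not a PPLM model, and `screen_pairs` is a hypothetical helper; in practice the scorer would wrap whatever interface the released PPLM-PPI or PPLM-Affinity code provides.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def screen_pairs(
    proteins: Dict[str, str], score: Scorer, top_k: int = 5
) -> List[Tuple[str, str, float]]:
    """Score every unordered protein pair and return the top_k by score."""
    scored = [
        (name_a, name_b, score(proteins[name_a], proteins[name_b]))
        for name_a, name_b in combinations(sorted(proteins), 2)
    ]
    # Highest score first; ties keep their original (alphabetical) order.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


def toy_score(seq_a: str, seq_b: str) -> float:
    """Placeholder scorer, NOT a PPLM model: overlap of residue types."""
    a, b = set(seq_a), set(seq_b)
    return len(a & b) / len(a | b)


proteins = {"P1": "MKTAY", "P2": "MKTLL", "P3": "GGSGG"}
for name_a, name_b, s in screen_pairs(proteins, toy_score, top_k=2):
    print(name_a, name_b, round(s, 3))
```

All-against-all screening grows quadratically: n proteins give n(n-1)/2 unordered pairs, so a set of 20,000 proteins yields about 2 × 10^8 pairs to score.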

Impact

PPLM advances the case that interaction-aware representations are best learned by encoding partners jointly rather than combining independent single-chain embeddings. By packaging a pretrained backbone with interaction, affinity, and contact heads and releasing the weights, the work gives the protein-modeling community a reusable foundation for protein-pair tasks and a sequence-based complement to structure-prediction pipelines. Its reported gains on antibody-antigen and TCR-pMHC affinity, long-standing hard cases for both sequence and structure methods, are particularly relevant to immunology and biologics discovery. The model is restricted to noncommercial use under the PolyForm Noncommercial License, which may limit industry adoption.

Citations

Published article (Nature Communications), DOI: 10.1038/s41467-026-70457-5

Preprint (bioRxiv), DOI: 10.1101/2025.07.07.663595

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 27 (Closed)
Usability (can I run it?): 21
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

antibody, binding_affinity_prediction, foundation_model, interface_contact_prediction, language_model, protein_protein_interaction_prediction, proteomics, self_supervised, transformer
