National University of Singapore
A paired-sequence protein language model that jointly encodes interacting proteins to predict interactions, binding affinity, and interface contacts.
Most protein language models encode one sequence at a time, learning rich representations of individual proteins but treating interaction partners in isolation. PPLM (Protein-Protein Language Model) instead encodes a pair of sequences jointly, so that the representation of one protein is conditioned on its partner. This lets the model learn interaction-aware features—how two chains recognize and bind one another—directly from sequence, rather than inferring them after the fact from two separately computed embeddings.
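To make the joint-versus-independent distinction concrete, the sketch below builds one paired input from two sequences and marks which position pairs cross from one protein to the other. It is an illustration only: the `<sep>` token, the `build_pair_input` and `inter_chain_mask` helpers, and the chain-ID layout are assumptions made for this example, not PPLM's actual tokenization or attention scheme.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical separator token, not taken from PPLM's vocabulary


def build_pair_input(seq_a: str, seq_b: str) -> Tuple[List[str], List[int]]:
    """Concatenate two chains into one token list with per-token chain IDs."""
    tokens = list(seq_a) + [SEP] + list(seq_b)
    # Chain 0 for the first protein, 1 for the second, -1 for the separator.
    chain_ids = [0] * len(seq_a) + [-1] + [1] * len(seq_b)
    return tokens, chain_ids


def inter_chain_mask(chain_ids: List[int]) -> List[List[bool]]:
    """Return a square mask that is True where i and j lie in different proteins."""
    return [
        [a >= 0 and b >= 0 and a != b for b in chain_ids]
        for a in chain_ids
    ]


# Toy example with two very short placeholder sequences.
tokens, chain_ids = build_pair_input("MKT", "GA")
mask = inter_chain_mask(chain_ids)
```

The True entries of `mask` are exactly the residue pairs that a jointly encoded input can relate to each other; two embeddings computed independently never see those pairs during encoding.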
PPLM was developed by Jun Liu, Hungyu Chen, and Yang Zhang at the Cancer Science Institute of Singapore (CSI Singapore), National University of Singapore. The work first appeared as a bioRxiv preprint in July 2025 ("A Corporative Language Model for Protein-Protein Interaction, Binding Affinity, and Interface Contact Prediction") and was published in Nature Communications on 10 March 2026 as "A paired sequence language model for protein-protein interaction modeling."
Rather than a single black-box predictor, PPLM is a pretrained backbone that ships with three task-specific heads: PPLM-PPI for binary interaction prediction, PPLM-Affinity for binding-strength estimation, and PPLM-Contact for inter-protein interface residue contacts. This positions PPLM as a general-purpose foundation for protein-pair modeling, complementing structure-prediction tools like AlphaFold-Multimer and single-chain models like ESM2.
PPLM is initialized from the 650M-parameter ESM2 transformer (the 33-layer esm2_t33_650M model) and then further pretrained on a corpus of over three million protein pairs, adapting the single-chain backbone into a paired-sequence encoder with inter-protein attention. Pretraining was run on four NVIDIA A100 GPUs for 50,000 steps with a gradient-accumulation factor of 32, using the AdamW optimizer with exponential decay rates β1 = 0.9 and β2 = 0.98. PPLM-Contact additionally incorporates ESM-MSA-derived features. Across benchmark datasets, PPLM-PPI improved interaction-prediction accuracy by up to roughly 17% over leading methods, with consistent gains across multiple species, while PPLM-Affinity surpassed both sequence-based (ESM2) and structure-based baselines on binding-affinity estimation.
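The pretraining figures above can be related through simple arithmetic. The following sketch computes how many protein pairs would contribute to each optimizer update under gradient accumulation. The per-GPU batch size is not stated here, so `PER_GPU_BATCH` is a placeholder; the sketch also assumes the four GPUs run data-parallel and that the 50,000 steps count optimizer updates, neither of which is confirmed by the description.

```python
def effective_batch_size(num_gpus: int, accumulation_steps: int, per_gpu_batch: int) -> int:
    """Pairs contributing to one optimizer update under data parallelism."""
    return num_gpus * accumulation_steps * per_gpu_batch


NUM_GPUS = 4             # from the text: four NVIDIA A100 GPUs
ACCUMULATION_STEPS = 32  # from the text: gradient-accumulation factor
TRAINING_STEPS = 50_000  # from the text; assumed to be optimizer updates
PER_GPU_BATCH = 1        # assumption: placeholder, not given in the text

pairs_per_update = effective_batch_size(NUM_GPUS, ACCUMULATION_STEPS, PER_GPU_BATCH)
pairs_seen = pairs_per_update * TRAINING_STEPS

print(f"pairs per update: {pairs_per_update}")
print(f"pair samples seen in total: {pairs_seen}")
```

With the placeholder batch of 1, this gives 128 pairs per update; the real figure scales linearly with whatever per-GPU batch was actually used.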
PPLM targets problems where the unit of interest is a protein pair rather than a single chain. Researchers can use PPLM-PPI for proteome-scale interaction screening and drug-target identification, PPLM-Affinity to rank and prioritize binders during therapeutic and antibody engineering, and PPLM-Contact to map interface residues for guiding mutagenesis or interpreting complex structures. Because the model operates from sequence alone, it is applicable to systems where experimental or predicted structures are unavailable, and the authors note its potential extension to host-pathogen interaction modeling.
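As an illustration of the interface-mapping use case, the sketch below post-processes an inter-protein contact probability matrix into candidate interface residues. The matrix layout (rows for residues of the first protein, columns for the second), the 0.5 threshold, and the helper names are assumptions chosen for this example; they do not describe PPLM-Contact's actual output format.

```python
from typing import List, Set, Tuple


def interface_residues(
    contact_probs: List[List[float]], threshold: float = 0.5
) -> Tuple[Set[int], Set[int]]:
    """Collect residue indices on each chain that take part in a likely contact."""
    residues_a: Set[int] = set()
    residues_b: Set[int] = set()
    for i, row in enumerate(contact_probs):
        for j, prob in enumerate(row):
            if prob >= threshold:
                residues_a.add(i)
                residues_b.add(j)
    return residues_a, residues_b


def top_contacts(contact_probs: List[List[float]], k: int) -> List[Tuple[int, int, float]]:
    """Return the k highest-probability (i, j, prob) residue pairs."""
    scored = [
        (i, j, prob)
        for i, row in enumerate(contact_probs)
        for j, prob in enumerate(row)
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


# Made-up probabilities for a 3-residue chain against a 3-residue chain.
probs = [
    [0.05, 0.10, 0.02],
    [0.80, 0.60, 0.01],
    [0.30, 0.90, 0.04],
]
chain_a_sites, chain_b_sites = interface_residues(probs)
best_pairs = top_contacts(probs, k=2)
```

Residues flagged this way could serve as a shortlist for mutagenesis, with the threshold tuned to trade recall against precision.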
PPLM advances the case that interaction-aware representations are best learned by encoding partners jointly rather than combining independent single-chain embeddings. By packaging a pretrained backbone with interaction, affinity, and contact heads and releasing the weights, the work gives the protein-modeling community a reusable foundation for protein-pair tasks and a sequence-based complement to structure-prediction pipelines. Its reported gains on antibody-antigen and TCR-pMHC affinity—long-standing hard cases for both sequence and structure methods—are particularly relevant to immunology and biologics discovery. The model is restricted to noncommercial use under the PolyForm Noncommercial License, which may limit some industry adoption.