PPLM (Protein-Protein Language Model)

Paired-sequence protein language model that jointly encodes two interacting chains to predict interactions, binding affinity, and interface contacts.

Released: March 2026

Parameters: 650 million

Most protein language models encode one sequence at a time: they learn rich representations of individual proteins but treat interaction partners in isolation. PPLM instead encodes a pair of sequences jointly, so that the representation of each protein is conditioned on its partner. This lets the model learn interaction-aware features (how two chains recognize and bind one another) directly from sequence, rather than inferring them after the fact from two separately computed embeddings.

PPLM was developed by Jun Liu, Hungyu Chen, and Yang Zhang at the Cancer Science Institute of Singapore (CSI Singapore), National University of Singapore. The work first appeared as a bioRxiv preprint in July 2025 ("A Corporative Language Model for Protein-Protein Interaction, Binding Affinity, and Interface Contact Prediction") and was published in Nature Communications on 10 March 2026 as "A paired sequence language model for protein-protein interaction modeling."

Rather than a single black-box predictor, PPLM is a pretrained backbone that ships with three task-specific heads: PPLM-PPI for binary interaction prediction, PPLM-Affinity for binding-strength estimation, and PPLM-Contact for inter-protein interface residue contacts. This positions PPLM as a general-purpose foundation for protein-pair modeling, complementing structure-prediction tools like AlphaFold-Multimer and single-chain models like ESM2.

Key Features

Joint paired-sequence encoding: PPLM processes two protein sequences together, capturing both individual protein features and partner-dependent interaction patterns within a unified framework.
Inter-protein attention: For an input pair, PPLM exposes per-sequence embeddings, intra-protein attention matrices for each chain, and an inter-protein attention matrix between the two chains—features that downstream heads consume directly.
Three specialized variants: PPLM-PPI predicts whether two proteins interact, PPLM-Affinity estimates binding affinity, and PPLM-Contact identifies interface residue contacts.
Strong on hard cases: The authors report that PPLM-Affinity outperforms both ESM2 and structure-based methods on challenging targets, including antibody-antigen and TCR-pMHC complexes.
Open weights: All checkpoints (the PPLM backbone plus PPI, Affinity, and Contact heads) are released under the PolyForm Noncommercial License.
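To make the three attention outputs listed above concrete, here is a minimal sketch of how a joint attention map over a concatenated pair could be split into intra-protein and inter-protein blocks. The layout (chain A first, chain B second, no special tokens) and the function names `split_pair_attention` and `top_inter_pairs` are illustrative assumptions, not the PPLM API or its actual tokenization.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_pair_attention(attn: Matrix, len_a: int, len_b: int) -> Tuple[Matrix, Matrix, Matrix]:
    """Split a square joint attention map of side len_a + len_b into three blocks.

    Assumes chain A occupies the first len_a positions and chain B the next
    len_b positions. This layout is an assumption made for illustration.
    """
    total = len_a + len_b
    if len(attn) != total or any(len(row) != total for row in attn):
        raise ValueError("attention map must be square with side len_a + len_b")
    intra_a = [row[:len_a] for row in attn[:len_a]]   # A attends to A
    intra_b = [row[len_a:] for row in attn[len_a:]]   # B attends to B
    inter_ab = [row[len_a:] for row in attn[:len_a]]  # A attends to B
    return intra_a, intra_b, inter_ab


def top_inter_pairs(inter_ab: Matrix, k: int) -> List[Tuple[int, int, float]]:
    """Return the k highest-weighted (index in A, index in B, weight) triples."""
    scored = [(i, j, w) for i, row in enumerate(inter_ab) for j, w in enumerate(row)]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


# Toy 5x5 map: chain A has 2 residues, chain B has 3.
toy = [
    [0.5, 0.1, 0.1, 0.2, 0.1],
    [0.1, 0.3, 0.4, 0.1, 0.1],
    [0.1, 0.4, 0.3, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.5, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
]
intra_a, intra_b, inter_ab = split_pair_attention(toy, len_a=2, len_b=3)
print(top_inter_pairs(inter_ab, k=2))  # [(1, 0, 0.4), (0, 1, 0.2)]
```

The off-diagonal block is the part a contact-style head would most plausibly draw on, since it is the only block that relates residues across the two chains.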

Technical Details

PPLM is initialized from the 650M-parameter ESM2 transformer (the 33-layer esm2_t33_650M model) and then further pretrained on a corpus of over three million protein pairs, which adapts the single-chain backbone into a paired-sequence encoder with inter-protein attention. Pretraining ran on four NVIDIA A100 GPUs for 50,000 steps with a gradient-accumulation factor of 32, using the AdamW optimizer with exponential decay rates β1 = 0.9 and β2 = 0.98. PPLM-Contact additionally incorporates ESM-MSA-derived features. Across benchmark datasets, PPLM-PPI is reported to improve interaction-prediction accuracy by up to roughly 17% over leading methods, with consistent gains across multiple species, while PPLM-Affinity surpassed both sequence-based (ESM2) and structure-based baselines on binding-affinity estimation.
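The pretraining schedule above combines gradient accumulation with AdamW. The sketch below shows that combination on a single scalar parameter in plain Python. Only the decay rates, the accumulation factor, the step count, and the GPU count come from the text. The learning rate, epsilon, and weight decay are assumed placeholder values, and the micro-batch totals assume that the 50,000 steps are optimizer steps and that the accumulation factor applies per GPU under data parallelism.

```python
import math

BETA1, BETA2 = 0.9, 0.98   # from the text
ACCUM_STEPS = 32           # from the text
OPT_STEPS = 50_000         # from the text
NUM_GPUS = 4               # from the text

# Assumed values, not stated in the text.
LR, EPS, WEIGHT_DECAY = 1e-4, 1e-8, 0.01


def adamw_step(param, grad, m, v, t):
    """One AdamW update for a scalar parameter; t is the 1-based step count."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad
    m_hat = m / (1 - BETA1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)   # bias-corrected second moment
    param = param - LR * WEIGHT_DECAY * param            # decoupled weight decay
    param = param - LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


def train(param, micro_batch_grads):
    """Average gradients over ACCUM_STEPS micro-batches, then take one optimizer step."""
    m = v = 0.0
    t = 0
    acc = 0.0
    for i, g in enumerate(micro_batch_grads, start=1):
        acc += g / ACCUM_STEPS
        if i % ACCUM_STEPS == 0:
            t += 1
            param, m, v = adamw_step(param, acc, m, v, t)
            acc = 0.0
    return param, t


# Micro-batch passes implied by the schedule, under the assumptions above.
micro_batches_per_gpu = OPT_STEPS * ACCUM_STEPS          # 1,600,000
micro_batches_total = micro_batches_per_gpu * NUM_GPUS   # 6,400,000

# Toy run: 64 micro-batches with a constant gradient give 2 optimizer steps.
final_param, steps = train(1.0, [1.0] * 64)
print(steps, micro_batches_per_gpu, micro_batches_total)
```

Accumulation trades wall-clock time for a larger effective batch: each optimizer step here sees the average of 32 micro-batch gradients without holding all 32 in memory at once.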

Applications

PPLM targets problems where the unit of interest is a protein pair rather than a single chain. Researchers can use PPLM-PPI for proteome-scale interaction screening and drug-target identification, PPLM-Affinity to rank and prioritize binders during therapeutic and antibody engineering, and PPLM-Contact to map interface residues for guiding mutagenesis or interpreting complex structures. Because the model operates from sequence alone, it is applicable to systems where experimental or predicted structures are unavailable, and the authors note its potential extension to host-pathogen interaction modeling.
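As a rough illustration of the screening and ranking workflow described above, the sketch below scores every unordered pair in a small set of sequences, keeps those above a threshold, and sorts them. The scorer is passed in as a callable. `toy_score` is a stand-in invented for this example and is not a real interaction predictor or any part of PPLM.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def screen_pairs(proteins: Dict[str, str], score: Scorer,
                 threshold: float) -> List[Tuple[str, str, float]]:
    """Score each unordered pair once; keep scores >= threshold, highest first."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(proteins.items(), 2):
        s = score(seq_a, seq_b)
        if s >= threshold:
            hits.append((name_a, name_b, s))
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits


def toy_score(seq_a: str, seq_b: str) -> float:
    """Placeholder scorer: Jaccard overlap of residue types. NOT a real predictor."""
    a, b = set(seq_a), set(seq_b)
    return len(a & b) / len(a | b)


# Hypothetical three-protein "proteome": 3 * 2 / 2 = 3 pairs to score.
proteins = {"P1": "MKV", "P2": "MKL", "P3": "GGA"}
print(screen_pairs(proteins, toy_score, threshold=0.4))  # [('P1', 'P2', 0.5)]
```

Because the number of pairs grows as n(n-1)/2, a proteome of a few thousand proteins already implies millions of scorer calls, which is where a sequence-only model has a practical advantage over running structure prediction on every pair.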

Impact

PPLM advances the case that interaction-aware representations are best learned by encoding partners jointly rather than combining independent single-chain embeddings. By packaging a pretrained backbone with interaction, affinity, and contact heads and releasing the weights, the work gives the protein-modeling community a reusable foundation for protein-pair tasks and a sequence-based complement to structure-prediction pipelines. Its reported gains on antibody-antigen and TCR-pMHC affinity—long-standing hard cases for both sequence and structure methods—are particularly relevant to immunology and biologics discovery. The model is restricted to noncommercial use under the PolyForm Noncommercial License, which may limit some industry adoption.

Citations

Preprint

Liu, J., et al. (2025). A Corporative Language Model for Protein–Protein Interaction, Binding Affinity, and Interface Contact Prediction. bioRxiv. DOI: 10.1101/2025.07.07.663595

Journal article

Liu, J., et al. (2026). A paired sequence language model for protein-protein interaction modeling. Nature Communications. DOI: 10.1038/s41467-026-70457-5

Recent citations

Papers that recently cited this model.

Ophiuchus-Ab: A Versatile Generative Foundation Model for Advanced Antibody-Based Immunotherapy
Yiheng Zhu, Jian Ma, Mingze Yin, et al.
bioRxiv · Feb 2026 · Citations: 0

SLAE: Strictly Local All-atom Environment for Protein Representation
Yilin Chen, Tianyu Lu, Cizhang Zhao, et al.
bioRxiv · Oct 2025 · Citations: 0 · Influential

Citations

Total citations: 2
Influential citations: 1
References: 44

GitHub

Stars: 62
Forks: 5
Open issues: 4
Contributors: 1
Last push: 7d ago
Language: Python

Fields of citing research

Share of papers citing this model.

Biology: 100%
Computer Science: 100%
Medicine: 100%

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 27 (Closed)
Usability (can I run it?): 21
Reproducibility (can I retrain it?): 18

Model Openness Framework: Unclassified (restrictive license on core components)
