FlashPPI

Contrastive model built on a genomic language model that predicts physical protein-protein interactions across a microbial proteome in linear time.

Released: March 2026

FlashPPI predicts physical protein-protein interactions (PPIs) at proteome scale and in linear time. Conventional approaches to proteome-wide interaction screening are either slow, because they evaluate every protein pair, or rely on expensive structure-folding models such as AlphaFold-Multimer that are impractical to run across an entire proteome. FlashPPI reframes PPI screening as a dense retrieval problem: it embeds proteins into a shared latent space where interacting partners lie close together, so candidate interactions can be found by nearest-neighbor search rather than exhaustive pairwise scoring.
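The retrieval framing can be illustrated with a minimal sketch. It assumes each protein already has a fixed-length embedding (here, made-up 3-dimensional toy vectors) and ranks candidate partners by cosine similarity. The function names `normalize` and `top_k_partners` and the protein labels are hypothetical and are not part of the FlashPPI codebase. For clarity the lookup below is brute force, so it still compares every pair of vectors; a real proteome-scale screen would presumably replace the inner loop with an approximate nearest-neighbor index, which is an assumption here and not a detail confirmed by the source.

```python
import math
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_partners(
    embeddings: Dict[str, List[float]], k: int = 1
) -> Dict[str, List[Tuple[str, float]]]:
    """For each protein, return its k most similar other proteins.

    Brute-force illustration only: an approximate nearest-neighbor index
    would stand in for the inner loop at proteome scale (assumption).
    """
    unit = {name: normalize(vec) for name, vec in embeddings.items()}
    results: Dict[str, List[Tuple[str, float]]] = {}
    for query, q_vec in unit.items():
        scored = [
            (other, sum(a * b for a, b in zip(q_vec, o_vec)))
            for other, o_vec in unit.items()
            if other != query
        ]
        # Highest cosine similarity first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[query] = scored[:k]
    return results


# Toy embeddings (hypothetical values): A/B and C/D are placed close together.
toy_embeddings = {
    "protA": [1.0, 0.1, 0.0],
    "protB": [0.9, 0.2, 0.0],
    "protC": [0.0, 1.0, 0.1],
    "protD": [0.0, 0.9, 0.2],
}

candidates = top_k_partners(toy_embeddings, k=1)
for protein, partners in candidates.items():
    print(protein, "->", partners[0][0])
```

In this toy setting each protein retrieves the partner it was placed next to, which is the behavior the shared latent space is meant to produce for true interactors.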

The model was developed by Tatta Bio (Andre Cornman, Matt Tranzillo, Nicolo G. Zulaybar, Imane Bouzit, and Yunha Hwang), with a preprint posted to bioRxiv in March 2026. It builds on the team's genomic language modeling work, initializing from a genomic language model (gLM2) that captures cross-protein co-evolutionary signals learned from metagenomic sequences and genomic context.

By grounding its contrastive training in residue-level interaction signals and leveraging metagenomic co-evolution, FlashPPI reports an approximately four-fold improvement over existing sequence-based PPI methods and cuts proteome-wide screening time from days to minutes. Its reported performance is comparable to that of structure-folding models at a fraction of the computational cost.

Key Features

Linear-time proteome screening: Casts PPI prediction as dense retrieval in a shared embedding space, avoiding quadratic all-pairs scoring and cutting proteome-wide screens from days to minutes.
Genomic language model backbone: Initializes from gLM2, capturing cross-protein co-evolutionary signals learned from metagenomic sequences and genomic context.
Residue-level contrastive training: Grounds the contrastive objective in residue-level interaction signals to align physically interacting partners.
Strong accuracy at low cost: Reports roughly four-fold improvement over sequence-based baselines and screening accuracy comparable to structure-folding models at far lower compute.

Technical Details

FlashPPI is a contrastive learning framework that maps proteins into a shared latent space in which interacting partners are aligned, so that physical interfaces across a microbial proteome can be identified by retrieval. It initializes from the gLM2 genomic language model, which is trained on metagenomic sequences and learns cross-protein co-evolutionary signals from genomic context; the contrastive objective is grounded in residue-level interaction information. Reported evaluations show an approximately four-fold gain over existing sequence-based PPI prediction methods, with screening performance comparable to state-of-the-art structure-folding approaches at a fraction of the computational cost. Code is released on GitHub under CC BY-NC 4.0, weights are available on Hugging Face, and the model is deployed in an interactive web platform.
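To make the idea of a contrastive objective concrete, here is a minimal sketch of a generic InfoNCE-style loss over a batch of paired embeddings. This is a standard textbook formulation chosen for illustration; the source does not state FlashPPI's exact loss, and the residue-level grounding described above is not modeled here. The function name `info_nce_loss`, the `temperature` value, and the toy vectors are all assumptions.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    anchors: List[Sequence[float]],
    partners: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Generic InfoNCE loss (illustrative, not FlashPPI's confirmed objective).

    anchors[i] and partners[i] form a positive (interacting) pair; every
    other partner in the batch acts as a negative for anchors[i].
    """
    if not anchors or len(anchors) != len(partners):
        raise ValueError("anchors and partners must be non-empty and equal length")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, partner) / temperature for partner in partners]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy of selecting the true partner i.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch: in `aligned`, true partners share a direction with their anchors;
# in `shuffled`, each anchor's true partner points elsewhere.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned loss:  {aligned:.5f}")
print(f"shuffled loss: {shuffled:.5f}")
```

The loss is low when each anchor is closest to its own partner and high otherwise, so minimizing it pulls interacting pairs together in the embedding space, which is what makes retrieval by similarity meaningful afterward.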

Applications

FlashPPI targets microbial discovery and functional genomics, enabling researchers to build proteome-wide interaction networks rapidly and to prioritize candidate physical interactions for follow-up. It is deployed at seqhub.org, an interactive platform that combines predicted networks with functional annotations and genomic context, making proteome-scale network analysis accessible to microbiologists and metagenomics researchers without large compute budgets.

Impact

FlashPPI shows that genomic language models trained on metagenomic co-evolution can power scalable, structure-free interaction prediction, bringing proteome-wide PPI screening within reach of routine analysis. By pairing competitive accuracy with linear-time inference and an interactive deployment, it expands the practical toolkit for microbial interactome mapping. As a recent preprint released under a non-commercial license with open code and weights, its broader influence will depend on independent benchmarking, but it offers a compelling efficiency-versus-accuracy tradeoff relative to folding-based screens.

Citation

Cornman, A., et al. (2026) Linear-time prediction of proteome-scale microbial protein interactions. bioRxiv.

DOI: 10.64898/2026.03.01.708874

Recent citations

Papers that recently cited this model.

CliPepPI: Scalable prediction of domain-peptide specificity using contrastive learning
Tanya Hochner-Vilk, Doris J. Stein, O. Schueler‐Furman, et al.
bioRxiv · May 2026
Citations: 0

Citations

Total Citations: 1

GitHub

Stars: 39

Forks: 2

HuggingFace

Downloads: 43K

Likes: 1

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 14 (Closed)

Usability (can I run it?): 12

Reproducibility (can I retrain it?): 12

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository, Research Paper, HuggingFace Model, Demo
