bio.rodeo


Protein

FlashPPI

Tatta Bio

Contrastive model built on a genomic language model that predicts physical protein-protein interactions across a microbial proteome in linear time.

Released: March 2026

FlashPPI predicts physical protein-protein interactions (PPIs) at proteome scale and in linear time. Conventional approaches to proteome-wide interaction screening either score every protein pair, which is slow, or rely on expensive structure-folding models such as AlphaFold-Multimer, which are impractical to run across an entire proteome. FlashPPI reframes PPI screening as a dense retrieval problem: it embeds proteins into a shared latent space where interacting partners lie close together, so candidate interactions can be found by nearest-neighbor search rather than exhaustive pairwise scoring.
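The retrieval framing can be illustrated with a minimal sketch. It assumes each protein already has an embedding vector and ranks candidate partners by cosine similarity; the function names, the random toy embeddings, and the choice of cosine similarity are illustrative assumptions, not FlashPPI's actual code or scoring function.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k_partners(embeddings, k):
    """For each protein, return the indices of its k nearest neighbors.

    Hypothetical helper: a brute-force scan over cheap dot products.
    Self-matches are skipped here as a simplification.
    """
    unit = [normalize(v) for v in embeddings]
    result = []
    for i, query in enumerate(unit):
        scored = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(unit)
            if j != i
        ]
        scored.sort(reverse=True)  # highest similarity first
        result.append([j for _, j in scored[:k]])
    return result


# Toy data: six random "protein" embeddings; protein 1 is made a
# near-copy of protein 0 so the two should retrieve each other first.
random.seed(0)
embeddings = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
embeddings[1] = [x + 0.01 * random.gauss(0, 1) for x in embeddings[0]]

partners = top_k_partners(embeddings, k=2)
print(partners[0], partners[1])
```

The scan above still compares every pair of vectors, but each comparison is a cheap dot product rather than a model evaluation. At larger scale, an approximate nearest-neighbor index could replace the inner loop; whether FlashPPI uses one is not stated here.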

The model was developed by Tatta Bio (Andre Cornman, Matt Tranzillo, Nicolo G. Zulaybar, Imane Bouzit, and Yunha Hwang), with a preprint posted to bioRxiv in March 2026. It builds on the team's genomic language modeling work, initializing from a genomic language model (gLM2) that captures cross-protein co-evolutionary signals learned from metagenomic sequences and genomic context.

By grounding its contrastive training in residue-level interaction signals and leveraging metagenomic co-evolution, FlashPPI reports an approximately four-fold improvement over existing sequence-based PPI methods and cuts proteome-wide screening time from days to minutes. Its reported performance is comparable to that of structure-folding models at a fraction of the computational cost.

#Key Features

  • Linear-time proteome screening: Casts PPI prediction as dense retrieval in a shared embedding space, avoiding quadratic all-pairs scoring and cutting proteome-wide screens from days to minutes.
  • Genomic language model backbone: Initializes from gLM2, capturing cross-protein co-evolutionary signals learned from metagenomic sequences and genomic context.
  • Residue-level contrastive training: Grounds the contrastive objective in residue-level interaction signals to align physically interacting partners.
  • Strong accuracy at low cost: Reports roughly four-fold improvement over sequence-based baselines and screening accuracy comparable to structure-folding models at far lower compute.
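The scaling argument behind the first feature can be made concrete with a little arithmetic. The sketch below compares the number of pair evaluations in an all-pairs screen with the number of encoder passes in an embed-then-retrieve screen; the proteome size of 4,000 proteins is a hypothetical figure chosen for illustration, and the function names are not from the source.

```python
from math import comb


def pairwise_evaluations(n_proteins: int) -> int:
    """Model calls needed to score every unordered pair of distinct proteins."""
    return comb(n_proteins, 2)


def encoder_passes(n_proteins: int) -> int:
    """Model calls needed when each protein is embedded exactly once."""
    return n_proteins


n = 4000  # hypothetical proteome size, for illustration only
print(pairwise_evaluations(n))  # 7998000
print(encoder_passes(n))        # 4000
```

The count covers only calls to the expensive model. The cost of the subsequent neighbor search depends on the index used and is not included.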

#Technical Details

FlashPPI is a contrastive learning framework that maps proteins into a shared latent space in which interacting partners are aligned, so that physical interfaces across a microbial proteome can be identified by retrieval. It initializes from the gLM2 genomic language model, which is trained on metagenomic sequences and learns cross-protein co-evolutionary signals from genomic context; the contrastive objective is grounded in residue-level interaction information. Reported evaluations show an approximately four-fold gain over existing sequence-based PPI prediction methods, with screening performance comparable to state-of-the-art structure-folding approaches at a fraction of the computational cost. Code is released on GitHub under CC BY-NC 4.0, weights are available on Hugging Face, and the model is deployed in an interactive web platform.
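For readers unfamiliar with contrastive objectives, the sketch below shows a generic InfoNCE-style loss computed from a similarity matrix in which row i's true partner sits in column i. This is a standard textbook formulation used as a stand-in: the source does not specify FlashPPI's loss, and the residue-level grounding it describes is not modeled here. The temperature value and the toy matrices are assumptions.

```python
import math


def info_nce(sim_matrix, temperature=0.1):
    """Mean cross-entropy over rows, treating column i as row i's positive.

    Generic one-directional InfoNCE; not the loss used by FlashPPI.
    """
    losses = []
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy similarity matrices: true partners are on the diagonal.
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

print(info_nce(aligned) < info_nce(shuffled))  # True
```

Minimizing such a loss pulls true partners together and pushes non-partners apart, which is what makes nearest-neighbor retrieval meaningful in the learned space.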

#Applications

FlashPPI targets microbial discovery and functional genomics, enabling researchers to build proteome-wide interaction networks rapidly and to prioritize candidate physical interactions for follow-up. It is deployed at seqhub.org, an interactive platform that combines predicted networks with functional annotations and genomic context, making proteome-scale network analysis accessible to microbiologists and metagenomics researchers without large compute budgets.
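As a rough illustration of the downstream workflow, the sketch below turns a list of scored candidate pairs into an undirected interaction network and ranks proteins by their number of predicted partners. The protein identifiers, scores, and the 0.75 threshold are invented for the example; they are not FlashPPI or seqhub.org outputs, and the helper names are hypothetical.

```python
from collections import defaultdict


def build_network(scored_pairs, threshold):
    """Keep pairs scoring at or above the threshold as undirected edges."""
    network = defaultdict(set)
    for a, b, score in scored_pairs:
        if score >= threshold:
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def rank_by_degree(network):
    """Order proteins by partner count (descending), then by name."""
    return sorted(network, key=lambda p: (-len(network[p]), p))


# Invented example predictions: (protein, protein, score).
scored_pairs = [
    ("protA", "protB", 0.92),
    ("protA", "protC", 0.81),
    ("protB", "protD", 0.40),
    ("protC", "protD", 0.77),
]

network = build_network(scored_pairs, threshold=0.75)
hubs = rank_by_degree(network)
print(hubs)  # ['protA', 'protC', 'protB', 'protD']
```

Highly connected proteins in such a network are natural first candidates for follow-up, although the right threshold would need to be calibrated against validated interactions.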

#Impact

FlashPPI shows that genomic language models trained on metagenomic co-evolution can power scalable, structure-free interaction prediction, bringing proteome-wide PPI screening within reach of routine analysis. By pairing competitive accuracy with linear-time inference and an interactive deployment, it expands the practical toolkit for microbial interactome mapping. Because it is a recent preprint released under a non-commercial license, its broader influence will depend on independent benchmarking, which the open code and weights make possible. On the reported results, it offers a compelling efficiency-versus-accuracy tradeoff relative to folding-based screens.

Tags

protein_protein_interaction_prediction, interaction_network_inference, transformer, contrastive_learning, representation_learning, metagenomics, microbial_proteins