Contrastive model built on a genomic language model that predicts physical protein-protein interactions across a microbial proteome in linear time.
FlashPPI predicts physical protein-protein interactions (PPIs) at proteome scale and in linear time. Conventional approaches to proteome-wide interaction screening either are slow, because they evaluate every protein pair, or rely on structure-folding models such as AlphaFold-Multimer that are too expensive to run across an entire proteome. FlashPPI reframes PPI screening as a dense retrieval problem: it embeds proteins into a shared latent space where interacting partners lie close together, so candidate interactions can be found by nearest-neighbor search rather than exhaustive pairwise scoring.
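To make the retrieval framing concrete, the following is a minimal sketch of ranking candidate partners by cosine similarity between precomputed embeddings. It is not FlashPPI's code: the function names, the toy three-dimensional vectors, and the use of a single embedding per protein are all assumptions made for illustration. The inner loop here is an exact brute-force search over cheap dot products; at proteome scale an approximate nearest-neighbor index would typically take its place, and whether FlashPPI uses one is not stated here.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_partners(embeddings, k=5):
    """Return, for each protein, its k most similar proteins as (name, score) pairs.

    `embeddings` is a hypothetical dict mapping protein name -> embedding vector.
    """
    unit = {name: normalize(vec) for name, vec in embeddings.items()}
    results = {}
    for query, q_vec in unit.items():
        scored = (
            (sum(a * b for a, b in zip(q_vec, c_vec)), cand)
            for cand, c_vec in unit.items()
            if cand != query  # skip the trivial self-match
        )
        # Keep only the k highest-scoring candidates, best first.
        best = heapq.nlargest(k, scored)
        results[query] = [(cand, score) for score, cand in best]
    return results


# Toy embeddings (made up): A/B point the same way, as do C/D.
toy = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.9, 0.1],
}
ranked = top_k_partners(toy, k=1)
for protein, partners in ranked.items():
    print(protein, partners)
```

Each protein is embedded once, so the expensive model runs a number of times proportional to proteome size; only the lightweight similarity lookup involves pairs.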
The model was developed by Tatta Bio (Andre Cornman, Matt Tranzillo, Nicolo G. Zulaybar, Imane Bouzit, and Yunha Hwang), with a preprint posted to bioRxiv in March 2026. It builds on the team's genomic language modeling work, initializing from a genomic language model (gLM2) that captures cross-protein co-evolutionary signals learned from metagenomic sequences and genomic context.
By grounding its contrastive training in residue-level interaction signals and leveraging metagenomic co-evolution, FlashPPI reports an approximately four-fold improvement over existing sequence-based PPI methods and cuts proteome-wide screening time from days to minutes. Its reported performance is comparable to that of structure-folding models at a fraction of the computational cost.
FlashPPI is a contrastive learning framework that maps proteins into a shared latent space in which interacting partners are aligned, so that physical interfaces across a microbial proteome can be identified by retrieval. It initializes from the gLM2 genomic language model, which is trained on metagenomic sequences and learns cross-protein co-evolutionary signals from genomic context; the contrastive objective is grounded in residue-level interaction information. Reported evaluations show an approximately four-fold gain over existing sequence-based PPI prediction methods, with screening performance comparable to state-of-the-art structure-folding approaches at a fraction of the computational cost. Code is released on GitHub under CC BY-NC 4.0, weights are available on Hugging Face, and the model is deployed in an interactive web platform.
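As an illustration of what a contrastive objective does in general, the sketch below computes a generic InfoNCE-style loss over a batch of known interacting pairs. This is an assumed stand-in, not FlashPPI's actual objective: it omits the residue-level grounding described above, and the function name, temperature value, and toy vectors are hypothetical.

```python
import math


def info_nce(anchors, partners, temperature=0.1):
    """Mean cross-entropy of picking each anchor's true partner from the batch.

    anchors[i] and partners[i] are unit-length embeddings of one interacting
    pair; every other partner in the batch serves as a negative example.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every partner, scaled by temperature.
        logits = [
            sum(x * y for x, y in zip(anchor, partner)) / temperature
            for partner in partners
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy batch (made up): two interacting pairs in a 2-D latent space.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each partner sits next to its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # partners sit next to the wrong anchor

loss_aligned = info_nce(anchors, aligned)
loss_swapped = info_nce(anchors, swapped)
print(loss_aligned, loss_swapped)
```

Minimizing a loss of this shape pulls true partners together and pushes non-partners apart, which is what makes nearest-neighbor retrieval meaningful afterwards.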
FlashPPI targets microbial discovery and functional genomics, enabling researchers to build proteome-wide interaction networks rapidly and to prioritize candidate physical interactions for follow-up. It is deployed at seqhub.org, an interactive platform that combines predicted networks with functional annotations and genomic context, making proteome-scale network analysis accessible to microbiologists and metagenomics researchers without large compute budgets.
FlashPPI shows that genomic language models trained on metagenomic co-evolution can power scalable, structure-free interaction prediction, bringing proteome-wide PPI screening within reach of routine analysis. By pairing competitive accuracy with linear-time inference and an interactive deployment, it expands the practical toolkit for microbial interactome mapping. As a recent preprint released under a non-commercial license with open code and weights, its broader influence will depend on independent benchmarking, but it offers a compelling efficiency-versus-accuracy tradeoff relative to folding-based screens.