Technion – Israel Institute of Technology / Meta AI
A pretrained protein representation model that iteratively fuses a sequence language model and a 3D structure encoder through a shared learnable token.
FusionProt is a pretrained protein representation model that unifies two complementary views of a protein, its amino acid sequence and its three-dimensional structure, into a single learned representation. It was developed by Dan Kalifa and Kira Radinsky at the Technion – Israel Institute of Technology together with Uriel Singer at Meta AI, and released as a bioRxiv preprint in 2025. It addresses a long-standing limitation in protein machine learning: sequence-based protein language models (such as the ESM family) and structure-based encoders (such as GearNet) each capture only part of the signal that governs protein function, and most attempts to combine them simply concatenate their outputs after the fact.
Rather than late fusion, FusionProt introduces an iterative, bidirectional exchange of information between the two modalities throughout the network. The key idea is a single learnable "fusion token" that acts as an adaptive bridge. The token is appended to the sequence so that the language model's attention can read from and write to it, and it is simultaneously inserted as an extra node in the protein's 3D structure graph, connected to every residue, so that the graph encoder updates it during spatial message passing. By alternating between sequence attention and structural message passing across layers, the fusion token carries information back and forth, letting each modality condition on the other as representations are built up.
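The alternation described above can be illustrated with a toy NumPy sketch. It is not the authors' implementation: a single-head attention layer stands in for the protein language model, a mean-aggregation message-passing layer stands in for the structure encoder, the two modalities are kept as separate feature streams for simplicity, and every function name, shape, and weight here is a hypothetical choice made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sequence_step(tokens, Wq, Wk, Wv):
    """Single-head self-attention with a residual (toy stand-in for a language-model layer)."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return tokens + attn @ v

def structure_step(nodes, adj, W):
    """Mean-aggregation message passing with a residual (toy stand-in for a graph-encoder layer)."""
    deg = adj.sum(axis=1, keepdims=True)
    messages = (adj @ nodes) / np.maximum(deg, 1.0)
    return nodes + np.tanh(messages @ W)

def add_fusion_node(adj):
    """Extend an (n, n) residue adjacency matrix with one extra node linked to every residue."""
    n = adj.shape[0]
    out = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    out[:n, :n] = adj
    out[n, :n] = 1
    out[:n, n] = 1
    return out

def fuse(seq_feats, struct_feats, adj, fusion_token, params):
    """Alternate sequence attention and structural message passing, sharing one token."""
    adj_f = add_fusion_node(adj)
    for layer in params:
        # Sequence side: the token is appended as the last position.
        seq_out = sequence_step(np.vstack([seq_feats, fusion_token]),
                                layer["Wq"], layer["Wk"], layer["Wv"])
        seq_feats, fusion_token = seq_out[:-1], seq_out[-1:]
        # Structure side: the same token is the extra node connected to all residues.
        graph_out = structure_step(np.vstack([struct_feats, fusion_token]),
                                   adj_f, layer["Wg"])
        struct_feats, fusion_token = graph_out[:-1], graph_out[-1:]
    return seq_feats, struct_feats, fusion_token

# Toy inputs: 6 residues, 8-dimensional features, chain-like contacts (all made up).
rng = np.random.default_rng(0)
n_res, dim = 6, 8
seq_feats = rng.normal(size=(n_res, dim))
struct_feats = rng.normal(size=(n_res, dim))
adj = np.zeros((n_res, n_res))
for i in range(n_res - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
fusion_token = np.zeros((1, dim))
params = [{name: 0.1 * rng.normal(size=(dim, dim)) for name in ("Wq", "Wk", "Wv", "Wg")}
          for _ in range(2)]

seq_out, struct_out, token_out = fuse(seq_feats, struct_feats, adj, fusion_token, params)
print(seq_out.shape, struct_out.shape, token_out.shape)
```

In this sketch the only path from the structure features to the sequence features runs through the shared token, so perturbing `struct_feats` changes `seq_out` only once a second layer has run. That is the sense in which the token "carries information back and forth".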
This design keeps the two backbones largely intact while threading a learnable channel between them, yielding a unified embedding that reflects both sequence and structure with only modest additional cost over running the encoders independently.
FusionProt couples a transformer-based protein language model (from the ESM family) with a geometric graph neural network structure encoder (GearNet), building on the ESM-GearNet line of work. The learnable fusion token is concatenated to the input sequence for the language model and added as an additional node — connected to all residue nodes — in the 3D structure graph processed by the graph encoder, with the two stages alternating so the token shuttles information across modalities. The model is pretrained self-supervised on structures from the AlphaFold Protein Structure Database using a multiview contrastive objective. On downstream evaluations, the authors report median gains of roughly +1.3 Fmax points across EC and GO function-prediction benchmarks and about +3.6 AUROC points on mutation stability prediction relative to the strongest baseline, while keeping the added inference overhead in the ~2–5% range.
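The text above says only that pretraining uses a multiview contrastive objective and does not give the loss itself. As a rough illustration, a common choice for such objectives is an InfoNCE-style loss, in which embeddings of two views of the same protein are pulled together and embeddings of different proteins in the batch are pushed apart. The sketch below assumes that form; the function name, the temperature value, and the way views are produced are all assumptions, not details confirmed by the source.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.07):
    """InfoNCE-style loss: row i of view_a should match row i of view_b.

    view_a, view_b: (batch, dim) embeddings of two views of the same proteins.
    The temperature default is an arbitrary illustrative value.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                       # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # positives sit on the diagonal

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = info_nce(emb, emb + 0.01 * rng.normal(size=emb.shape))
shuffled = info_nce(emb, np.roll(emb, 1, axis=0))
print(aligned < shuffled)
```

When the two views are correctly paired the loss is small; when the pairing is shifted by one row, each positive is a mismatched protein and the loss grows.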
FusionProt produces general-purpose protein embeddings that can be fine-tuned or used as features for a range of downstream tasks, including enzyme function classification (EC numbers), Gene Ontology annotation, and prediction of how point mutations affect protein stability. Researchers studying protein function, computational biologists annotating newly sequenced proteins, and protein engineers reasoning about stabilizing or destabilizing mutations can use it wherever both a sequence and a (predicted or experimental) structure are available, benefiting from a representation that integrates the two without the cost of training a fully joint model from scratch.
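EC and GO function-prediction results are typically reported as Fmax, the metric quoted above. The following is a minimal sketch of the protein-centric Fmax as it is commonly defined for such benchmarks: at each decision threshold, precision is averaged over proteins with at least one predicted label and recall over proteins with at least one true label, and the best F-score across thresholds is kept. The threshold grid and the handling of proteins without labels are assumptions here, and the authors' evaluation code may differ.

```python
import numpy as np

def fmax(scores, labels, thresholds=None):
    """Protein-centric Fmax for multi-label prediction.

    scores: (n_proteins, n_terms) predicted probabilities.
    labels: (n_proteins, n_terms) binary ground truth.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)  # assumed grid
    n_true = labels.sum(axis=1)
    has_true = n_true > 0
    if not has_true.any():
        return 0.0
    best = 0.0
    for t in thresholds:
        pred = scores >= t
        tp = (pred & labels).sum(axis=1)
        n_pred = pred.sum(axis=1)
        has_pred = n_pred > 0
        if not has_pred.any():
            continue  # no protein has a prediction at this threshold
        precision = (tp[has_pred] / n_pred[has_pred]).mean()
        recall = (tp[has_true] / n_true[has_true]).mean()
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return float(best)

toy_labels = np.array([[1, 0, 1], [0, 1, 0]])
toy_scores = np.array([[0.9, 0.8, 0.1], [0.2, 0.7, 0.3]])
print(fmax(toy_scores, toy_labels))
```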
FusionProt contributes to an active line of research on combining sequence and structure signals in protein foundation models, offering a lightweight alternative to heavier joint architectures by routing cross-modal information through a single learnable token. Its reported gains on standard EC, GO, and stability benchmarks, achieved with only a few percent of additional inference cost, suggest that deep iterative fusion can outperform late concatenation of independently trained encoders. As a recent preprint, its long-term adoption remains to be seen and its results await peer review. A reference implementation and pretrained weights are released on GitHub (with checkpoints distributed via Google Drive), though no dedicated model card, data card, or Hugging Face deployment was available at the time of writing.
Kalifa, D., et al. (2025). FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning. bioRxiv.
DOI: 10.1101/2025.08.06.668973
Papers that recently cited this model:
Ruyang Cheng, Tianyu Liu, Chentao Liao, et al.
International Journal of Molecular Sciences · Apr 2026
Cong Qi, Wenbo Wang, Hanzhang Fang, et al.
bioRxiv · Apr 2026