bio.rodeo
The authoritative source for evaluating biological foundation models. No hype, just honest analysis.


FusionProt

Technion – Israel Institute of Technology / Meta AI

A pretrained protein representation model that iteratively fuses a sequence language model and a 3D structure encoder through a shared learnable token.

Released: November 2025

FusionProt is a pretrained protein representation model that unifies two complementary views of a protein, its amino acid sequence and its three-dimensional structure, into a single learned representation. It was developed by Dan Kalifa and Kira Radinsky at the Technion – Israel Institute of Technology together with Uriel Singer at Meta AI, and released as a bioRxiv preprint in 2025. It addresses a long-standing limitation in protein machine learning: sequence-based protein language models (such as the ESM family) and structure-based encoders (such as GearNet) each capture only part of the signal that governs protein function, and most attempts to combine them simply concatenate their outputs after the fact.

Rather than late fusion, FusionProt introduces an iterative, bidirectional exchange of information between the two modalities throughout the network. The key idea is a single learnable "fusion token" that acts as an adaptive bridge. The token is appended to the sequence so that the language model's attention can read from and write to it, and it is simultaneously inserted as an extra node in the protein's 3D structure graph, connected to every residue, so that the graph encoder updates it during spatial message passing. By alternating between sequence attention and structural message passing across layers, the fusion token carries information back and forth, letting each modality condition on the other as representations are built up.

This design keeps the two backbones largely intact while threading a learnable channel between them, yielding a unified embedding that reflects both sequence and structure with only modest additional cost over running the encoders independently.
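
The alternating exchange described above can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the identity-projection attention, the mean-aggregation message passing, the single residue state shared by both steps, and every function name here (`sequence_step`, `structure_step`, `fuse`) are simplifying assumptions chosen only to show how one token can be updated by both modalities in turn.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sequence_step(residues, token):
    """Toy self-attention over [residues..., token] with identity projections.

    Every position, including the fusion token, attends to every position,
    so the token both reads from and writes to the residue states.
    """
    states = residues + [token]
    dim = len(token)
    updated = []
    for query in states:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in states])
        updated.append(
            [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
        )
    return updated[:-1], updated[-1]


def structure_step(residues, token, edges):
    """Toy mean-aggregation message passing on the residue graph.

    The fusion token is an extra node (index n) linked to every residue.
    """
    n = len(residues)
    states = residues + [token]
    neighbors = {i: {i} for i in range(n + 1)}  # self-loops
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    for i in range(n):
        neighbors[i].add(n)
        neighbors[n].add(i)
    dim = len(token)
    updated = []
    for i in range(n + 1):
        nbrs = sorted(neighbors[i])
        updated.append(
            [sum(states[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        )
    return updated[:-1], updated[-1]


def fuse(residues, token, edges, num_layers=3):
    """Alternate the sequence step and the structure step across layers."""
    for _ in range(num_layers):
        residues, token = sequence_step(residues, token)
        residues, token = structure_step(residues, token, edges)
    return residues, token


random.seed(0)
residues = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
token = [0.0] * 4  # stand-in for the learnable fusion token
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]  # hypothetical contacts
fused_residues, fused_token = fuse(residues, token, edges)
```

In this toy the token starts at zero and ends up carrying a mixture of residue information after each layer. A real model would use learned projections, separate encoder weights, and a trained token embedding.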

Key Features

  • Shared learnable fusion token: A single token serves as a carrier that is both attended to by the sequence language model and treated as a fully connected node in the structure graph, mediating information transfer between the two modalities.
  • Iterative bidirectional fusion: Information is exchanged repeatedly across layers rather than once at the end, so sequence and structure representations are refined jointly rather than merged after independent encoding.
  • Low inference overhead: The fusion mechanism adds only roughly 2–5% to inference cost relative to the underlying sequence and structure encoders, making it practical to adopt in place of single-modality models.
  • State-of-the-art benchmark results: The authors report improvements on Enzyme Commission (EC) and Gene Ontology (GO) function prediction and on mutation stability prediction over strong sequence-only and structure-only baselines.
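
As a rough illustration of the first feature, the sketch below shows how one extra token could be added to both views of a protein: appended to the sequence, and inserted as a node linked to every residue in the structure graph. The helper names, the `<fusion>` symbol, the example peptide, and the contact edges are all hypothetical and are not taken from the FusionProt code.

```python
def append_fusion_token(residue_tokens, fusion_symbol="<fusion>"):
    """Sequence view: the fusion token becomes one extra position at the end."""
    return list(residue_tokens) + [fusion_symbol]


def add_fusion_node(num_residues, residue_edges):
    """Structure view: the fusion token becomes node `num_residues`,
    joined to every residue node by an undirected edge."""
    fusion_index = num_residues
    fusion_edges = [(fusion_index, i) for i in range(num_residues)]
    return fusion_index, list(residue_edges) + fusion_edges


sequence = "MKTAYIAK"  # hypothetical 8-residue peptide
contact_edges = [  # hypothetical residue-residue contacts
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (0, 3), (2, 6),
]

tokens = append_fusion_token(sequence)
fusion_index, graph_edges = add_fusion_node(len(sequence), contact_edges)

print(len(tokens), fusion_index, len(graph_edges))
```

For a protein of N residues this adds one sequence position, one graph node, and N edges, which is consistent with the fusion mechanism being a small addition to the two backbones.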

Technical Details

FusionProt couples a transformer-based protein language model (from the ESM family) with a geometric graph neural network structure encoder (GearNet), building on the ESM-GearNet line of work. The learnable fusion token is concatenated to the input sequence for the language model and added as an additional node, connected to all residue nodes, in the 3D structure graph processed by the graph encoder. The two stages alternate, so the token shuttles information across modalities. The model is pretrained in a self-supervised manner on structures from the AlphaFold Protein Structure Database using a multiview contrastive objective. On downstream evaluations, the authors report median gains of roughly +1.3 Fmax points across EC and GO function-prediction benchmarks and about +3.6 AUROC points on mutation stability prediction relative to the strongest baseline, while keeping the added inference overhead in the ~2–5% range.
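
To make the idea of a multiview contrastive objective concrete, here is a minimal InfoNCE-style loss in plain Python. InfoNCE is one common form of contrastive loss; whether FusionProt uses exactly this form, this similarity function, or this temperature is an assumption, and the function names are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (assumed form, not confirmed by the source).

    view_a[i] and view_b[i] are embeddings of two views of protein i.
    Each anchor should score its own partner above all other proteins.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings for three proteins, two views each.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
shuffled_b = view_b[1:] + view_b[:1]  # partners deliberately mismatched

aligned_loss = info_nce(view_a, view_b)
mismatched_loss = info_nce(view_a, shuffled_b)
```

The loss is small when the two views of the same protein sit close together and far from other proteins, and grows when the pairing is broken, which is the behavior a contrastive pretraining objective rewards.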

Applications

FusionProt produces general-purpose protein embeddings that can be fine-tuned or used as features for a range of downstream tasks, including enzyme function classification (EC numbers), Gene Ontology annotation, and prediction of how point mutations affect protein stability. Researchers studying protein function, computational biologists annotating newly sequenced proteins, and protein engineers reasoning about stabilizing or destabilizing mutations can use it wherever both a sequence and a (predicted or experimental) structure are available, benefiting from a representation that integrates the two without the cost of training a fully joint model from scratch.
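
EC and GO annotation are multi-label tasks, and the Fmax score cited for them is usually computed per protein over a sweep of decision thresholds. The sketch below follows one common protein-centric definition; the threshold grid and the choice to skip proteins with no true labels are assumptions, not details confirmed for the FusionProt evaluation.

```python
def fmax(scores, labels, thresholds=None):
    """Protein-centric Fmax under one common definition (assumed here).

    scores[i][k] is the predicted probability that protein i has term k.
    labels[i][k] is 1 if protein i truly has term k, otherwise 0.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein_scores, protein_labels in zip(scores, labels):
            predicted = {k for k, s in enumerate(protein_scores) if s >= t}
            true = {k for k, y in enumerate(protein_labels) if y == 1}
            if not true:
                continue  # assumption: unannotated proteins are skipped
            hits = len(predicted & true)
            if predicted:
                # Precision averages only over proteins with a prediction.
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Hypothetical scores for two proteins and two function terms.
labels = [[1, 0], [0, 1]]
perfect = fmax([[0.9, 0.1], [0.2, 0.8]], labels)
imperfect = fmax([[0.9, 0.8], [0.2, 0.8]], labels)
```

Because the score is the best F-measure over all thresholds, a model is not penalized for poorly calibrated probabilities, only for ranking wrong terms above right ones.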

Impact

FusionProt contributes to an active line of research on combining sequence and structure signals in protein foundation models, offering a lightweight alternative to heavier joint architectures by routing cross-modal information through a single learnable token. Its reported gains on standard EC, GO, and stability benchmarks, achieved with only a few percent of additional inference cost, suggest that deep iterative fusion can outperform late concatenation of independently trained encoders. As a recent preprint, it has not yet been peer reviewed, and its long-term adoption remains to be seen. A reference implementation and pretrained weights are released on GitHub (with checkpoints distributed via Google Drive), though no dedicated model card, data card, or Hugging Face deployment was available at the time of writing.

Citation

FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning

Preprint

Kalifa, D., et al. (2025) FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning. bioRxiv.

DOI: 10.1101/2025.08.06.668973

Recent citations

Papers that recently cited this model.

  • Integrating Protein Language Models with Multimodal Embeddings to Accelerate Function Prediction of Uncharacterized Proteins
    Ruyang Cheng, Tianyu Liu, Chentao Liao, et al.
    International Journal of Molecular Sciences · Apr 2026 · 0 citations
  • When Multimodal Fusion Fails: Contrastive Alignment as a Necessary Stabilizer for TCR–Peptide Binding Prediction
    Cong Qi, Wenbo Wang, Hanzhang Fang, et al.
    bioRxiv · Apr 2026 · 0 citations

Citations

Total Citations: 2
Influential: 0
References: 69

GitHub

Stars: 18
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 6mo ago
Language: Python
License: MIT

Fields of citing research

  • Biology: 100%
  • Computer Science: 100%
  • Medicine: 50%

Share of papers citing this model.

Openness

bio.rodeo openness (fully open means usable and reproducible): 68, Partial
Usability (can I run it?): 64
Reproducibility (can I retrain it?): 77
Model Openness Framework: Unclassified (missing required components)

Resources

  • GitHub Repository
  • Research Paper