bio.rodeo

Protein

Hybrid Gated Fusion

University College London

A multimodal deep-learning framework that fuses sequence, structure, text, and interaction embeddings to predict Gene Ontology function annotations, reporting state-of-the-art Fmax on the Biological Process and Cellular Component aspects of CAFA3.

Released: April 2026

Predicting a protein's function from its sequence is one of the oldest open problems in computational biology, and the community-wide Critical Assessment of Functional Annotation (CAFA) challenges have formalized it as a benchmark. The task is to assign Gene Ontology (GO) terms in three aspects, Biological Process (BPO), Cellular Component (CCO), and Molecular Function (MFO), to proteins that have not been experimentally characterized. Single-source predictors built on protein language models have steadily improved, but each modality captures only part of the functional picture: sequence and structure describe what a protein is, whereas text descriptions and interaction networks describe the functional context in which it operates.

Hybrid Gated Fusion, developed by Zijian Zhou and Daniel W. A. Buchan at University College London and released as a bioRxiv preprint in April 2026, addresses this by combining four complementary embedding sources within a single multimodal model. Rather than concatenating or equally weighting modalities, the framework uses a learned gating mechanism to assess both how informative each modality is for a given protein and how much the modalities agree with one another, dynamically down-weighting noisy or missing inputs.
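
To make the gating idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the preprint describes a learned bilinear gate, whereas this toy version takes per-modality informativeness scores as given numbers, measures agreement as the mean cosine similarity among the available modalities, and assumes all embeddings have already been projected to a shared dimension. The names `cosine`, `gate_weights`, and `fuse`, and the `agreement_scale` parameter, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def gate_weights(embeddings, informativeness, available, agreement_scale=1.0):
    """Return one weight per modality; unavailable modalities get weight 0.

    Assumption: score = informativeness + agreement_scale * mean cosine
    similarity to the other available modalities. A trained model would
    learn these scores rather than receive them as inputs.
    """
    present = [i for i, ok in enumerate(available) if ok]
    if not present:
        raise ValueError("at least one modality must be available")
    scores = {}
    for i in present:
        others = [cosine(embeddings[i], embeddings[j]) for j in present if j != i]
        agreement = sum(others) / len(others) if others else 0.0
        scores[i] = informativeness[i] + agreement_scale * agreement
    # Softmax restricted to the modalities that are present.
    top = max(scores.values())
    exps = {i: math.exp(s - top) for i, s in scores.items()}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(available))]

def fuse(embeddings, weights):
    """Weighted sum of the modality embeddings."""
    dim = len(embeddings[0])
    return [sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)]

# Toy example: four modalities in a shared 3-D space; the fourth is missing.
embeddings = [
    [1.0, 0.0, 0.0],   # sequence
    [0.9, 0.1, 0.0],   # structure (agrees with sequence)
    [-1.0, 0.0, 0.0],  # text (disagrees with the other two)
    [0.0, 0.0, 0.0],   # interaction network (missing, zero-filled)
]
available = [True, True, True, False]
informativeness = [0.0, 0.0, 0.0, 0.0]

weights = gate_weights(embeddings, informativeness, available)
fused = fuse(embeddings, weights)
```

In this toy input the text vector points away from the sequence and structure vectors, so it receives the smallest non-zero weight, and the missing interaction modality receives exactly zero.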

The result is a compact predictor (a single model checkpoint of roughly 93.5 MB) that reports state-of-the-art Fmax on two of the three CAFA3 aspects, Biological Process and Cellular Component, while remaining competitive on Molecular Function. This suggests that well-designed fusion can outperform larger single-modality systems on this benchmark.

#Key Features

  • Bilinear gated fusion: A learned gate evaluates each modality's informativeness and its cross-modal agreement, weighting inputs adaptively instead of treating all sources equally — the core mechanism behind the model's gains over single-source baselines.
  • Four complementary modalities: Integrates ProtT5-XL sequence embeddings (1024-D), ESM-IF1 inverse-folding structure embeddings (512-D), PubMedBERT text embeddings (768-D), and STRING protein-protein interaction network embeddings (512-D).
  • Robust to missing inputs: Learned masking lets the model handle proteins for which structure, text, or interaction data are unavailable, a common situation in real annotation pipelines.
  • Auxiliary supervision: Per-modality auxiliary heads reduce dominance by any single modality and preserve useful signal in weaker inputs.
  • Temporal decontamination: Test sets use historical UniProt records with a pre-2016 cutoff so that text descriptions cannot leak future annotations, guarding against an optimistic evaluation common to text-based predictors.
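
The temporal cutoff in the last bullet can be illustrated with a short filter. This is only a sketch under stated assumptions: the record layout (a list of dicts with `accession`, `description`, and `last_modified` keys) and the accession strings are hypothetical, and "pre-2016-02-17", the cutoff date given under Technical Details, is read here as strictly before that date.

```python
from datetime import date

# Cutoff date quoted for the text features (pre-2016-02-17).
CUTOFF = date(2016, 2, 17)

def filter_pre_cutoff(records, cutoff=CUTOFF):
    """Keep only records whose text was last modified strictly before the cutoff."""
    return [r for r in records if r["last_modified"] < cutoff]

# Hypothetical records; real UniProt history entries carry many more fields.
records = [
    {"accession": "PROT_A", "description": "Older description",
     "last_modified": date(2015, 11, 3)},
    {"accession": "PROT_B", "description": "Newer description",
     "last_modified": date(2017, 6, 21)},
]

kept = filter_pre_cutoff(records)
kept_accessions = [r["accession"] for r in kept]
```

Only the record dated before the cutoff survives, so a description written after the test annotations were released cannot reach the text encoder.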

#Technical Details

The framework, implemented in PyTorch and distributed as the psipred/PFP repository, takes precomputed embeddings from four frozen upstream encoders and fuses them through a hybrid gated bilinear module with modality-specific auxiliary heads. Training and evaluation follow the CAFA3 benchmark splits, with text features drawn from temporally filtered UniProt descriptions (pre-2016-02-17 cutoff) and structural coverage provided by AlphaFold models. Performance is reported with CAFA-compliant metrics — Fmax, weighted Fmax, and Smin. A single Hybrid Gated Fusion model achieves Fmax of 0.601 (BPO), 0.706 (CCO), and 0.702 (MFO), setting a new state of the art on Biological Process and Cellular Component while staying competitive on Molecular Function. Pretrained checkpoints and the full set of precomputed CAFA3 embeddings are deposited on Zenodo; the code and deposited artifacts are released under the MIT License.
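
For readers unfamiliar with the headline metric, the following sketch computes protein-centric Fmax in the way the CAFA evaluations define it: precision is averaged over proteins with at least one prediction at the threshold, recall is averaged over all benchmark proteins, and the best F-score over all thresholds is reported. It is a simplified illustration, not the official evaluation code. It omits propagation of terms up the GO hierarchy, and the function name, input layout, and term labels are assumptions.

```python
def fmax(predictions, truths, steps=100):
    """Protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.0.

    predictions: {protein: {term: score in [0, 1]}}
    truths:      {protein: non-empty set of true terms}
    """
    best = 0.0
    for k in range(1, steps + 1):
        t = k / steps
        precisions, recalls = [], []
        for protein, true_terms in truths.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision averages only proteins with a prediction at t.
                precisions.append(hits / len(predicted))
            # Recall averages over every benchmark protein.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# Toy benchmark with placeholder term labels.
truths = {"A": {"term1", "term2"}, "B": {"term3"}}
predictions = {
    "A": {"term1": 0.9, "term2": 0.4, "term9": 0.6},
    "B": {"term3": 0.8},
}
score = fmax(predictions, truths)
```

On this toy input the best trade-off occurs at thresholds of 0.4 and below, where average precision is 5/6 and average recall is 1, giving an Fmax of 10/11 (about 0.909).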

#Applications

The model is aimed at researchers and annotation pipelines that need GO term predictions for uncharacterized proteins — for example, prioritizing candidate genes from a newly sequenced genome, transferring functional hypotheses to proteins lacking experimental evidence, or augmenting reference databases. Because it gracefully degrades when modalities are missing, it can be applied to proteins with no resolved structure or no interaction-network entry, falling back on the available sequence and text signal. The released checkpoints and embeddings let users reproduce the CAFA3 results or fine-tune on their own annotation tasks without recomputing upstream representations.
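
As a rough illustration of what applying a model to proteins with missing modalities involves on the input side, the sketch below zero-fills absent embeddings and records an availability mask. The embedding sizes are the ones listed under Key Features; everything else, including the function name, the record layout, and the use of `None` for a missing modality, is an assumption and does not describe the repository's actual input format.

```python
# Embedding sizes as listed for the four upstream encoders.
MODALITY_DIMS = {
    "sequence": 1024,    # ProtT5-XL
    "structure": 512,    # ESM-IF1
    "text": 768,         # PubMedBERT
    "interaction": 512,  # STRING network embedding
}

def assemble_inputs(record):
    """Return (vectors, mask): zero-filled vectors plus availability flags."""
    vectors, mask = {}, {}
    for name, dim in MODALITY_DIMS.items():
        emb = record.get(name)
        if emb is None:
            vectors[name] = [0.0] * dim  # placeholder for a missing modality
            mask[name] = False
        else:
            if len(emb) != dim:
                raise ValueError(f"{name}: expected {dim} values, got {len(emb)}")
            vectors[name] = list(emb)
            mask[name] = True
    return vectors, mask

# Hypothetical protein with no structure and no interaction-network entry.
record = {"sequence": [0.1] * 1024, "text": [0.2] * 768, "structure": None}
vectors, mask = assemble_inputs(record)
```

A downstream fusion step can then use `mask` to exclude the zero-filled entries instead of treating them as real signal.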

#Impact

Hybrid Gated Fusion contributes to a broader shift in protein function prediction away from single-modality language models and toward systems that explicitly reconcile intrinsic features (sequence, structure) with extrinsic functional context (literature, interaction networks). By reporting state-of-the-art results on two of the three CAFA3 aspects from a single compact model that pairs a learned gating strategy with careful temporal decontamination to prevent text leakage, it offers a practical recipe for multimodal integration that downstream annotation tools can adopt. As a recent preprint, its long-term adoption is not yet established, and the reported gains are specific to the CAFA3 benchmark rather than independently validated across diverse proteomes.

Tags

protein_function_prediction, go_term_prediction, gated_fusion, transformer, multimodal, supervised, proteomics, protein_protein_interaction