Hybrid Gated Fusion

Protein function prediction model that fuses sequence, structure, text, and interaction embeddings with learned gating to assign Gene Ontology terms.

Released: April 2026

Predicting the molecular function of a protein from its sequence is one of the oldest open problems in computational biology, formalized through the community-wide Critical Assessment of Functional Annotation (CAFA) challenges. The task is framed as assigning Gene Ontology (GO) terms across three aspects — Biological Process (BPO), Cellular Component (CCO), and Molecular Function (MFO) — to proteins that have never been experimentally characterized. While single-source predictors built on protein language models have steadily improved, individual modalities capture only part of the functional picture: sequence and structure describe what a protein is, whereas text descriptions and interaction networks describe the functional context in which it operates.

Hybrid Gated Fusion, developed by Zijian Zhou and Daniel W. A. Buchan at University College London and released as a bioRxiv preprint in April 2026, addresses this by combining four complementary embedding sources within a single multimodal model. Rather than concatenating or equally weighting modalities, the framework uses a learned gating mechanism to assess both how informative each modality is for a given protein and how much the modalities agree with one another, dynamically down-weighting noisy or missing inputs.

The result is a compact predictor (a single model checkpoint of roughly 93.5 MB) that reports state-of-the-art Fmax on two of the three CAFA3 aspects, Biological Process and Cellular Component, while remaining competitive on Molecular Function. This suggests that careful fusion of modalities can outperform larger single-modality systems.

Key Features

Bilinear gated fusion: A learned gate evaluates each modality's informativeness and its cross-modal agreement, weighting inputs adaptively instead of treating all sources equally — the core mechanism behind the model's gains over single-source baselines.
Four complementary modalities: Integrates ProtT5-XL sequence embeddings (1024-D), ESM-IF1 inverse-folding structure embeddings (512-D), PubMedBERT text embeddings (768-D), and STRING protein-protein interaction network embeddings (512-D).
Robust to missing inputs: Learned masking lets the model handle proteins for which structure, text, or interaction data are unavailable, a common situation in real annotation pipelines.
Auxiliary supervision: Per-modality auxiliary heads reduce dominance by any single modality and preserve useful signal in weaker inputs.
Temporal decontamination: Test sets use historical UniProt records from before a 2016-02-17 cutoff, so text descriptions cannot leak later annotations; this guards against the optimistic evaluation that text-based predictors are prone to.
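
To make the gating idea in the list above concrete, here is a minimal NumPy sketch; it is not the authors' implementation. The embedding widths come from the feature list. The shared width `HIDDEN`, the random matrices standing in for learned parameters, the way the informativeness and agreement scores are combined, and the hard presence mask (the model itself is described as using learned masking) are all assumptions made for illustration. Auxiliary heads are omitted.

```python
import numpy as np

# Embedding widths as listed above; every other name and value is hypothetical.
DIMS = {"sequence": 1024, "structure": 512, "text": 768, "ppi": 512}
HIDDEN = 64  # assumed shared projection width

rng = np.random.default_rng(0)
# Random matrices stand in for parameters that would be learned in training.
PROJ = {m: rng.normal(0.0, d ** -0.5, size=(d, HIDDEN)) for m, d in DIMS.items()}
GATE = {m: rng.normal(0.0, HIDDEN ** -0.5, size=HIDDEN) for m in DIMS}
BILINEAR = rng.normal(0.0, 1.0 / HIDDEN, size=(HIDDEN, HIDDEN))


def gated_fuse(embeddings):
    """Fuse one protein's modality embeddings; missing ones are None or absent."""
    names = list(DIMS)
    present = np.array([embeddings.get(m) is not None for m in names])
    if not present.any():
        raise ValueError("at least one modality is required")

    # Project each available modality to the shared width (zeros if missing).
    h = np.stack([
        np.tanh(np.asarray(embeddings[m]) @ PROJ[m]) if ok else np.zeros(HIDDEN)
        for m, ok in zip(names, present)
    ])

    # Informativeness: a per-modality score from the projected embedding alone.
    info = np.array([h[i] @ GATE[m] for i, m in enumerate(names)])
    # Agreement: bilinear similarity with the mean of the available modalities.
    context = h[present].mean(axis=0)
    agree = h @ BILINEAR @ context

    # Masked softmax: missing modalities receive exactly zero weight.
    scores = np.where(present, info + agree, -np.inf)
    weights = np.exp(scores - scores[present].max())
    weights /= weights.sum()

    fused = weights @ h  # gate-weighted sum of projected embeddings
    return fused, dict(zip(names, weights))


# Example: a protein with no structure embedding available.
example = {m: rng.normal(size=d) for m, d in DIMS.items()}
example["structure"] = None
fused, weights = gated_fuse(example)
```

In this sketch the gate weights always sum to one over the modalities that are present, so dropping an input redistributes its weight rather than injecting a zero vector into the fused representation.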

Technical Details

The framework, implemented in PyTorch and distributed as the psipred/PFP repository, takes precomputed embeddings from four frozen upstream encoders and fuses them through a hybrid gated bilinear module with modality-specific auxiliary heads. Training and evaluation follow the CAFA3 benchmark splits, with text features drawn from temporally filtered UniProt descriptions (pre-2016-02-17 cutoff) and structural coverage provided by AlphaFold models.

Performance is reported with CAFA-compliant metrics: Fmax, weighted Fmax, and Smin. A single Hybrid Gated Fusion model achieves Fmax of 0.601 (BPO), 0.706 (CCO), and 0.702 (MFO), setting a new state of the art on Biological Process and Cellular Component while staying competitive on Molecular Function.

Pretrained checkpoints and the full set of precomputed CAFA3 embeddings are deposited on Zenodo; the code and deposited artifacts are released under the MIT License.
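
For readers unfamiliar with the headline metric, the following is a simplified sketch of protein-centric Fmax as used in CAFA-style evaluation: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and Fmax is the best resulting F-score. The function name, the input layout, and the threshold grid are assumptions. Propagation of terms up the ontology, which a full evaluation performs, is left out, and the code is not taken from the psipred/PFP repository.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein_id: {go_term: score in [0, 1]}}
    truth:       {protein_id: non-empty set of true go_terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # assumed grid

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            # Precision only counts proteins with a prediction at this threshold.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall counts every benchmark protein.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data with hypothetical term identifiers.
truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
predictions = {"P1": {"GO:A": 0.9, "GO:X": 0.3}, "P2": {"GO:C": 0.8}}
score = fmax(predictions, truth)
```

Because the threshold is chosen to maximize the F-score on the evaluated set, Fmax is an optimistic summary of a precision-recall curve rather than the performance at a fixed operating point.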

Applications

The model is aimed at researchers and annotation pipelines that need GO term predictions for uncharacterized proteins — for example, prioritizing candidate genes from a newly sequenced genome, transferring functional hypotheses to proteins lacking experimental evidence, or augmenting reference databases. Because it gracefully degrades when modalities are missing, it can be applied to proteins with no resolved structure or no interaction-network entry, falling back on the available sequence and text signal. The released checkpoints and embeddings let users reproduce the CAFA3 results or fine-tune on their own annotation tasks without recomputing upstream representations.
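
In practice, applying such a model to partially covered proteins means assembling inputs with explicit presence flags. The sketch below shows one way to do that with NumPy: missing modalities are zero-filled and recorded in a boolean mask. The function name `build_batch`, the record layout, and the modality keys are hypothetical and do not describe the repository's actual data loader; only the embedding widths are taken from the feature list.

```python
import numpy as np

# Embedding widths from the feature list; the key names are assumptions.
DIMS = {"sequence": 1024, "structure": 512, "text": 768, "ppi": 512}


def build_batch(records):
    """Stack per-protein embeddings into arrays plus a presence mask.

    records: list of dicts mapping modality name -> 1-D vector;
             a modality that is unavailable is simply absent (or None).
    """
    n = len(records)
    features = {m: np.zeros((n, d), dtype=np.float32) for m, d in DIMS.items()}
    mask = np.zeros((n, len(DIMS)), dtype=bool)

    for i, record in enumerate(records):
        for j, (m, d) in enumerate(DIMS.items()):
            vec = record.get(m)
            if vec is None:
                continue  # leave zeros and mask=False for a missing modality
            vec = np.asarray(vec, dtype=np.float32)
            if vec.shape != (d,):
                raise ValueError(f"{m}: expected shape ({d},), got {vec.shape}")
            features[m][i] = vec
            mask[i, j] = True
    return features, mask


# One fully covered protein and one with sequence and text only.
full = {m: np.ones(d) for m, d in DIMS.items()}
partial = {"sequence": np.ones(1024), "text": np.ones(768)}
features, mask = build_batch([full, partial])
```

Keeping the mask separate from the zero-filled arrays lets a downstream model distinguish a missing input from an embedding that happens to be near zero.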

Impact

Hybrid Gated Fusion contributes to a broader shift in protein function prediction away from single-modality language models toward systems that explicitly reconcile intrinsic features (sequence, structure) with extrinsic functional context (literature, interaction networks). By showing that a learned gating strategy, paired with careful temporal decontamination to prevent text leakage, can deliver state-of-the-art CAFA3 results from a single compact model, it offers a practical recipe for multimodal integration that downstream annotation tools can adopt. As a recent preprint, its long-term adoption is not yet established, and the reported gains are specific to the CAFA3 benchmark rather than independently validated across diverse proteomes.

Citation

Zhou, Z. & Buchan, D. (2026) Hybrid Gated Fusion: A Multimodal Deep Learning Framework for Protein Function Annotation. bioRxiv.

DOI: 10.64898/2026.04.14.718564

Citations

Total Citations: 0
Influential: 0
References: 93

GitHub

Stars: 2
Forks: 0
Open Issues: 0
Contributors: 1
Last Push: 3mo ago
Language: Python
License: MIT

Openness

bio.rodeo openness: 84, Open (fully open · usable and reproducible)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 87

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository, Research Paper, Dataset
