BioBridge

Connects a frozen protein language model to a general LLM via a cross-modal projector, adding protein reasoning without catastrophic forgetting.

Released: February 2026

Protein language models (PLMs) such as ESM2 capture rich sequence-level biology but cannot reason in natural language, while general-purpose large language models (LLMs) reason fluently but lack grounded protein knowledge. BioBridge, described in a February 2026 arXiv preprint from researchers at Chinese institutions (including Tongji University and Fudan University), aims to combine these strengths by giving an LLM genuine protein understanding while preserving the broad reasoning and knowledge it already has.

The central challenge is catastrophic forgetting: naively fine-tuning an LLM on protein data degrades its general abilities. BioBridge addresses this with Domain-Incremental Continual Pre-training (DICP), which trains on protein-domain knowledge together with a general reasoning corpus so the model gains specialized competence without sacrificing its original skills. A cross-modal projector maps a frozen PLM's protein embeddings into the LLM's semantic space, letting the language model attend to protein representations as if they were another modality.
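The data-mixing idea behind DICP can be illustrated with a minimal sketch. This is not the authors' training code: the function name `interleave_corpora`, the `protein_fraction` parameter, and its default of 0.5 are assumptions made for illustration, and the preprint should be consulted for the actual mixing schedule.

```python
import random
from typing import Iterator, Sequence, Tuple


def interleave_corpora(
    protein_docs: Sequence[str],
    general_docs: Sequence[str],
    protein_fraction: float = 0.5,  # hypothetical ratio, not taken from the paper
    num_samples: int = 10,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, document) pairs drawn from both corpora.

    Each draw picks the protein corpus with probability `protein_fraction`
    and the general corpus otherwise, so any stretch of training sees both
    kinds of data instead of protein data alone.
    """
    if not 0.0 <= protein_fraction <= 1.0:
        raise ValueError("protein_fraction must be in [0, 1]")
    if not protein_docs or not general_docs:
        raise ValueError("both corpora must be non-empty")
    rng = random.Random(seed)
    for _ in range(num_samples):
        if rng.random() < protein_fraction:
            yield "protein", rng.choice(protein_docs)
        else:
            yield "general", rng.choice(general_docs)


# Toy corpora standing in for real protein-knowledge and general-reasoning text.
protein_corpus = ["toy protein annotation A", "toy protein annotation B"]
general_corpus = ["toy reasoning passage A", "toy reasoning passage B"]

for source, doc in interleave_corpora(protein_corpus, general_corpus, num_samples=6):
    print(f"{source:8s} {doc}")
```

Setting `protein_fraction` to 1.0 reproduces the naive protein-only fine-tuning that the text identifies as the cause of forgetting; lower values keep general data in the stream.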

Key Features

PLM-projector-LLM architecture: A frozen protein language model is connected to an LLM through a learned projector, aligning protein embeddings with the language model's semantic space.
Domain-Incremental Continual Pre-training (DICP): Trains on protein knowledge and a general reasoning corpus together to inject domain expertise while mitigating catastrophic forgetting.
Dual competence: Targets competitive results on protein benchmarks (EC, BindingDB) while remaining on par with the base LLM on general tasks (MMLU, RACE).
Multi-task and conversational: Supports protein property prediction and knowledge-based question answering within a single language-model interface.
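One common way for an LLM to consume projected protein tokens within a single interface is to splice them into the prompt's embedding sequence at a placeholder position. The sketch below assumes that mechanism; the `<protein>` marker, the function name, and the toy two-dimensional embeddings are all hypothetical and are not taken from the preprint.

```python
from typing import Dict, List

Vector = List[float]

PLACEHOLDER = "<protein>"  # hypothetical marker token, not from the paper


def splice_protein_tokens(
    prompt_tokens: List[str],
    text_embeddings: Dict[str, Vector],
    protein_tokens: List[Vector],
) -> List[Vector]:
    """Build the LLM input sequence, expanding the placeholder into protein vectors."""
    sequence: List[Vector] = []
    for token in prompt_tokens:
        if token == PLACEHOLDER:
            # The projector's output vectors take the placeholder's position.
            sequence.extend(protein_tokens)
        else:
            sequence.append(text_embeddings[token])
    return sequence


# Toy 2-dimensional embeddings; a real model would use the LLM's hidden size.
embeddings = {"What": [0.1, 0.2], "does": [0.3, 0.1], "do": [0.0, 0.5], "?": [0.2, 0.2]}
prompt = ["What", "does", PLACEHOLDER, "do", "?"]
projected = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three projected protein tokens

inputs = splice_protein_tokens(prompt, embeddings, projected)
print(len(inputs))  # 4 text tokens + 3 protein tokens = 7
```

Because the same sequence format serves both a property-prediction prompt and a free-form question, one model can handle both task types.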

Technical Details

BioBridge uses ESM2 as a frozen protein encoder and Qwen2.5-7B-Instruct as the language backbone. A Q-Former-style projector uses cross-attention to distill the protein embeddings into a fixed number of query tokens, producing protein representations that the LLM consumes alongside text. Training follows the DICP recipe, interleaving protein-domain data with a general corpus to limit forgetting. Reported results include localization (DeepLoc multi) at 0.815 versus ESM2's 0.759, metal-ion binding at 0.761 versus 0.712, and EC annotation at 0.743. General performance is largely retained, though not fully: MMLU is 63.30 versus the base model's 70.41, a gap of about 7 points. As of this preprint, the authors note no released weights or code; architecture and benchmark figures should be confirmed against the paper.
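The core of a Q-Former-style projector is cross-attention from a fixed set of learned queries onto a variable-length sequence of residue embeddings. The following sketch shows only that step, in dependency-free Python for readability. It is a simplification under stated assumptions: a single head and a single layer, random stand-in weights, and toy dimensions, none of which reflect the configuration reported in the preprint.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Stand-in for learned weights; a real projector would train these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project_protein(
    residues: Matrix, queries: Matrix, w_k: Matrix, w_v: Matrix, w_out: Matrix
) -> Matrix:
    """Single-head cross-attention: learned queries attend over residue embeddings.

    residues: (L x d_plm) frozen PLM outputs; L varies per protein
    queries:  (Q x d_attn) learned query vectors; Q is fixed
    returns:  (Q x d_llm) protein tokens sized for the LLM
    """
    keys = matmul(residues, w_k)                # (L x d_attn)
    values = matmul(residues, w_v)              # (L x d_attn)
    keys_t = [list(col) for col in zip(*keys)]  # (d_attn x L)
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, keys_t)            # (Q x L)
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, values)          # (Q x d_attn)
    return matmul(attended, w_out)              # (Q x d_llm)


rng = random.Random(0)
D_PLM, D_ATTN, D_LLM, NUM_QUERIES = 8, 4, 6, 3  # toy sizes, not the paper's
queries = random_matrix(NUM_QUERIES, D_ATTN, rng)
w_k = random_matrix(D_PLM, D_ATTN, rng)
w_v = random_matrix(D_PLM, D_ATTN, rng)
w_out = random_matrix(D_ATTN, D_LLM, rng)

for length in (5, 12):
    residues = random_matrix(length, D_PLM, rng)
    tokens = project_protein(residues, queries, w_k, w_v, w_out)
    print(length, "residues ->", len(tokens), "tokens of width", len(tokens[0]))
```

Both proteins yield three tokens of width six regardless of their length, which is the property that lets the LLM receive a fixed-size protein representation. A full Q-Former additionally stacks several layers and lets the queries attend to one another, which this sketch omits.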

Applications

BioBridge is aimed at protein scientists who want to query and reason about proteins through a conversational language interface, asking about properties, function, or binding and receiving answers grounded in a protein encoder. By unifying protein property prediction and free-form question answering in one model, it points toward assistant-style tools that combine PLM accuracy with LLM usability for tasks such as annotation triage and hypothesis generation.

Impact

BioBridge contributes to a growing line of work that fuses biomolecular encoders with general LLMs, and its focus on continual pretraining to avoid catastrophic forgetting is a notable design choice in that space. Its practical reach is currently limited by the absence of released weights or code, and as a February 2026 preprint its results await peer review and independent reproduction.

Citations

BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs

Preprint

Wang, Y., et al. (2026) BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs. arXiv preprint arXiv:2602.17680.

DOI: 10.48550/arXiv.2602.17680

BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs

Wang, Y., et al. (2025) BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs. IEEE International Conference on Bioinformatics and Biomedicine.

DOI: 10.1109/BIBM66473.2025.11356714

Recent citations

Papers that recently cited this model.

Bio-BLIP: A Multimodal Architecture for Transferable Reasoning in Genomic Variant Interpretation
Anvita Gupta, Anshul B Kundaje, Alejandro Buendia, et al.
bioRxiv · May 2026
0
GGBound: A Genome-Grounded Agent for Microbial Life-Boundary Prediction
Hanbo Huang, Xuan Gong, Jing Wang, et al.
May 2026
0

Citations

Total Citations: 2

Influential: 0

References: 32

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 50%
Environmental Science: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

13 · Closed

Usability (can I run it?): 9

Reproducibility (can I retrain it?): 14

Model Openness Framework

Unclassified

Missing required components

Resources

Research Paper
