An 80B-parameter multimodal protein-language model that decodes protein function through natural language dialogue, integrating sequence, structure, and evolutionary context.
Evolla is a multimodal protein-language generative model developed at Westlake University that bridges the gap between structural biology and natural language understanding. Rather than returning discrete class labels or numeric scores, Evolla responds to open-ended questions about a protein's function in natural language, enabling researchers to generate hypotheses, explore mechanisms, and interrogate proteins interactively. The model was introduced in a January 2025 bioRxiv preprint by Zhou et al., with Fajie Yuan as the corresponding author.
The central challenge Evolla addresses is functional annotation: given a protein sequence and structure, what does it do? Traditional approaches rely on homology transfer or supervised classifiers trained on curated ontology terms. These methods struggle when a query protein lacks close characterized relatives or when the biological question does not fit a predefined classification scheme. Evolla reframes annotation as a generative dialogue task, allowing users to pose arbitrary natural-language questions and receive contextually grounded, paragraph-length answers about enzyme activity, subcellular localization, evolutionary relationships, and disease involvement.
Evolla was trained on 546 million protein-question-answer triples spanning 150 billion word tokens, making it the largest protein-text training corpus reported to date. The preprint validates the model through two experimental demonstrations: identifying candidate eukaryotic signature proteins in Asgard archaea (with functional Vps4 homologs confirmed by yeast complementation assays), and discovering a novel deep-sea PET hydrolase, PsPETase, experimentally verified to degrade plastic films. These validations distinguish Evolla from systems whose claims rest solely on held-out benchmark performance.
Evolla is built from three pre-trained modules assembled into a unified encoder-decoder pipeline. The protein encoder is SaProt (650M parameters in the 10B variant; 1.3B parameters in the 80B variant), a structure-aware protein language model that ingests interleaved amino acid and Foldseek structural tokens. The encoder output is passed through a trainable cross-attention compressor and aligner module (1.7B parameters in the 10B variant; 8B in the 80B variant) that maps high-dimensional protein representations into the token embedding space expected by the language model decoder. The decoder is Meta-Llama-3-8B-Instruct in the 10B variant and a 70B Llama-based model in the 80B variant; in both variants the decoder's weights are frozen during training, preserving its general linguistic capabilities while the compressor/aligner learns to translate molecular representations into language.
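The preprint's exact compressor layers are not reproduced here, but the general mechanism can be illustrated. Below is a minimal single-head sketch, in the spirit of Perceiver-style resamplers, of how cross-attention with a fixed set of learned query vectors can map a variable-length per-residue representation into a fixed number of decoder-space tokens. All names, dimensions, and the single-head simplification are illustrative, not Evolla's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress(protein_repr, queries, Wk, Wv):
    """Cross-attention compression: a fixed set of learned query vectors
    attends over the per-residue encoder states, yielding a fixed-length
    sequence of embeddings that a frozen decoder can consume as tokens."""
    K = protein_repr @ Wk                          # (L, d) keys
    V = protein_repr @ Wv                          # (L, d) values
    scores = queries @ K.T / np.sqrt(K.shape[-1])  # (k, L) scaled dot products
    attn = softmax(scores, axis=-1)                # each query attends over residues
    return attn @ V                                # (k, d) fixed-size protein "tokens"

rng = np.random.default_rng(0)
d, L, k = 64, 300, 16                        # toy sizes, not the real model dims
protein_repr = rng.normal(size=(L, d))       # stand-in for SaProt per-residue output
queries = rng.normal(size=(k, d))            # learned latent queries
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

tokens = compress(protein_repr, queries, Wk, Wv)
print(tokens.shape)  # (16, 64)
```

The key property is that the output shape depends only on the number of learned queries, so proteins of any length are compressed to the same number of decoder-space embeddings.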
Training used protein-text pairs derived from Swiss-Prot and TrEMBL annotations, augmented with ProTrek retrieval to generate diverse protein-question-answer triples at scale. On the authors' Instructional Response Space (IRS) evaluation framework, Evolla-80B achieved a GPT-scored quality of 74.10 ± 0.81, substantially exceeding GPT-4o (37.07 ± 0.54) and DeepSeek-v3 (40.49 ± 0.56) on protein-specific functional queries. The 10B model weights are available on HuggingFace under an MIT license; the 80B model is accessible via the online chat interface.
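The preprint's actual prompt templates and retrieval pipeline are not shown here; the sketch below uses hypothetical templates to illustrate how a single curated annotation record of the Swiss-Prot kind can be expanded into multiple protein-question-answer triples for generative training. Field names and question wordings are invented for illustration.

```python
# Hypothetical question templates keyed by annotation field. The real
# pipeline additionally uses ProTrek retrieval and rewriting to
# diversify phrasing far beyond fixed templates like these.
TEMPLATES = {
    "function": "What is the function of this protein?",
    "location": "Where is this protein localized in the cell?",
    "cofactor": "Does this enzyme require any cofactors?",
}

def make_triples(record):
    """Expand one annotation record into (sequence, question, answer) triples,
    emitting one triple per annotated field that has a matching template."""
    return [
        {"sequence": record["sequence"],
         "question": TEMPLATES[field],
         "answer": record[field]}
        for field in TEMPLATES
        if field in record
    ]

record = {
    "sequence": "MKTAYIAKQR",  # toy sequence fragment
    "function": "Catalyzes hydrolysis of PET into MHET and TPA.",
    "location": "Secreted into the extracellular medium.",
}
triples = make_triples(record)
print(len(triples))  # 2 (no cofactor annotation in this record)
```

Applied across hundreds of millions of annotated records, this kind of expansion is what makes a corpus of 546 million triples feasible without hand-writing questions.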
Evolla is well suited to protein function annotation tasks where homology-based methods fail due to low sequence identity to characterized proteins. Functional genomicists can query novel open reading frames from metagenomic datasets to generate functional hypotheses before committing to experimental validation. Enzyme engineers can ask targeted questions about catalytic mechanisms, cofactor requirements, or substrate scope to prioritize variants for directed evolution campaigns. Evolutionary biologists can probe remote functional relationships — such as the eukaryotic-like functions in archaeal proteins — that conventional BLAST-based analysis would miss. The chat interface at chat-protein.com provides immediate access without local deployment, lowering the barrier for wet-lab scientists unfamiliar with command-line tools.
Evolla represents a meaningful shift in how protein function annotation can be approached, moving from static lookup tables and classification pipelines toward interactive, generative reasoning. The experimental validations — particularly the PsPETase discovery and the Asgard archaea analysis — demonstrate that the model's outputs are actionable rather than merely predictive. The open release of Evolla-10B on HuggingFace and the publicly accessible chat interface position it as a broadly usable community resource. As a preprint, the work has not yet undergone formal peer review, and independent benchmarking of the RAG components and DPO alignment on standardized community tasks would strengthen confidence in the reported performance gains. The 80B variant's training status and full release timeline had not been finalized at the time of the preprint.