An 80B-parameter multimodal protein-language model that decodes protein function through natural language dialogue, integrating sequence, structure, and evolutionary context.
Evolla is a multimodal protein-language generative model developed at Westlake University that bridges the gap between structural biology and natural language understanding. Rather than returning discrete class labels or numeric scores, Evolla responds to open-ended questions about a protein's function in natural language, enabling researchers to generate hypotheses, explore mechanisms, and interrogate proteins interactively. The model was introduced in a January 2025 bioRxiv preprint by Zhou et al., with Fajie Yuan as the corresponding author.
The central challenge Evolla addresses is functional annotation: given a protein sequence and structure, what does it do? Traditional approaches rely on homology transfer or supervised classifiers trained on curated ontology terms. These methods struggle when a query protein lacks close characterized relatives or when the biological question does not fit a predefined classification scheme. Evolla reframes annotation as a generative dialogue task, allowing users to pose arbitrary natural-language questions and receive contextually grounded, paragraph-length answers about enzyme activity, subcellular localization, evolutionary relationships, and disease involvement.
Evolla was trained on 546 million protein-question-answer triples spanning 150 billion word tokens, making it the largest protein-text training corpus reported to date. The preprint validates the model through two experimental demonstrations: identifying candidate eukaryotic signature proteins in Asgard archaea (with functional Vps4 homologs confirmed by yeast complementation assays), and discovering a novel deep-sea PET hydrolase, PsPETase, experimentally verified to degrade plastic films. These validations distinguish Evolla from systems whose claims rest solely on held-out benchmark performance.
Evolla is built from three pre-trained modules assembled into a unified encoder-decoder pipeline. The protein encoder is SaProt (650M parameters in the 10B variant; 1.3B parameters in the 80B variant), a structure-aware protein language model that ingests interleaved amino acid and Foldseek structural tokens. The encoder output is passed through a trainable cross-attention compressor and aligner module (1.7B parameters in the 10B variant; 8B in the 80B variant) that maps high-dimensional protein representations into the token embedding space expected by the language model decoder. The decoder is Meta-Llama-3-8B-Instruct in the 10B variant and a 70B Llama-based model in the 80B variant; in both variants the decoder's weights are frozen during training, preserving its general linguistic capabilities while the compressor/aligner learns to translate molecular representations into language.
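The preprint's exact compressor layers are not reproduced here, but the general mechanism can be illustrated. Below is a minimal single-head sketch, in the spirit of Perceiver-style resamplers, of how cross-attention with a fixed set of learned query vectors can map a variable-length per-residue representation into a fixed number of decoder-space tokens. All names, dimensions, and the single-head simplification are illustrative, not Evolla's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress(protein_repr, queries, Wk, Wv):
    """Cross-attention compression: a fixed set of learned query vectors
    attends over the per-residue encoder states, yielding a fixed-length
    sequence of embeddings that a frozen decoder can consume as tokens."""
    K = protein_repr @ Wk                          # (L, d) keys
    V = protein_repr @ Wv                          # (L, d) values
    scores = queries @ K.T / np.sqrt(K.shape[-1])  # (k, L) scaled dot products
    attn = softmax(scores, axis=-1)                # each query attends over residues
    return attn @ V                                # (k, d) fixed-size protein "tokens"

rng = np.random.default_rng(0)
d, L, k = 64, 300, 16                        # toy sizes, not the real model dims
protein_repr = rng.normal(size=(L, d))       # stand-in for SaProt per-residue output
queries = rng.normal(size=(k, d))            # learned latent queries
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

tokens = compress(protein_repr, queries, Wk, Wv)
print(tokens.shape)  # (16, 64)
```

The key property is that the output shape depends only on the number of learned queries, so proteins of any length are compressed to the same number of decoder-space embeddings.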
Training used protein-text pairs derived from Swiss-Prot and TrEMBL annotations, augmented with ProTrek retrieval to generate diverse protein-question-answer triples at scale. On the authors' Instructional Response Space (IRS) evaluation framework, Evolla-80B achieved a GPT-scored quality of 74.10 ± 0.81, substantially exceeding GPT-4o (37.07 ± 0.54) and DeepSeek-v3 (40.49 ± 0.56) on protein-specific functional queries. The 10B model weights are available on HuggingFace under an MIT license; the 80B model is accessible via the online chat interface.
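The preprint's actual prompt templates and retrieval pipeline are not shown here; the sketch below uses hypothetical templates to illustrate how a single curated annotation record of the Swiss-Prot kind can be expanded into multiple protein-question-answer triples for generative training. Field names and question wordings are invented for illustration.

```python
# Hypothetical question templates keyed by annotation field. The real
# pipeline additionally uses ProTrek retrieval and rewriting to
# diversify phrasing far beyond fixed templates like these.
TEMPLATES = {
    "function": "What is the function of this protein?",
    "location": "Where is this protein localized in the cell?",
    "cofactor": "Does this enzyme require any cofactors?",
}

def make_triples(record):
    """Expand one annotation record into (sequence, question, answer) triples,
    emitting one triple per annotated field that has a matching template."""
    return [
        {"sequence": record["sequence"],
         "question": TEMPLATES[field],
         "answer": record[field]}
        for field in TEMPLATES
        if field in record
    ]

record = {
    "sequence": "MKTAYIAKQR",  # toy sequence fragment
    "function": "Catalyzes hydrolysis of PET into MHET and TPA.",
    "location": "Secreted into the extracellular medium.",
}
triples = make_triples(record)
print(len(triples))  # 2 (no cofactor annotation in this record)
```

Applied across hundreds of millions of annotated records, this kind of expansion is what makes a corpus of 546 million triples feasible without hand-writing questions.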
Evolla is well suited to protein function annotation tasks where homology-based methods fail due to low sequence identity to characterized proteins. Functional genomicists can query novel open reading frames from metagenomic datasets to generate functional hypotheses before committing to experimental validation. Enzyme engineers can ask targeted questions about catalytic mechanisms, cofactor requirements, or substrate scope to prioritize variants for directed evolution campaigns. Evolutionary biologists can probe remote functional relationships — such as the eukaryotic-like functions in archaeal proteins — that conventional BLAST-based analysis would miss. The chat interface at chat-protein.com provides immediate access without local deployment, lowering the barrier for wet-lab scientists unfamiliar with command-line tools.
Evolla represents a meaningful shift in how protein function annotation can be approached, moving from static lookup tables and classification pipelines toward interactive, generative reasoning. The experimental validations — particularly the PsPETase discovery and the Asgard archaea analysis — demonstrate that the model's outputs are actionable rather than merely predictive. The open release of Evolla-10B on HuggingFace and the publicly accessible chat interface position it as a broadly usable community resource. As a preprint, the work has not yet undergone formal peer review, and independent benchmarking of the RAG components and DPO alignment on standardized community tasks would strengthen confidence in the reported performance gains. The 80B variant's training status and full release timeline had not been finalized at the time of the preprint.