A multimodal generative model that learns disentangled evolutionary representations to predict viral antigenic change, generalizing zero-shot across viral families.
DERIVE is a generative foundation model for predicting viral antigenic evolution, the process by which viruses change their surface proteins to escape host immunity. Anticipating antigenic change is central to vaccine design and surveillance, but the process is hard to model because it depends on the interplay of evolutionary history, physicochemical properties, and protein structure. DERIVE addresses this by learning a disentangled latent representation that integrates these complementary signals into a single multimodal model.
The model's core idea is disentanglement: it separates the factors that drive antigenic change so that sequence-homology information, physicochemical features, and structural features each occupy interpretable parts of the latent space. This design enables cross-virus generalization — DERIVE is reported to transfer zero-shot to four viral families, predicting antigenic change for viruses beyond those seen during training. It was developed by Zhang, Lin, Zhong, Zhou, Li, and Yu at the Guangzhou National Laboratory and released as a February 2026 bioRxiv preprint.
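The idea of factors that "occupy interpretable parts of the latent space" can be illustrated with a toy latent vector split into named blocks. This is a minimal sketch, not DERIVE's implementation. The three factor names mirror the signal types listed above, but the block sizes in the hypothetical `FACTOR_SIZES` and the helpers `split_latent`, `join_latent`, and `swap_factor` are assumptions made for illustration. The sketch shows the kind of controlled intervention that disentanglement is meant to allow: replacing one factor's block while holding the others fixed.

```python
from typing import Dict, List

# Hypothetical layout: DERIVE's real latent dimensions and factor ordering are
# not described here, so these block sizes are placeholders for illustration.
FACTOR_SIZES: Dict[str, int] = {"homology": 4, "physicochemical": 3, "structural": 3}


def split_latent(z: List[float], sizes: Dict[str, int] = FACTOR_SIZES) -> Dict[str, List[float]]:
    """Partition a flat latent vector into named, contiguous factor blocks."""
    expected = sum(sizes.values())
    if len(z) != expected:
        raise ValueError(f"expected {expected} latent dims, got {len(z)}")
    blocks: Dict[str, List[float]] = {}
    start = 0
    for name, size in sizes.items():  # dicts preserve insertion order (Python 3.7+)
        blocks[name] = list(z[start:start + size])
        start += size
    return blocks


def join_latent(blocks: Dict[str, List[float]], sizes: Dict[str, int] = FACTOR_SIZES) -> List[float]:
    """Concatenate factor blocks back into one flat vector in layout order."""
    z: List[float] = []
    for name in sizes:
        z.extend(blocks[name])
    return z


def swap_factor(z_a: List[float], z_b: List[float], factor: str,
                sizes: Dict[str, int] = FACTOR_SIZES) -> List[float]:
    """Return a copy of z_a whose `factor` block is taken from z_b.

    In a well-disentangled latent space, this edit should affect only the
    property tied to `factor` and leave the other factors untouched.
    """
    if factor not in sizes:
        raise KeyError(f"unknown factor: {factor}")
    blocks_a = split_latent(z_a, sizes)
    blocks_b = split_latent(z_b, sizes)
    blocks_a[factor] = blocks_b[factor]
    return join_latent(blocks_a, sizes)


# Usage: transplant the "structural" block of one toy latent into another.
z_a = [0.0] * 10
z_b = [1.0] * 10
print(swap_factor(z_a, z_b, "structural"))
# → [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

Contiguous fixed-size blocks are the simplest way to make "which part of the latent means what" explicit. Whether DERIVE's factors are laid out this way, or separated by some other mechanism, should be checked against the preprint.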
DERIVE belongs to an emerging class of evolution-aware viral foundation models. Its emphasis on disentangled, multimodal representations and cross-family transfer distinguishes it from virus-specific antigenic prediction methods that must be retrained for each pathogen.
DERIVE learns a disentangled latent representation by jointly modeling sequence homology together with physicochemical and structural features of viral proteins, using a flow-based generative framework to capture the distribution of antigenic change. The disentanglement is what enables cross-virus predictive modeling: by factoring the latent space into evolutionary and biophysical components, the model can apply patterns learned on some viruses to others. The preprint reports zero-shot generalization to four viral families, indicating transfer beyond the training pathogens. Full architectural specifications, training datasets, parameter counts, and quantitative benchmark results are given in the preprint, which is released under a CC BY license. Because the work is a recent preprint, the availability of code and trained weights should be confirmed with the authors.
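The phrase "flow-based generative framework" is not specified further here. One common instance is a normalizing flow built from invertible affine coupling layers (the RealNVP design), which gives exact log-likelihoods via the change-of-variables formula. The sketch below shows a single such layer under that assumption. It is a generic building block, not DERIVE's architecture, and the names `toy_conditioner`, `coupling_forward`, `coupling_inverse`, and `log_likelihood` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# A conditioner maps the untouched half x1 to (log_scale, shift) for the other half.
Conditioner = Callable[[Vector], Tuple[Vector, Vector]]


def toy_conditioner(x1: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for a learned network such as a small MLP."""
    log_scale = [0.5 * math.tanh(v) for v in x1]  # bounded scales keep exp() stable
    shift = [0.1 * v for v in x1]
    return log_scale, shift


def _halves(v: Vector) -> Tuple[Vector, Vector]:
    """Split a vector into two equal halves (even dimension assumed)."""
    if len(v) % 2:
        raise ValueError("this sketch assumes an even dimension")
    half = len(v) // 2
    return v[:half], v[half:]


def coupling_forward(x: Vector, net: Conditioner) -> Tuple[Vector, float]:
    """Affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = _halves(x)
    s, t = net(x1)
    y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so log|det J| is just the sum of log-scales.
    return x1 + y2, sum(s)


def coupling_inverse(y: Vector, net: Conditioner) -> Vector:
    """Exact inverse: because y1 == x1, s and t can be recomputed from y1."""
    y1, y2 = _halves(y)
    s, t = net(y1)
    x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
    return y1 + x2


def log_likelihood(x: Vector, net: Conditioner) -> float:
    """Change of variables: log p(x) = log N(f(x); 0, I) + log|det J_f(x)|."""
    z, log_det = coupling_forward(x, net)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_base + log_det


x = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1]
y, log_det = coupling_forward(x, toy_conditioner)
print(coupling_inverse(y, toy_conditioner))  # recovers x up to float rounding
print(log_likelihood(x, toy_conditioner))
```

Stacking several such layers, alternating which half passes through unchanged, yields an expressive but exactly invertible density model. If DERIVE's framework instead uses flow matching (a continuous-time flow trained by regressing a velocity field), this coupling-layer sketch would not apply directly.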
DERIVE is intended for viral surveillance and vaccine development, where predicting antigenic change helps anticipate immune escape and guide strain selection. Its cross-family generalization makes it especially relevant for emerging or under-studied pathogens for which limited antigenic data exist, since the model can transfer knowledge from better-characterized viruses. Researchers tracking the evolution of respiratory and other rapidly evolving viruses could use DERIVE to prioritize variants of concern and to interpret which sequence, physicochemical, or structural changes are driving antigenic shifts.
DERIVE contributes a disentangled, multimodal approach to a problem usually tackled with virus-specific models, and its reported zero-shot transfer across four viral families suggests a path toward more general antigenic-evolution forecasting. Coming from the Guangzhou National Laboratory, which focuses on respiratory and infectious disease, the work targets a problem of clear public-health relevance. Because the work is a February 2026 preprint, its results have not yet been independently validated, and the practical reliability of cross-family predictions, particularly for viruses very different from those in training, will require further external evaluation.