Zhejiang University School of Medicine
A multi-modal Transformer that fuses LLM gene embeddings with biological knowledge graphs to predict single-cell transcriptomic responses to genetic perturbations.
Predicting how a cell's transcriptome will change when a gene is knocked out, knocked down, or activated is a central problem in functional genomics and a key building block for the "virtual cell" — a computational model that learns the relationship between cell state and function well enough to forecast the consequences of perturbations across diverse contexts. Pooled CRISPR screens such as Perturb-seq link genetic perturbations to single-cell RNA-seq readouts at scale, but experimental approaches remain limited in coverage and cost and cannot exhaustively map every single-gene and combinatorial intervention. Computational methods that generalize from observed perturbations to unseen ones are therefore valuable for hypothesis generation and experimental prioritization.
scPert, developed by Lu and colleagues in the Department of Pharmacy at the Second Affiliated Hospital, Zhejiang University School of Medicine, and posted to bioRxiv in April 2026, addresses three persistent weaknesses of existing perturbation-prediction methods: limited accuracy on complex genetic interactions, poor biological interpretability, and inadequate generalization to genes not seen during training. It belongs to the same family as knowledge-augmented predictors such as GEARS and scGenePT, which inject curated biological knowledge alongside expression data, but differs in how it combines those signals.
The core idea is a multi-modal Transformer that integrates large language model embeddings with structured biological knowledge through a hierarchical fusion of three representation types: knowledge graph representations, contextual embeddings from foundation models, and gene-specific encodings. By grounding expression-based learning in both unstructured textual knowledge about gene function and structured relational knowledge, scPert aims to improve prediction of both single-gene and combinatorial perturbation effects while remaining biologically interpretable.
Hierarchical multi-modal fusion: scPert combines knowledge graph representations, contextual embeddings from foundation models, and gene-specific encodings in a layered fusion scheme rather than a single concatenation, so each gene is represented by complementary structured and unstructured signals.
LLM-derived gene embeddings: Large language model embeddings supply prior functional knowledge about each gene, providing gene-specific signal even when a perturbed gene was not observed individually during training — the regime where purely data-driven models struggle most.
Knowledge graph grounding: Structured biological knowledge encoded as a graph supplies relational context (functional and pathway relationships among genes) that is not recoverable from co-expression statistics alone, improving interpretability of predicted interactions.
Combinatorial perturbation modeling: The framework explicitly targets non-additive effects of multi-gene perturbations, where the combined effect of two interventions differs from the sum of their individual effects.
Generalization across screen scales: scPert was evaluated from small-scale screens (Dixit) to genome-wide screens (the Replogle datasets), with the authors reporting consistent performance across these very different scales.
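The notion of a non-additive combinatorial effect can be made concrete with a small sketch. It assumes each perturbation is summarized as a per-gene change relative to unperturbed control cells (for example, a log fold change); the function name and the toy numbers are illustrative and do not come from the preprint.

```python
from typing import Dict


def interaction_residual(
    delta_a: Dict[str, float],
    delta_b: Dict[str, float],
    delta_ab: Dict[str, float],
) -> Dict[str, float]:
    """Per-gene deviation of a double perturbation from the additive expectation.

    A residual of zero means the combined effect equals the sum of the two
    single-gene effects; a non-zero residual marks a non-additive interaction.
    """
    return {
        gene: delta_ab[gene] - (delta_a.get(gene, 0.0) + delta_b.get(gene, 0.0))
        for gene in delta_ab
    }


# Toy changes versus control (hypothetical values, not data from any screen).
delta_a = {"G1": 0.5, "G2": -0.25, "G3": 0.0}
delta_b = {"G1": 0.25, "G2": -0.25, "G3": 0.0}
delta_ab = {"G1": 0.75, "G2": -1.5, "G3": 1.0}

residual = interaction_residual(delta_a, delta_b, delta_ab)
# G1 is additive, G2 is more strongly repressed than the sum predicts,
# and G3 responds only when both genes are perturbed together.
```

A purely additive model would predict a residual of zero for every gene, so the genes with large residuals are the ones a combinatorial predictor has to capture.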
scPert is a Transformer-based architecture that predicts single-cell transcriptomic responses to genetic perturbations. Each gene's representation is assembled hierarchically from a knowledge graph embedding, a contextual embedding drawn from foundation models, and a gene-specific encoding, and the fused representation is used to predict post-perturbation expression profiles. This design lets the model leverage curated biological knowledge (textual and relational) alongside the statistical patterns learned from expression data.
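One way to picture a layered fusion, as opposed to a single concatenation, is the two-stage sketch below. The summary above does not specify the fusion operator, so the scalar sigmoid gates, the function names, and the ordering of the stages are all assumptions made for illustration; this is not the authors' implementation. It also assumes the three vectors have already been projected to a common dimension.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(a: Vector, b: Vector, gate_logit: float) -> Vector:
    """Convex combination of two same-length vectors set by a scalar gate."""
    if len(a) != len(b):
        raise ValueError("vectors must share a dimension before fusion")
    g = sigmoid(gate_logit)
    return [g * x + (1.0 - g) * y for x, y in zip(a, b)]


def hierarchical_fuse(
    kg_emb: Vector,
    llm_emb: Vector,
    gene_enc: Vector,
    prior_gate_logit: float = 0.0,  # hypothetical parameter, learned in practice
    final_gate_logit: float = 0.0,  # hypothetical parameter, learned in practice
) -> Vector:
    # Stage 1 (assumed): merge the two prior-knowledge views of the gene.
    prior = gated_fuse(kg_emb, llm_emb, prior_gate_logit)
    # Stage 2 (assumed): merge that prior with the gene-specific encoding.
    return gated_fuse(prior, gene_enc, final_gate_logit)


# Toy 2-dimensional embeddings (hypothetical values).
fused = hierarchical_fuse(kg_emb=[1.0, 0.0], llm_emb=[0.0, 1.0], gene_enc=[1.0, 1.0])
```

In a trained model the gates would be learned and the fused vector would serve as the gene's input token to the Transformer. The sketch only shows why the order of combination matters: the knowledge graph and language-model signals are reconciled with each other before they meet the expression-derived encoding.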
The authors benchmarked scPert against established methods including GEARS, scGenePT variants, and Scouter across standard Perturb-seq datasets, reporting prediction error on differentially expressed genes. scPert achieved the lowest reported error on the Norman (0.190) and Replogle RPE1 (0.122) datasets, and near-best performance on Replogle K562 (0.123, versus 0.122 for Scouter). On Adamson (0.151) and Dixit (0.091) it remained competitive with the strongest baselines while maintaining balanced correlation performance. In cancer-relevant analyses, the authors report that scPert recovers p53 pathway dynamics and immune checkpoint regulatory mechanisms, and they evaluate it systematically across 42 cancer dependency genes to identify candidate therapeutic targets. A parameter count is not reported in the preprint, and no public code release or model/data card was located at the time of writing.
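For readers unfamiliar with this style of benchmark, the sketch below shows one plausible form of an error restricted to differentially expressed genes. The text above does not name the exact metric, so mean squared error over the top-k genes is an assumption, modeled on a convention common in GEARS-style evaluations. Ranking genes by absolute mean change is also a simplification of a proper differential expression test, and all names and numbers are illustrative.

```python
from typing import List, Sequence


def top_de_genes(
    control_mean: Sequence[float], true_mean: Sequence[float], k: int
) -> List[int]:
    """Indices of the k genes with the largest absolute true change from control."""
    changes = [abs(t - c) for c, t in zip(control_mean, true_mean)]
    ranked = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
    return ranked[:k]


def mse_on_de_genes(
    control_mean: Sequence[float],
    true_mean: Sequence[float],
    pred_mean: Sequence[float],
    k: int = 20,
) -> float:
    """Mean squared error between predicted and true expression on the top-k genes."""
    idx = top_de_genes(control_mean, true_mean, k)
    if not idx:
        raise ValueError("no genes to evaluate")
    return sum((pred_mean[i] - true_mean[i]) ** 2 for i in idx) / len(idx)


# Toy mean expression profiles over four genes (hypothetical values).
control = [1.0, 1.0, 1.0, 1.0]
observed = [1.0, 3.0, 0.0, 1.25]
predicted = [1.0, 2.5, 0.5, 2.0]

de_idx = top_de_genes(control, observed, k=2)
error = mse_on_de_genes(control, observed, predicted, k=2)
```

Restricting the error to the most strongly changed genes keeps the score from being dominated by the many genes that barely move under any perturbation. It also means that differences in the third decimal place, like those reported above, should be read with the choice of k and of ranking procedure in mind.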
scPert is intended for researchers designing or interpreting genetic perturbation screens who want to computationally forecast transcriptional responses before committing experimental resources, or to extrapolate from a partial screen to perturbations that were not assayed. Because it incorporates knowledge graphs and language-model priors, it is positioned to predict effects for genes and gene pairs absent from the training data, supporting target prioritization in drug discovery. The authors emphasize cancer applications in particular, using the model to probe p53 dynamics and immune checkpoint regulation and to nominate potential therapeutic targets from a panel of 42 cancer dependency genes — a workflow relevant to functional genomics groups and translational oncology pipelines.
scPert contributes to the rapidly growing line of knowledge-augmented single-cell perturbation models, alongside GEARS and scGenePT, by proposing a hierarchical fusion that couples large language model embeddings with structured knowledge graphs rather than treating either modality in isolation. Its reported gains on combinatorial perturbations and on unseen genes, together with strong results across screen scales from Dixit to the genome-wide Replogle datasets, advance the case that structured and unstructured biological knowledge add value beyond expression data alone for virtual-cell construction. Three caveats apply. First, these results come from a preprint that has not yet been peer-reviewed. Second, differences from the strongest baselines are narrow on several datasets: on Replogle K562, scPert's error of 0.123 slightly trails Scouter's 0.122, and on Adamson and Dixit it is reported as competitive rather than best. Third, no public code, model card, or data card has been located, which limits independent reproduction and broad adoption until those become available.