SNAP (Stanford)
Geometric deep learning model that predicts transcriptional responses to multi-gene perturbations by integrating single-cell RNA-seq with a gene-gene knowledge graph.
High-throughput CRISPR perturbation screens have made it possible to systematically perturb individual genes and measure the transcriptional consequences in single cells. But the combinatorial explosion of multi-gene perturbations poses a fundamental challenge: even a screen of 200 genes yields roughly 20,000 possible two-gene combinations, and three-way combinations number in the millions. Experimentally covering even a fraction of this space is impractical. GEARS (short for graph-enhanced gene activation and repression simulator) addresses this challenge by learning to predict transcriptional responses to multi-gene perturbations, including combinations of genes that were never experimentally perturbed together and combinations involving genes that were never perturbed at all.
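The combinatorial arithmetic above is easy to verify directly; a quick sketch using Python's standard library:

```python
from math import comb

# Number of k-gene perturbation combinations in a screen of n genes.
# Illustrates the combinatorial explosion described above.
n = 200
pairs = comb(n, 2)    # two-gene combinations
triples = comb(n, 3)  # three-gene combinations

print(pairs)    # 19900 -- roughly 20,000
print(triples)  # 1313400 -- over a million
```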
GEARS was developed by Yusuf Roohani, Kexin Huang, and Jure Leskovec at the Stanford SNAP (Stanford Network Analysis Project) lab and published in Nature Biotechnology in August 2023. The model's central innovation is the integration of geometric deep learning — specifically graph neural networks operating over a knowledge graph of gene-gene relationships — with expression data from perturbational single-cell RNA-seq screens. By learning gene representations that encode network context, GEARS can make predictions for entirely novel perturbation targets that have known biological relationships to genes the model was trained on, enabling genuine out-of-distribution generalization rather than interpolation.
This capability to predict outcomes for genes that were never individually perturbed in the training screen distinguishes GEARS from other perturbation prediction methods. Previous approaches, including linear models and autoencoder-based methods, are fundamentally limited to the genes present in the training data. GEARS overcomes this constraint by grounding predictions in a biological knowledge graph that encodes gene regulatory and co-expression relationships, allowing the model to reason about the likely effects of perturbing a gene from its network neighborhood even when direct experimental data are absent.
GEARS uses a two-level graph neural network (GNN) architecture. At the first level, a GNN operates over a biological gene-gene relationship graph to produce gene embeddings that capture network context; this knowledge graph is derived from gene co-expression patterns and Gene Ontology (GO) pathway co-membership. At the second level, a perturbation-conditioned GNN takes the embeddings of the perturbed genes and propagates information through the same network structure to predict the post-perturbation expression profile across all genes. This two-stage design lets the model reason about both the identity of the perturbed genes (via their network embeddings) and the propagation of perturbation effects through the gene regulatory network (via message passing).
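The two-stage idea can be sketched in a few lines of NumPy. This is a deliberately simplified illustration with random weights and mean aggregation, not the actual GEARS implementation; the graph, features, and both weight matrices are invented stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, dim = 6, 8

# Symmetric gene-gene adjacency standing in for the knowledge graph
# (co-expression / pathway co-membership edges); self-loops included.
A = np.eye(n_genes)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A_norm = A / A.sum(axis=1, keepdims=True)  # row-normalised for mean aggregation

# Stage 1: message passing over the knowledge graph yields
# context-aware gene embeddings.
X = rng.normal(size=(n_genes, dim))     # initial gene features
W1 = rng.normal(size=(dim, dim)) * 0.1  # weight matrix (random, untrained here)
H = np.tanh(A_norm @ X @ W1)            # one round of neighborhood aggregation

# Stage 2: inject a signal at the perturbed genes and propagate it through
# the same graph to predict an expression change for every gene.
perturbed = [0, 2]                      # indices of the perturbed genes
signal = np.zeros((n_genes, dim))
signal[perturbed] = H[perturbed]        # condition on perturbed-gene embeddings
W2 = rng.normal(size=(dim, 1)) * 0.1
delta_expression = (A_norm @ (H + signal)) @ W2  # predicted change per gene

print(delta_expression.shape)  # (6, 1)
```

In the real model both stages are trained end-to-end on the screen data; here the point is only the flow of information: graph context first, perturbation propagation second.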
The model was trained and evaluated primarily on the Norman et al. 2019 dataset, a large-scale combinatorial CRISPRa screen in K562 cells that profiled approximately 105,000 cells across 287 perturbation conditions, including 105 single-gene perturbations and 131 two-gene combinations. GEARS achieved 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes and identified the strongest interactions more than twice as well as prior methods. A key evaluation protocol in the paper tests whether GEARS can predict the outcome of a two-gene perturbation in which neither gene was ever perturbed individually during training, a particularly challenging out-of-distribution generalization task that existing methods cannot address at all. The model also generalizes across cell lines and perturbation modalities.
GEARS is designed for researchers running or planning CRISPR perturbation screens who want to extract more insight from their experiments than was directly measured. In practice, GEARS can be used in two modes: before a screen, as a prediction tool to prioritize the perturbation combinations most likely to yield novel or interesting genetic interactions; and after a screen, as an imputation tool to computationally fill in expression profiles for perturbation conditions that were not experimentally covered. Both modes reduce the experimental cost of functional genomics studies. GEARS is also directly applicable to existing large-scale perturbation datasets from Perturb-seq and similar technologies, where the sheer number of measured conditions makes computational synthesis of gene-gene interaction patterns essential. The model has also been applied in drug target discovery workflows to identify genetic vulnerabilities and synergistic target combinations.
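The prioritization mode can be sketched as a simple non-additivity score over predicted expression changes: a two-gene perturbation is interesting when its predicted profile deviates from the sum of the single-gene profiles. The function name and toy vectors below are illustrative inventions, not part of the GEARS API:

```python
import numpy as np

def interaction_score(delta_a, delta_b, delta_ab):
    """Deviation of a predicted two-gene response from the additive
    expectation (the sum of the two single-gene responses)."""
    return float(np.linalg.norm(delta_ab - (delta_a + delta_b)))

# Toy predicted expression changes over 5 genes (stand-ins for model output).
delta_a = np.array([1.0, 0.0, -0.5, 0.0, 0.2])
delta_b = np.array([0.0, 0.5, 0.0, -0.3, 0.1])

additive = delta_a + delta_b  # purely additive combination
synergistic = additive + np.array([0.0, 0.0, 2.0, 0.0, 0.0])  # extra effect

print(interaction_score(delta_a, delta_b, additive))     # 0.0 -- no interaction
print(interaction_score(delta_a, delta_b, synergistic))  # 2.0 -- strong interaction
```

Ranking candidate gene pairs by such a score over model predictions is one way to decide which combinations merit experimental follow-up.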
GEARS represents a conceptual advance in perturbation prediction by demonstrating that grounding predictions in biological knowledge graphs enables genuine out-of-distribution generalization — not just interpolation between training conditions, but accurate prediction for genes and combinations entirely absent from the training data. Published in Nature Biotechnology, the work has attracted considerable attention in the computational genomics community and has been widely adopted as a benchmark for subsequent perturbation prediction methods. The Stanford SNAP group's reputation for high-impact graph machine learning work has amplified GEARS's visibility across both the computational biology and ML communities. The model's performance on multi-gene perturbations also contributed to growing recognition that genetic interactions cannot be reliably predicted by summing single-gene effects, motivating increased experimental and computational investment in combinatorial perturbation studies. GEARS is openly available with tutorials and a well-maintained GitHub repository, facilitating adoption in both academic and industrial research settings.