A knowledge-driven framework that predicts single-cell transcriptomic responses to small molecules, including zero-shot prediction for drugs with no prior perturbation profiles.
Predicting how individual cells respond to a drug is central to mechanistic pharmacology and the prioritization of candidate compounds. Existing single-cell perturbation models perform well on molecules with abundant measured profiles, but they generalize poorly to unprofiled drugs because they treat each compound as an isolated identifier and ignore the mechanistic relationships that connect drugs, their targets, and downstream genes. MAP, introduced in a February 2026 bioRxiv preprint from Shanghai Jiao Tong University, addresses this gap by injecting structured biological knowledge into the perturbation-modeling problem.
The core idea is to ground drug and gene representations in a large knowledge graph before they are used to predict expression changes. MAP first constructs MAP-KG, a purpose-built knowledge graph for cellular perturbation modeling, then pre-trains multimodal embeddings that align a compound's chemical structure, its target proteins, and textual descriptions of its mechanism of action. These knowledge-grounded embeddings let the model reason about a new drug by analogy to mechanistically related compounds it has seen, enabling zero-shot response prediction for molecules with scarce or absent profiles.
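The "reasoning by analogy" intuition can be illustrated with a toy sketch. This is not MAP's actual prediction head, which the summary here does not specify. It only assumes that each profiled drug has a knowledge-grounded embedding and a measured mean expression change, and it estimates the response of an unprofiled drug as a similarity-weighted average over its neighbors. The function names, the softmax weighting, the temperature value, and all numbers are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_by_analogy(query_emb, profiled, temperature=0.1):
    """Estimate a response for an unprofiled drug (illustrative only).

    profiled maps drug name -> (embedding, per-gene expression change).
    Returns the weighted-average response and the per-drug weights.
    """
    names = list(profiled)
    sims = [cosine(query_emb, profiled[n][0]) for n in names]
    # Softmax over similarities, shifted by the max for numerical stability.
    top = max(sims)
    raw = [math.exp((s - top) / temperature) for s in sims]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_genes = len(profiled[names[0]][1])
    response = [
        sum(w * profiled[n][1][g] for w, n in zip(weights, names))
        for g in range(n_genes)
    ]
    return response, dict(zip(names, weights))

# Hypothetical toy data: 2-d embeddings and 2-gene responses.
profiled_drugs = {
    "drug_a": ([1.0, 0.0], [2.0, -1.0]),
    "drug_b": ([0.0, 1.0], [-1.0, 1.0]),
}
# An unprofiled compound whose embedding lies close to drug_a.
prediction, weights = predict_by_analogy([0.9, 0.1], profiled_drugs)
print(prediction, weights)
```

In this toy setting the unseen compound inherits most of its predicted response from `drug_a`, the mechanistically nearest neighbor. A learned decoder conditioned on the embedding would replace the weighted average in a real model.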
MAP is built around MAP-KG, a knowledge graph assembled from 14 public databases that covers approximately 187k drugs, 23k genes, and 694k mechanistic relationships (such as drug-target and gene-gene interactions). The framework uses a knowledge-driven pre-training stage in which contrastive learning aligns three modalities for each compound (molecular structure, protein-sequence features of its targets, and textual mechanistic descriptions) into a unified embedding space. These embeddings are then used to predict single-cell expression responses, with the graph-derived context allowing the model to extrapolate to drugs absent from the perturbation training set. Because MAP is a recent preprint, its full architectural hyperparameters, exact benchmark suite, and code and weight availability should be confirmed against the paper; the reported emphasis is on improved zero-shot generalization to unprofiled compounds relative to identifier-based baselines.
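To make the contrastive alignment step concrete, the following is a minimal sketch of one common way to align three modalities: a symmetric InfoNCE loss averaged over the three modality pairs. The source states only that contrastive learning is used, so the choice of InfoNCE, the pairwise averaging, the temperature, and the function names `info_nce` and `tri_modal_loss` are all assumptions for illustration. The toy vectors stand in for encoder outputs.

```python
import math
from itertools import combinations

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to positives[i],
    with the other rows of positives acting as in-batch negatives."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_z - logits[i]
    return loss / len(a)

def tri_modal_loss(structure, target, text, temperature=0.1):
    """Average symmetric InfoNCE over the three modality pairs
    (an assumed formulation, not confirmed by the preprint)."""
    views = [structure, target, text]
    total = 0.0
    for x, y in combinations(views, 2):
        total += 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))
    return total / 3

# Hypothetical batch of two compounds; row i of each list is compound i.
structure_emb = [[1.0, 0.0], [0.0, 1.0]]
target_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[1.0, 0.0], [0.0, 1.0]]

aligned_loss = tri_modal_loss(structure_emb, target_emb, text_emb)
# Swapping the text rows pairs each compound with the wrong description.
shuffled_loss = tri_modal_loss(structure_emb, target_emb, text_emb[::-1])
print(aligned_loss, shuffled_loss)
```

The loss is low when the three views of the same compound agree and high when a modality is mismatched, which is the pressure that pulls structure, target, and text representations into a shared space.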
MAP is aimed at computational pharmacology and early drug discovery, where researchers want to anticipate the transcriptional consequences of a candidate compound before committing to expensive single-cell perturbation experiments. By supporting zero-shot prediction, it is particularly useful for triaging large chemical libraries or novel scaffolds for which no Perturb-seq or sci-Plex data exist, and for generating mechanistic hypotheses that connect a drug to specific gene programs.
MAP reflects a broader shift in single-cell perturbation modeling toward knowledge-informed representations, arguing that mechanistic priors, not just larger expression datasets, are key to generalizing beyond profiled compounds. If its zero-shot gains hold under independent evaluation, the framework could reduce the experimental burden of screening unprofiled drugs. Because the work is a February 2026 preprint, its adoption, released resources, and benchmark standing remain to be established through peer review and community use.