bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

CellAwareGNN

Vanderbilt University Medical Center

A knowledge-graph foundation model that injects cell-type-specific genetic associations into a biomedical knowledge graph to improve drug indication prediction and repurposing.

Released: February 2026

CellAwareGNN is a knowledge-graph foundation model for drug indication prediction developed by researchers at Vanderbilt University Medical Center and posted to bioRxiv in February 2026. It tackles a persistent gap in computational drug repurposing: biomedical knowledge graphs encode rich relationships between drugs, genes, diseases, and biological processes, but they typically treat genes as context-free entities, ignoring the fact that a gene's relevance to disease is often specific to a particular cell type. By grounding the knowledge graph in single-cell genomics, CellAwareGNN aims to make indication predictions that respect cellular context.

The central contribution is scPrimeKG, an extension of the widely used PrimeKG biomedical knowledge graph. The authors augment PrimeKG with cell-type-specific genetic associations derived from the OneK1K single-cell eQTL dataset, expanding the graph from roughly 8.1 million edges and 129,000 nodes to over 14 million edges and 140,000 nodes. A graph neural network is then pretrained across all relation types in scPrimeKG to learn transferable embeddings of biomedical entities, which are used to score candidate drug–disease indications.
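
The graph-extension step described above can be pictured with a small sketch. This is not the authors' construction: the edge schema, the relation names, and the choice to represent cell types as their own nodes are all assumptions made for illustration, and the identifiers are toy placeholders rather than real PrimeKG or OneK1K entries.

```python
from collections import namedtuple

# Hypothetical edge record; the field and relation names are illustrative only.
Edge = namedtuple("Edge", ["head", "relation", "tail"])


def augment_with_cell_type_eqtls(base_edges, eqtl_records):
    """Return the base edge set plus gene <-> cell-type edges.

    eqtl_records is an iterable of (gene, cell_type) pairs standing in for
    cell-type-specific associations derived from a single-cell eQTL resource.
    """
    edges = set(base_edges)
    for gene, cell_type in eqtl_records:
        # Add both directions so a message-passing model can use either one.
        edges.add(Edge(gene, "eqtl_in_cell_type", cell_type))
        edges.add(Edge(cell_type, "rev_eqtl_in_cell_type", gene))
    return edges


def node_count(edges):
    """Count distinct nodes that appear as a head or a tail."""
    return len({e.head for e in edges} | {e.tail for e in edges})


# Toy base graph (assumed placeholder identifiers).
base = {
    Edge("drug_1", "targets", "gene_1"),
    Edge("gene_1", "associated_with", "disease_X"),
}
# Toy cell-type-specific associations (assumed).
eqtls = [("gene_1", "cell_type_A"), ("gene_1", "cell_type_B")]

augmented = augment_with_cell_type_eqtls(base, eqtls)
print(len(base), node_count(base))
print(len(augmented), node_count(augmented))
```

In this toy case both the edge count and the node count grow, mirroring the direction of the change reported for scPrimeKG, though the real graph's schema may differ.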

CellAwareGNN sits in the lineage of knowledge-graph models for therapeutics such as TxGNN, but differentiates itself by injecting cell-resolution biology directly into the graph topology rather than relying on bulk or aggregate gene-level relationships. This makes it a single-cell-informed approach to a problem that has historically been addressed with coarser molecular data.

#Key Features

  • Cell-type-aware knowledge graph: scPrimeKG enriches PrimeKG with cell-type-specific genetic associations from the OneK1K cohort, embedding single-cell eQTL signal into the graph structure used for reasoning.
  • Foundation-model pretraining: The GNN is pretrained across all relation types in the knowledge graph, producing general-purpose entity embeddings rather than a narrowly task-specific model.
  • Full-disease coverage: Drug indication prediction is evaluated with explicit coverage of all diseases represented in the knowledge graph, rather than a restricted subset.
  • Strength on autoimmune disease: The cell-type context yields its largest gains for autoimmune indications, where immune-cell-specific genetic associations are especially informative.
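
To make the idea of scoring drug–disease pairs from pretrained embeddings concrete, here is a minimal sketch. The source does not state which decoder CellAwareGNN uses, so the DistMult-style trilinear score below is an assumption chosen because it is a common choice for knowledge-graph link prediction; the vectors are made-up toy values, not learned embeddings.

```python
import math


def distmult_score(head, relation, tail):
    """Trilinear score: sum over dimensions of head * relation * tail."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def sigmoid(x):
    """Map a raw score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_diseases(drug_vec, relation_vec, disease_vecs):
    """Score every disease against one drug and sort from best to worst."""
    scored = [
        (name, sigmoid(distmult_score(drug_vec, relation_vec, vec)))
        for name, vec in disease_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values for illustration).
drug = [1.0, 0.0]
indication_relation = [1.0, 1.0]
diseases = {
    "disease_A": [2.0, 0.0],
    "disease_B": [-1.0, 5.0],
}

ranked = rank_diseases(drug, indication_relation, diseases)
print(ranked)
```

Because the entity embeddings are shared across relation types, the same vectors could in principle be reused with a different relation vector for other link-prediction tasks.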

#Technical Details

CellAwareGNN is a graph neural network pretrained on scPrimeKG, a knowledge graph of over 14 million edges and roughly 140,000 nodes spanning drugs, genes/proteins, diseases, phenotypes, and other biomedical entities. The graph is built by extending PrimeKG (approximately 8.1 million edges, 129,000 nodes) with cell-type-specific genetic associations from the OneK1K single-cell eQTL resource. On drug indication prediction, the authors report an AUPRC of 0.826, compared with 0.816 for TxGNN-U and 0.799 for TxGNN. For autoimmune diseases specifically, CellAwareGNN reaches an AUPRC of 0.864, a 2.0% improvement over TxGNN-U and 6.0% over TxGNN, which suggests that cell-type-resolved genetic context is most beneficial where immune-cell biology dominates the disease mechanism.
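
For readers unfamiliar with the metric, the sketch below computes average precision, one common estimator of the area under the precision-recall curve (AUPRC). The source does not say which estimator the authors used, and the labels and scores here are invented toy values, so this only illustrates how such a number is obtained.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at each positive.

    labels are 0/1 ground-truth indications; scores are model outputs.
    Tied scores are not handled specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Indices sorted from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example (assumed data): positives land at ranks 1 and 3.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 4))
```

In the toy example the two positives contribute precisions of 1/1 and 2/3, so the average is 5/6. AUPRC is sensitive to the ratio of positives to negatives, which is one reason comparisons across models are only meaningful on the same evaluation set.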

#Applications

CellAwareGNN is intended for computational drug repurposing and indication expansion, where the goal is to identify previously unrecognized disease indications for existing or candidate drugs. By incorporating cell-type-specific genetic signal, it is particularly suited to immune-mediated and autoimmune conditions, in which the relevant gene–disease relationships are often confined to specific immune cell populations. Pharmacology and translational research teams can use its scored drug–disease links to prioritize candidates for experimental follow-up, while the scPrimeKG resource itself offers a reusable, cell-aware substrate for other knowledge-graph reasoning tasks.
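
A repurposing workflow of the kind described here typically filters out indications that are already known and keeps the highest-scoring remainder for follow-up. The sketch below shows that step under stated assumptions: the score dictionary, the identifiers, and the function name `prioritize_novel` are all hypothetical and do not correspond to any released CellAwareGNN interface.

```python
import heapq


def prioritize_novel(scores, known_indications, k=3):
    """Return the k highest-scoring (drug, disease) pairs not already known.

    scores maps (drug, disease) tuples to a model score; known_indications
    is a set of (drug, disease) tuples to exclude.
    """
    novel = [
        (pair, score)
        for pair, score in scores.items()
        if pair not in known_indications
    ]
    top = heapq.nlargest(k, novel, key=lambda item: item[1])
    return [pair for pair, _ in top]


# Toy scores (assumed values for illustration).
toy_pair_scores = {
    ("drug_1", "disease_A"): 0.95,
    ("drug_1", "disease_B"): 0.80,
    ("drug_2", "disease_A"): 0.60,
    ("drug_2", "disease_C"): 0.10,
}
toy_known = {("drug_1", "disease_A")}

shortlist = prioritize_novel(toy_pair_scores, toy_known, k=2)
print(shortlist)
```

A shortlist produced this way is a starting point for experimental triage, not evidence of efficacy.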

#Impact

CellAwareGNN demonstrates that single-cell genomics can be folded directly into biomedical knowledge graphs to improve therapeutic prediction, and the scPrimeKG construction gives others a concrete recipe to build on. Its modest but consistent gains over the established TxGNN baselines (an overall AUPRC of 0.826 versus 0.816 and 0.799), and its larger margins on autoimmune disease, suggest that cell-type context is a meaningful and underused signal for drug repurposing. As a February 2026 preprint, the work has not yet accumulated downstream adoption, and the authors do not report released model weights at the time of posting; independent benchmarking and broader validation across disease areas will help establish how generally the cell-aware approach transfers.

Tags

drug_discovery, link_prediction, graph_neural_network, foundation_model, representation_learning, knowledge_graph, transcriptomics