An LLM-enhanced variational graph autoencoder pretrained on pan-cancer single-cell networks to predict context-specific protein-protein interactions at single-cell resolution.
Protein-protein interactions (PPIs) are highly context dependent: the same two proteins may associate in one cell state and not in another, and these rewirings are central to how tumors progress and resist therapy. Most reference PPI databases, however, are static and aggregated across tissues, leaving the cell-state-specific networks that drive cancer largely unmapped. Shusi addresses this gap by inferring context-specific protein networks directly at single-cell resolution, turning single-cell transcriptomes into testable hypotheses about which interactions are active in a given cell.
Developed by Jiajun Yu and colleagues at Zhejiang University and released as a bioRxiv preprint in April 2025, Shusi is an LLM-enhanced variational graph autoencoder. It couples a graph neural network over cell-specific interaction graphs with protein representations distilled from a large language model, allowing the model to reason about interactions using both expression context and prior biological knowledge encoded in text.
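Shusi's node features combine a gene's expression context with LLM-derived protein knowledge. The snippet below is a minimal, dependency-free sketch of one way to assemble such features for a single cell. It assumes plain concatenation of the expression value with a gene-level and a sentence-level embedding vector. The function name, dictionaries, and toy values are hypothetical, and the released pipeline may combine these inputs differently.

```python
# Hypothetical sketch: per-cell node features built by concatenating a gene's
# expression value with two precomputed embeddings (gene-level and
# sentence/annotation-level). Concatenation is an assumption, not Shusi's
# documented recipe.
from typing import Dict, List


def build_node_features(
    expression: Dict[str, float],
    gene_emb: Dict[str, List[float]],
    sentence_emb: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Return one feature vector per gene: [expression] + gene_emb + sentence_emb."""
    features = {}
    for gene, value in expression.items():
        # Skip genes lacking either embedding; a real pipeline might zero-fill instead.
        if gene not in gene_emb or gene not in sentence_emb:
            continue
        features[gene] = [value] + gene_emb[gene] + sentence_emb[gene]
    return features


# Toy inputs for one cell (illustrative values only).
cell_expression = {"EGFR": 2.3, "GRB2": 1.1, "TP53": 0.0}
gene_emb = {"EGFR": [0.1, 0.4], "GRB2": [0.2, 0.3], "TP53": [0.9, 0.1]}
sentence_emb = {"EGFR": [0.5], "GRB2": [0.6]}  # TP53 deliberately missing

feats = build_node_features(cell_expression, gene_emb, sentence_emb)
print(feats)
```

Concatenation keeps the expression signal and the text-derived prior in separate dimensions. This lets a downstream encoder learn how much weight each one deserves.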
Shusi is pretrained once on a large pan-cancer corpus and then applied without per-dataset retraining, positioning it as a reusable foundation model for single-cell network biology rather than a one-off classifier fit to a single study.
The pretrained checkpoint (shusi.pth) runs directly on new samples, lowering the barrier to applying the model to fresh datasets.
Shusi is built as a variational graph autoencoder. Each cell is represented as a graph whose nodes are genes or proteins. Node features integrate expression values with two precomputed embedding maps distilled from a large language model: a gene-level embedding and a sentence/annotation-level embedding. A graph isomorphism network (GIN) encodes these graphs into a latent space, and the decoder reconstructs edges from that space, framing PPI prediction as a self-supervised link-reconstruction task. The model was pretrained on 71,575 pan-cancer single-cell networks drawn from 23 cancer types. The released pipeline ships this single pretrained checkpoint plus the two embedding feature maps. The implementation uses PyTorch and PyTorch Geometric and runs on GPU with automatic CPU fallback. Benchmark performance is reported as precision@10,000 on masked edges. The preprint does not report a single headline accuracy number, so quantitative comparisons should be read from the paper directly.
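The architecture described above can be sketched, under several assumptions, as a single forward pass. The sketch uses one GIN layer whose "MLP" is a single linear map plus ReLU. It adds Gaussian latent heads sampled with the reparameterization trick and uses the inner-product decoder from the original VGAE formulation. It then scores precision@k over candidate pairs, with held-out edges treated as positives. Shusi's actual depth, dimensions, decoder, and candidate set for precision@10,000 are not specified here, and every helper name is hypothetical. Weights are random and untrained, so the printed numbers only serve as a smoke test.

```python
# Dependency-free sketch of one VGAE forward pass: a GIN-style encoder,
# Gaussian latent heads, an inner-product edge decoder, and precision@k.
# Names, sizes, and the untrained random weights are illustrative
# assumptions, not Shusi's released architecture.
import math
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(W: Mat, x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def gin_layer(h: Dict[int, Vec], adj: Dict[int, Set[int]], W: Mat, eps: float = 0.0) -> Dict[int, Vec]:
    """GIN update h_v' = ReLU(W((1 + eps) * h_v + sum of neighbor features))."""
    out = {}
    for v, hv in h.items():
        agg = [(1.0 + eps) * x for x in hv]
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, h[u])]
        out[v] = [max(0.0, y) for y in matvec(W, agg)]
    return out


def encode(h, adj, W_gin, W_mu, W_logvar, rng):
    """Encode nodes to (mu, logvar) and sample z = mu + sigma * noise."""
    hidden = gin_layer(h, adj, W_gin)
    z, mu, logvar = {}, {}, {}
    for v, hv in hidden.items():
        mu[v], logvar[v] = matvec(W_mu, hv), matvec(W_logvar, hv)
        z[v] = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu[v], logvar[v])]
    return z, mu, logvar


def edge_prob(z: Dict[int, Vec], i: int, j: int) -> float:
    """Inner-product decoder: sigmoid(z_i . z_j), computed in an overflow-safe way."""
    s = sum(a * b for a, b in zip(z[i], z[j]))
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def kl_term(mu, logvar) -> float:
    """KL divergence from N(mu, sigma^2) to N(0, I), summed over nodes and dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for v in mu for m, lv in zip(mu[v], logvar[v]))


def precision_at_k(scores: Dict[Tuple[int, int], float], held_out: Set[Tuple[int, int]], k: int) -> float:
    """Fraction of the k top-scoring candidate pairs that are held-out (masked) true edges."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(pair in held_out for pair in top) / max(k, 1)


rng = random.Random(0)
n, d_in, d_hid, d_lat = 5, 4, 8, 2
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}  # observed (unmasked) edges
h = {v: [rng.random() for _ in range(d_in)] for v in range(n)}  # toy node features
W_gin = rand_mat(d_hid, d_in, rng)
W_mu, W_lv = rand_mat(d_lat, d_hid, rng), rand_mat(d_lat, d_hid, rng)

z, mu, logvar = encode(h, adj, W_gin, W_mu, W_lv, rng)
# Score every pair not already observed; these are the link-prediction candidates.
scores = {(i, j): edge_prob(z, i, j) for i, j in combinations(range(n), 2) if j not in adj[i]}
masked = {(0, 2), (2, 3)}  # hypothetical held-out edges
print("KL:", round(kl_term(mu, logvar), 3))
print("precision@2:", precision_at_k(scores, masked, k=2))
```

In training, a reconstruction loss (for example, binary cross-entropy over observed edges and sampled non-edges) would be added to the KL term and minimized. On the PyTorch Geometric stack the page mentions, these pieces correspond to `torch_geometric.nn.GINConv` and `torch_geometric.nn.VGAE`. Whether Shusi uses those exact classes is not stated here.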
Shusi is aimed at cancer biologists and computational researchers who want to move from cell-type catalogs to mechanism. By surfacing the protein interactions that are specifically active within tumor or microenvironmental cell states, it can nominate candidate signaling axes, prioritize potential therapeutic targets, and generate hypotheses about how interaction networks differ between responders and non-responders or across tumor subtypes. Because it consumes standard single-cell expression inputs and runs without retraining, it can be layered onto existing single-cell analysis pipelines as a network-inference step.
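As one illustration of using predicted networks downstream, the sketch below compares responder and non-responder cells. It assumes per-cell edge probabilities are already available, for example exported from a model such as Shusi, although no output format is documented here. It ranks interactions by the difference in mean probability between the two groups. The mean-difference statistic, function name, group labels, and toy values are hypothetical. A real analysis would add a statistical test with multiple-testing correction and account for cells from the same patient not being independent.

```python
# Hypothetical downstream step: rank interactions by the difference in mean
# per-cell edge probability between two groups of cells. The statistic,
# labels, and all values below are illustrative assumptions.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def rank_differential_edges(
    cell_edges: Dict[str, Dict[Edge, float]],
    cell_group: Dict[str, str],
    group_a: str,
    group_b: str,
) -> List[Tuple[Edge, float]]:
    """Return (edge, mean_a - mean_b) sorted by absolute difference, largest first."""
    per_group: Dict[str, Dict[Edge, List[float]]] = {
        group_a: defaultdict(list),
        group_b: defaultdict(list),
    }
    for cell, edges in cell_edges.items():
        group = cell_group.get(cell)
        if group not in per_group:
            continue  # ignore cells outside the two groups being compared
        for edge, prob in edges.items():
            per_group[group][edge].append(prob)
    # Only compare edges scored in both groups.
    shared = set(per_group[group_a]) & set(per_group[group_b])
    diffs = [(e, mean(per_group[group_a][e]) - mean(per_group[group_b][e])) for e in shared]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)


# Toy per-cell edge probabilities (illustrative values only).
cell_edges = {
    "c1": {("EGFR", "GRB2"): 0.9, ("TP53", "MDM2"): 0.4},
    "c2": {("EGFR", "GRB2"): 0.8, ("TP53", "MDM2"): 0.5},
    "c3": {("EGFR", "GRB2"): 0.2, ("TP53", "MDM2"): 0.45},
}
cell_group = {"c1": "responder", "c2": "responder", "c3": "non_responder"}

ranked = rank_differential_edges(cell_edges, cell_group, "responder", "non_responder")
for edge, delta in ranked:
    print(edge, round(delta, 3))
```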
Shusi contributes to a growing effort to make protein interaction networks dynamic and cell-state aware rather than static, and it is notable for marrying graph-based single-cell modeling with LLM-derived protein knowledge in a single pretrained framework. As a 2025 preprint, its real-world adoption and independent validation are still emerging, and several practical caveats apply: the work has not yet been peer reviewed, the code repository does not declare a license, and the pretrained weights are distributed via Google Drive rather than a persistent model registry, which may complicate long-term reproducibility. Researchers should treat its predictions as hypotheses for experimental follow-up.
Zhang, T., et al. (2025) Systematic discovery of single-cell protein networks in cancer with Shusi. bioRxiv.
DOI: 10.1101/2025.04.27.649905