Overview

PINNACLE (Protein Network-based Algorithm for Contextual Learning) is a geometric deep learning framework developed at the Zitnik Lab, Harvard Medical School, that generates context-aware protein representations at single-cell resolution. Published in Nature Methods in August 2024, it addresses a fundamental limitation of conventional protein representation models: they assign a single embedding to each protein regardless of the cellular context in which that protein is expressed. Because the same protein can participate in very different interaction networks and regulatory programs depending on the cell type and tissue, context-free representations conflate biologically distinct states.

PINNACLE resolves this by integrating three complementary data sources — a global protein interaction network, cell-type-specific protein interaction networks derived from single-cell transcriptomics, and a tissue hierarchy metagraph — to produce a unique embedding for each protein in each cell type where it is active. Trained on a multi-organ single-cell atlas spanning 24 human tissues and organs, PINNACLE generates 394,760 protein representations distributed across 156 cell-type contexts, constituting the largest contextualized protein embedding space of its kind at the time of publication.
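To make the idea of "one embedding per protein per cell type" concrete, the following minimal sketch stores embeddings in a table keyed by (protein, cell type) pairs. The protein labels, cell-type labels, and three-dimensional vectors are made-up placeholders for illustration; they are not values from the released PINNACLE embeddings, and the layout is an assumption rather than the project's actual file format.

```python
from math import sqrt

# Hypothetical toy table: (protein, cell type) -> embedding vector.
# All labels and numbers are invented for illustration only.
contextual_embeddings = {
    ("P1", "cell_type_a"): [0.9, 0.1, 0.0],
    ("P1", "cell_type_b"): [0.2, 0.7, 0.4],
    ("P2", "cell_type_b"): [0.1, 0.8, 0.5],
}


def contexts_for(protein, table):
    """Return the sorted cell types in which a protein has an embedding."""
    return sorted(ct for (p, ct) in table if p == protein)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


p1_contexts = contexts_for("P1", contextual_embeddings)

# Same protein, two different cell-type contexts.
same_protein = cosine(
    contextual_embeddings[("P1", "cell_type_a")],
    contextual_embeddings[("P1", "cell_type_b")],
)

# Two different proteins that share a cell-type context.
same_context = cosine(
    contextual_embeddings[("P1", "cell_type_b")],
    contextual_embeddings[("P2", "cell_type_b")],
)
```

In this toy table the two proteins sharing a context are closer to each other than the same protein is to itself across contexts, which is the kind of distinction a single context-free embedding per protein cannot express.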

The work was led by Michelle M. Li, Yepeng Huang, and Marissa Sumathipala, together with Marinka Zitnik, Alberto Valdeolivas, and collaborators with expertise in rheumatology and gastroenterology at Brigham and Women's Hospital. The model is freely available through GitHub and HuggingFace, and pretrained checkpoints can be fine-tuned for downstream tasks.

Key Features

Cell-type-specific protein representations: Rather than producing one embedding per protein, PINNACLE generates a distinct representation for every cell type in which a protein is expressed, capturing how molecular function varies across biological contexts.
Multi-scale attention mechanisms: The model applies protein-level, cell-type-level, and tissue-level attention mechanisms simultaneously, injecting knowledge of cellular and tissue organization directly into the embedding space.
Zero-shot tissue hierarchy retrieval: The learned embedding space reflects the known hierarchical organization of tissues without any explicit supervision, enabling zero-shot retrieval tasks that benchmark the biological coherence of representations.
Therapeutic target nomination: PINNACLE outperforms context-free baseline models in identifying disease-relevant protein targets for rheumatoid arthritis in at least 18.6% of cell-type contexts and for inflammatory bowel disease in at least 8.6% of contexts.
Fine-tunable for downstream tasks: Pretrained representations can be adapted for diverse applications, including enhancing 3D structure-based protein interaction prediction and modeling the transcriptomic effects of drugs across cell types.
Open access with pretrained checkpoints: Both model weights and the full set of 394,760 contextualized protein embeddings are publicly released, lowering the barrier to adoption for researchers without large compute resources.

Technical Details

PINNACLE is built on geometric deep learning, a family of methods that generalize neural networks to graph-structured data. The model operates on a hierarchical graph construction: cell-type-specific protein interaction networks (derived by weighting a global reference interactome with single-cell gene expression data) are connected via a metagraph encoding cell-type-to-cell-type and cell-type-to-tissue relationships. This nested, multi-resolution graph is the input over which PINNACLE learns.
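The derivation of cell-type-specific networks from a global interactome can be sketched as follows. This is a simplified illustration, not the procedure used by the authors: it assumes a hard expression threshold and keeps an edge only when both endpoint proteins pass it, whereas the text above describes weighting the interactome with expression data. The edge list, expression values, threshold, and the helper name `cell_type_network` are all hypothetical.

```python
# Hypothetical global reference interactome as undirected edges.
global_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")]

# Hypothetical mean expression of each protein's gene per cell type.
expression = {
    "cell_type_a": {"P1": 5.0, "P2": 3.2, "P3": 0.0, "P4": 1.5},
    "cell_type_b": {"P1": 0.1, "P2": 2.0, "P3": 4.1, "P4": 3.3},
}


def cell_type_network(edges, expr, threshold=1.0):
    """Keep edges whose two endpoints are both expressed at or above threshold.

    The hard threshold is an assumption made for this sketch.
    """
    active = {protein for protein, level in expr.items() if level >= threshold}
    return [(u, v) for u, v in edges if u in active and v in active]


# One network per cell type, all derived from the same global edge list.
networks = {
    cell_type: cell_type_network(global_edges, expr)
    for cell_type, expr in expression.items()
}
```

Here `cell_type_a` retains only the edges among P1, P2, and P4, while `cell_type_b` retains those among P2, P3, and P4, so the same reference interactome yields different neighborhoods for the same protein in different contexts.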

Training is self-supervised and employs protein-, cell-type-, and tissue-level objective functions to simultaneously encode local protein neighborhood structure and global cellular organization. The multi-scale attention architecture allows information to propagate across levels of biological organization — from individual protein interactions up through cell-type identity and tissue-level programs — in a unified embedding space. Single-cell transcriptomic data from a comprehensive multi-organ human atlas (covering 24 tissues and 156 cell types) provides the expression-based context that shapes cell-type-specific networks. The pretrained model and embeddings are hosted on HuggingFace under the Therapeutics Data Commons (TDC) organization, making them straightforward to load for fine-tuning.
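As a rough illustration of what an attention step over a protein's neighbors involves, the sketch below scores each neighbor by its dot product with the query protein, normalizes the scores with a softmax, and pools the neighbor vectors with those weights. This is a generic attention-style aggregation written for illustration; it is not PINNACLE's actual layer, and the vectors and the function name `attend` are assumptions.

```python
from math import exp


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, neighbors):
    """Return attention weights and the weighted sum of neighbor vectors."""
    # Dot-product score between the query and each neighbor.
    scores = [sum(q * x for q, x in zip(query, n)) for n in neighbors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(dim)
    ]
    return weights, pooled


# Made-up two-dimensional vectors for one protein and three neighbors.
protein = [1.0, 0.0]
neighbor_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, pooled = attend(protein, neighbor_vectors)
```

The same pattern can in principle be applied at each level described above, with proteins attending to neighboring proteins and cell types or tissues attending to their neighbors in the metagraph, though the exact formulation in the released model may differ.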

Applications

PINNACLE is designed for researchers working at the intersection of computational protein biology, systems pharmacology, and single-cell genomics. Its primary demonstrated application is therapeutic target identification: by providing cell-type-resolved protein representations, PINNACLE helps prioritize not just which proteins to target but in which cell types intervention is most likely to be effective — a critical consideration for immune-mediated inflammatory diseases such as rheumatoid arthritis and inflammatory bowel disease. Beyond target nomination, the framework supports studies of drug mechanism of action by modeling how a drug's transcriptomic perturbation propagates differently across cell types. Researchers can also use PINNACLE's embeddings to augment structure-based methods, such as docking or protein interaction prediction, with cell-type context that sequence- or structure-only models cannot provide.
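The question of which cell types are most informative for a disease can be framed as ranking contexts by how well context-specific scores recover known targets. The sketch below uses precision at k as the ranking metric, which is an assumption chosen for simplicity and not necessarily the metric reported in the paper. The prediction scores, context names, and known-target set are invented for illustration.

```python
def precision_at_k(scores, positives, k):
    """Fraction of the k highest-scoring proteins that are known positives."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for protein in top_k if protein in positives) / k


# Hypothetical per-context prediction scores for four candidate proteins.
predictions = {
    "context_1": {"P1": 0.9, "P2": 0.8, "P3": 0.2, "P4": 0.1},
    "context_2": {"P1": 0.3, "P2": 0.1, "P3": 0.9, "P4": 0.7},
}

# Hypothetical set of proteins already known to be relevant targets.
known_targets = {"P1", "P2"}

per_context = {
    context: precision_at_k(scores, known_targets, k=2)
    for context, scores in predictions.items()
}

# Contexts ordered from most to least predictive of the known targets.
ranked_contexts = sorted(per_context, key=per_context.get, reverse=True)
```

In this toy setup `context_1` places both known targets at the top and ranks first, mirroring the idea that some cell-type contexts carry more signal for a given disease than others.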

Impact

PINNACLE represents a conceptual advance in how the field approaches protein representation learning by demonstrating that cellular context is a first-class feature, not an afterthought. The paper's benchmarks show consistent gains over context-free models on disease-relevant tasks, providing a concrete argument for incorporating single-cell data into protein AI pipelines. The model's release through HuggingFace and its compatibility with the Therapeutics Data Commons ecosystem lower adoption barriers for groups without specialized infrastructure. A key limitation is that PINNACLE's representations are currently anchored to the specific cell types and tissues present in its training atlas; extending coverage to rarer cell types, diseased tissue states, or non-human organisms will require retraining or fine-tuning on new atlases. The framework nonetheless establishes an important template for context-sensitive biological foundation models that subsequent work in this area is likely to build upon.

Citation

Contextual AI models for single-cell protein biology
