bio.rodeo
Imaging

SubCell

Chan Zuckerberg Initiative / Human Protein Atlas / Lundberg Lab

Self-supervised Vision Transformer models trained on proteome-wide fluorescence microscopy images from the Human Protein Atlas for subcellular protein localization.

Released: 2024

Overview

Cell morphology and subcellular protein organization are fundamental readouts of cellular state, yet extracting quantitative biological meaning from fluorescence microscopy images at scale has remained difficult. SubCell addresses this gap by training Vision Transformer models on the Human Protein Atlas (HPA) image collection — the largest publicly available proteome-wide fluorescence microscopy dataset — using a novel proteome-aware self-supervised learning objective that requires no manual annotation.

The central innovation is a multi-task pretraining framework that simultaneously exploits three complementary learning signals: masked image reconstruction, cell-specific consistency (learning what is stable across a given cell regardless of which protein is stained), and protein-specific consistency (learning what is stable across all images of a given protein regardless of which cell it is in). This design encourages the model to disentangle general cell morphology from protein-specific localization patterns, producing representations that encode rich biological structure beyond anything explicitly annotated in the training set.

SubCell was developed collaboratively by the Chan Zuckerberg Initiative, the Human Protein Atlas project, and the Lundberg Lab at Stanford University, and was released as a bioRxiv preprint in December 2024. The model demonstrates strong zero-shot generalization across external fluorescence microscopy datasets with different imaging devices, magnifications, and staining protocols.

Key Features

  • Proteome-aware pretraining: A novel multi-task self-supervised objective combines reconstruction, cell-level, and protein-level learning signals drawn from 13,000+ proteins imaged across 37 human cell lines, enabling the model to separate morphological context from protein-specific localization.
  • Zero-shot generalization: Outperforms prior supervised and self-supervised baselines on held-out HPA data and external datasets without any fine-tuning, including datasets acquired on different microscopes and protocols.
  • Proteome-wide cell map: Enables construction of the first hierarchical map of the human proteome derived entirely from image data, resolving protein complexes, functional modules, and dynamic versus stable subcellular behaviors.
  • Multimodal integration: SubCell embeddings combined with protein sequence models (such as ESM) outperform either modality alone on gene function prediction tasks, enabling cross-modal biological inference.
  • Open weights and code: Model weights are publicly available via AWS S3 and the training and inference codebase is open-source under a permissive license.
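
The multimodal integration point above amounts to combining per-gene embeddings from the two modalities before a downstream predictor. A minimal sketch of one common fusion strategy (L2-normalize each modality, then concatenate), using placeholder arrays — the dimensions, variable names, and normalization choice here are illustrative assumptions, not the published SubCell pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-gene embeddings: a SubCell image embedding (e.g. 768-d
# for a ViT-B backbone) and an ESM-style sequence embedding (e.g. 1280-d).
# Real embeddings would come from the respective model checkpoints.
n_genes = 100
img_emb = rng.standard_normal((n_genes, 768))
seq_emb = rng.standard_normal((n_genes, 1280))

def fuse(img, seq):
    """L2-normalize each modality, then concatenate along the feature axis."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    seq = seq / np.linalg.norm(seq, axis=1, keepdims=True)
    return np.concatenate([img, seq], axis=1)

fused = fuse(img_emb, seq_emb)
print(fused.shape)  # (100, 2048)
```

The fused matrix would then feed any standard classifier for gene function prediction; normalizing per modality first keeps one embedding space from dominating by scale.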

Technical Details

SubCell is built on a ViT-B/16 (Vision Transformer Base with 16×16 patch size) backbone. The pretraining framework applies three objectives to the same shared encoder: a masked autoencoder (MAE)-style reconstruction task that learns general image structure, a cell-specific consistency objective that enforces invariance to protein identity within a single cell, and a protein-specific consistency objective that enforces invariance to cell identity across all images of a given protein. This decomposition lets the model encode cellular context and protein-specific localization in the same embedding space in a way that no single objective alone achieves.
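
The interaction of the three objectives can be sketched as one combined loss over a shared encoder. The toy NumPy sketch below uses a linear "encoder" and cosine-similarity consistency terms purely for illustration; the actual SubCell architecture, loss forms, and term weightings may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

D_IN, D_EMB = 64, 16
W = rng.standard_normal((D_IN, D_EMB)) * 0.1  # toy shared "encoder"

def encode(x):
    return x @ W

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy inputs: two stains of the same cell (different target proteins) and
# a second cell stained for the same protein.
cell_a_prot1 = rng.standard_normal(D_IN)
cell_a_prot2 = rng.standard_normal(D_IN)
cell_b_prot1 = rng.standard_normal(D_IN)

# 1) MAE-style reconstruction: zero out masked inputs, reconstruct them.
mask = rng.random(D_IN) < 0.75
recon = encode(np.where(mask, 0.0, cell_a_prot1)) @ W.T  # decode via W^T
loss_recon = float(np.mean((recon[mask] - cell_a_prot1[mask]) ** 2))

# 2) Cell consistency: same cell, different proteins -> similar embeddings.
loss_cell = 1.0 - cos(encode(cell_a_prot1), encode(cell_a_prot2))

# 3) Protein consistency: same protein, different cells -> similar embeddings.
loss_protein = 1.0 - cos(encode(cell_a_prot1), encode(cell_b_prot1))

total_loss = loss_recon + loss_cell + loss_protein
```

Because all three terms backpropagate through the same encoder during training, the representation is pushed to retain reconstructable image detail while collapsing nuisance variation along the cell and protein axes simultaneously.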

Training used approximately 1.1 million single-cell image crops derived from the Human Protein Atlas, covering 13,000+ protein-coding genes across 37 human cell lines. Images are four-channel immunofluorescence confocal micrographs staining for the nucleus (DAPI), microtubules (tubulin), endoplasmic reticulum (calreticulin), and the target protein. Preprocessing standardized all images to single-cell crops so that the model learns cell-level rather than field-of-view-level statistics. Downstream benchmarks span protein subcellular localization classification, cellular phenotyping, mechanism-of-action prediction in perturbation screens (RXRX1, JUMP-CP), and cross-dataset generalization — all evaluated in a zero-shot regime.
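
The single-cell crop standardization step can be sketched as cutting a fixed window around each cell centroid from the four-channel field of view and normalizing per channel. The window size, clamping behavior, and min-max normalization below are illustrative assumptions, not the published preprocessing pipeline:

```python
import numpy as np

def single_cell_crop(fov, centroid, size=128):
    """Crop a (4, size, size) window centered on a cell centroid.

    fov: (4, H, W) array -- DAPI, tubulin, calreticulin, target protein.
    The window is clamped so the crop stays inside the field of view.
    """
    c, h, w = fov.shape
    y = min(max(centroid[0] - size // 2, 0), h - size)
    x = min(max(centroid[1] - size // 2, 0), w - size)
    crop = fov[:, y:y + size, x:x + size].astype(np.float32)
    # Per-channel min-max normalization to [0, 1] (illustrative choice).
    lo = crop.min(axis=(1, 2), keepdims=True)
    hi = crop.max(axis=(1, 2), keepdims=True)
    return (crop - lo) / np.maximum(hi - lo, 1e-8)

fov = np.random.default_rng(2).random((4, 512, 512))
crop = single_cell_crop(fov, centroid=(40, 500))
print(crop.shape)  # (4, 128, 128)
```

Cropping at the cell level is what makes the model learn cell-level rather than field-of-view-level statistics, so segmentation quality upstream of this step directly bounds embedding quality (a limitation noted below under Impact).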

Applications

SubCell is primarily useful in cell biology and high-content imaging workflows. Researchers studying protein localization can use SubCell embeddings to predict where a protein resides in the cell from image data, cluster proteins by localization pattern, or identify proteins with condition-dependent distributions. In cell phenotyping, the morphological signals captured by SubCell support discrimination of cell types, tracking of differentiation states, and detection of cells responding to genetic or chemical perturbations. In drug discovery, SubCell can cluster treatment conditions in high-content imaging screens by shared morphological profiles, surfacing compounds with similar mechanisms of action without labeled training data — directly applicable to large-scale libraries such as JUMP-CP. The proteome-wide cell map generated from SubCell embeddings also provides a complementary resource to network- and interactome-based protein atlases for systems biology research.
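
Grouping treatments by morphological profile typically means averaging per-cell embeddings within each treatment and comparing the treatment-level profiles. A minimal sketch with synthetic embeddings, where two hypothetical compounds share an underlying "mechanism" direction (all names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic per-cell embeddings for three treatments; compound_1 and
# compound_2 share a mechanism (same base direction plus noise).
base_a = rng.standard_normal(16)
base_b = rng.standard_normal(16)
cells = {
    "compound_1": base_a + 0.1 * rng.standard_normal((50, 16)),
    "compound_2": base_a + 0.1 * rng.standard_normal((50, 16)),
    "compound_3": base_b + 0.1 * rng.standard_normal((50, 16)),
}

# Treatment-level profile = mean of per-cell embeddings.
profiles = {k: v.mean(axis=0) for k, v in cells.items()}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_12 = cos(profiles["compound_1"], profiles["compound_2"])
sim_13 = cos(profiles["compound_1"], profiles["compound_3"])
print(sim_12 > sim_13)  # True: the shared-mechanism pair is more similar
```

The same pattern scales to full screens: profiles for thousands of treatments can be clustered by cosine similarity, surfacing candidate mechanism-of-action groups without any labeled training data.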

Impact

SubCell represents a significant advance in self-supervised learning for biological imaging, demonstrating that a carefully designed pretraining objective can produce representations that generalize broadly across imaging contexts without supervision. As part of the CZI Virtual Cells platform, it is positioned as a foundational component for cell biology AI infrastructure. Key limitations include its restriction to HPA-style four-channel immunofluorescence images — performance on other modalities such as brightfield or electron microscopy has not been characterized — and its dependence on accurate cell segmentation for generating single-cell crops. Generalization to primary cells, non-human organisms, or tissue sections may require additional evaluation. As of December 2024, SubCell remains a preprint and has not yet undergone formal peer review.

Citation

SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology

Preprint

Gupta, A., et al. (2025) SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology. bioRxiv.

DOI: 10.1101/2024.12.06.627299

Metrics

GitHub

Stars: 6
Forks: 0
Open Issues: 0
Contributors: 3
Last Push: 1y ago
Language: Python
License: MIT

Citations

Total Citations: 8
Influential: 3
References: 66

Tags

protein localization, vision transformer, foundation model, multimodal, self-supervised, cell biology, fluorescence microscopy

Resources

  • GitHub Repository
  • Research Paper
  • Official Website
  • Dataset