Overview

ProtiCelli is a deep generative model that simulates fluorescence microscopy images at single-cell resolution for all 12,800 human proteins covered by the Human Protein Atlas (HPA). It is trained on images that pair three landmark cell stains (nucleus, microtubules, endoplasmic reticulum) with the protein of interest. The model was described in a preprint posted to bioRxiv in late March 2026 by the HPA and KTH Royal Institute of Technology. It was used to generate 30.7 million synthetic single-cell images, which have been integrated into HPA version 26 alongside experimentally measured images.

ProtiCelli is the first proteome-scale generative imaging model, enabling researchers to query and simulate subcellular localization patterns for any human protein, including proteins for which experimental imaging is sparse or absent.
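To give a sense of scale, the two figures quoted in this overview can be combined into an average number of synthetic images per protein. The short Python sketch below does only that arithmetic. The function name is illustrative, 30.7 million is a rounded figure, and the source does not say how images are actually distributed across proteins, so the result is a rough mean rather than a per-protein count.

```python
# Figures quoted in the overview; 30.7 million is a rounded total.
TOTAL_SYNTHETIC_IMAGES = 30_700_000
PROTEINS_COVERED = 12_800


def mean_images_per_protein(total_images: int, n_proteins: int) -> float:
    """Return the average number of images per protein (illustrative helper)."""
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    return total_images / n_proteins


average = mean_images_per_protein(TOTAL_SYNTHETIC_IMAGES, PROTEINS_COVERED)
print(round(average))  # 2398
```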

Key Features

Proteome-wide coverage: Generates images for all 12,800 proteins represented in the Human Protein Atlas, the broadest protein-imaging coverage of any model to date.
Three-stain conditioning: Conditions on nucleus, microtubule, and ER stains to provide a consistent cellular context across protein-of-interest channels.
Single-cell resolution: Generates images at the single-cell level rather than at population averages, supporting heterogeneity analysis.
HPA v26 integration: The 30.7 million generated images are released as part of HPA v26, providing a curated, queryable resource for the wider research community.
Drug-perturbation simulation: Supports counterfactual generation of imaging outcomes under perturbations without requiring corresponding wet-lab experiments.

Technical Details

ProtiCelli uses a diffusion-based image generation backbone trained on millions of HPA fluorescence images, conditioned on a learned protein-identity embedding and three reference-stain channels. The protein embedding is derived from sequence-based features, enabling generation for proteins without prior imaging. Architectural details, training schedule, and held-out evaluation are reported in the bioRxiv preprint.
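The conditioning described above can be illustrated with a generic sketch. The code below is not ProtiCelli's implementation, whose architecture is documented only in the preprint. It assumes a standard DDPM-style forward noising step applied to the protein channel, and a channels-first stack of that noisy channel with the three reference stains, with the protein embedding carried alongside. All function names, the dictionary layout, and the toy 2x2 images are hypothetical.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Generic DDPM-style noising: x_t = sqrt(alpha_bar)*x0 + sqrt(1-alpha_bar)*eps."""
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    # One Gaussian noise value per pixel of the protein channel.
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    x_t = [
        [signal * value + noise * e for value, e in zip(row, eps_row)]
        for row, eps_row in zip(x0, eps)
    ]
    return x_t, eps


def build_denoiser_input(noisy_protein, nucleus, microtubules, er, protein_embedding):
    """Stack the noisy protein channel with the three reference stains.

    Assumed layout (hypothetical): a 4-channel image plus a separate
    embedding vector that identifies the protein to generate.
    """
    channels = [noisy_protein, nucleus, microtubules, er]
    height, width = len(nucleus), len(nucleus[0])
    for channel in channels:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must share the same height and width")
    return {"image": channels, "protein_embedding": list(protein_embedding)}


# Toy usage with made-up 2x2 images and a made-up 3-dimensional embedding.
rng = random.Random(0)
blank = [[0.0, 0.0], [0.0, 0.0]]
protein = [[0.2, 0.8], [0.5, 0.1]]
noisy, eps = forward_noise(protein, alpha_bar=0.5, rng=rng)
sample = build_denoiser_input(noisy, blank, blank, blank, [0.1, -0.3, 0.7])
print(len(sample["image"]), len(sample["protein_embedding"]))  # 4 3
```

In a sketch like this, the reference stains stay noise-free so that the denoiser always sees the true cellular context, while the embedding tells it which protein pattern to produce. Whether ProtiCelli injects its conditioning this way is an assumption.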

Quantitative evaluation includes the accuracy of subcellular-localization classifiers on synthetic versus real images, Fréchet Inception Distance (FID), and human-expert ratings of biological plausibility. Generated images integrated into HPA v26 are tagged as model-generated for transparency.
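Two of these metrics can be illustrated with a small, heavily simplified sketch. The first function compares a classifier's accuracy on real and synthetic images. The second computes the Fréchet distance between two univariate Gaussians fitted to a single feature, which is the one-dimensional special case of the distance behind FID. Real FID instead uses multivariate Inception-network features with full covariance matrices. The function names, labels, and numbers below are made up for illustration and are not results from the preprint.

```python
from statistics import mean, pstdev


def accuracy(predicted, expected):
    """Fraction of predicted localization labels that match the expected ones."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def frechet_distance_1d(real_values, generated_values):
    """Squared Frechet distance between two fitted 1-D Gaussians.

    For univariate Gaussians this reduces to
    (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2.
    """
    mu_r, mu_g = mean(real_values), mean(generated_values)
    sd_r, sd_g = pstdev(real_values), pstdev(generated_values)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


# Made-up labels: the same classifier applied to real and synthetic images.
truth = ["nucleoplasm", "cytosol", "mitochondria", "nucleoplasm"]
pred_on_real = ["nucleoplasm", "cytosol", "mitochondria", "nucleoplasm"]
pred_on_synthetic = ["nucleoplasm", "cytosol", "cytosol", "nucleoplasm"]

gap = accuracy(pred_on_real, truth) - accuracy(pred_on_synthetic, truth)
print(gap)  # 0.25
```

A small accuracy gap and a small Fréchet distance both suggest that synthetic images resemble real ones, but neither replaces expert review of biological plausibility.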

Applications

ProtiCelli is useful for cell-biology workflows that require subcellular-localization information for proteins lacking sufficient experimental imaging coverage. Drug-discovery teams can query simulated imaging outcomes to prioritize candidates for downstream high-content screening. The drug-perturbation simulation capability extends the use of in silico cell models into the imaging modality, complementing transcriptomic virtual-cell models such as scGPT and X-Cell.

Impact

As the first generative imaging model to operate at proteome scale, ProtiCelli makes synthetic-image generation practical as a routine step in cell biology research. Its integration into HPA v26 makes the generated images widely accessible and opens them to community-level validation. The work also raises methodological questions about how model-generated biological data should be handled and labeled when it is integrated into authoritative reference resources.

Citation

Generative machine learning unlocks the first proteome-wide image of human cells
