Human Protein Atlas / KTH Royal Institute of Technology
Deep generative model simulating fluorescence microscopy images for all 12,800 human proteins across three landmark stains, providing proteome-wide virtual cell imaging at single-cell resolution.
ProtiCelli is a deep generative model that simulates fluorescence microscopy images at single-cell resolution for all 12,800 human proteins covered by the Human Protein Atlas (HPA). It is trained on images that pair three landmark cell stains (nucleus, microtubules, endoplasmic reticulum) with the protein-of-interest channel. The preprint was posted to bioRxiv in late March 2026 by researchers at HPA and KTH Royal Institute of Technology. The model generated 30.7 million synthetic single-cell images, which have been integrated into HPA version 26 alongside experimentally measured images.
ProtiCelli is the first proteome-scale generative imaging model, enabling researchers to query and simulate subcellular localization patterns for any human protein, including proteins for which experimental imaging is sparse or absent.
ProtiCelli uses a diffusion-based image generation backbone trained on millions of HPA fluorescence images, conditioned on a learned protein-identity embedding and three reference-stain channels. The protein embedding is derived from sequence-based features, enabling generation for proteins without prior imaging. Architectural details, training schedule, and held-out evaluation are reported in the bioRxiv preprint.
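The conditioning scheme described above can be illustrated with a minimal sketch. It is not the ProtiCelli implementation: the linear noise schedule, the 64x64 image size, the 128-dimensional embedding, and all function names are assumptions chosen for illustration, and the preprint remains the authority on the actual architecture. The sketch only shows the generic diffusion pattern of noising the protein channel and stacking it with the three reference-stain channels as input to a denoiser.

```python
import numpy as np

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear, DDPM-style noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_protein_channel(protein, t, alpha_bar, rng):
    """Forward diffusion step: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(protein.shape)
    x_t = np.sqrt(alpha_bar[t]) * protein + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def build_denoiser_input(noisy_protein, reference_stains):
    """Stack the noisy protein channel (H, W) on the reference stains (3, H, W)."""
    return np.concatenate([noisy_protein[None, ...], reference_stains], axis=0)

rng = np.random.default_rng(0)
H = W = 64                                    # hypothetical crop size
reference_stains = rng.random((3, H, W))      # nucleus, microtubules, ER (random stand-ins)
protein = rng.random((H, W))                  # protein-of-interest channel (random stand-in)
protein_embedding = rng.standard_normal(128)  # hypothetical sequence-derived embedding

alpha_bar = linear_alpha_bar(1000)
x_t, eps = noise_protein_channel(protein, 500, alpha_bar, rng)
model_input = build_denoiser_input(x_t, reference_stains)
print(model_input.shape)  # (4, 64, 64)
```

In a real model, a denoising network would take `model_input`, the timestep, and `protein_embedding`, and be trained to predict `eps`. Because the reference stains enter unnoised, the cell's morphology stays fixed while only the protein channel is generated.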
Quantitative evaluation includes subcellular-localization classifier accuracy on synthetic vs. real images, Fréchet Inception Distance (FID), and human-expert ratings of biological plausibility. Generated images integrated into HPA v26 are tagged as model-generated for transparency.
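For readers unfamiliar with FID, the sketch below computes the Fréchet distance between two sets of feature vectors, which is the quantity FID reports when the features come from an Inception network. The feature arrays here are random stand-ins, and the function name is hypothetical. How ProtiCelli extracts features and which image sets it compares are not specified in this summary.

```python
import numpy as np

def frechet_distance(feats_real, feats_synth):
    """Fréchet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_r = feats_real.mean(axis=0)
    mu_s = feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)
    # Tr((cov_r cov_s)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative for
    # positive semi-definite inputs; clip tiny negative round-off values.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_feats = rng.standard_normal((500, 8))  # stand-in for features of real images
shifted_feats = real_feats + 1.0            # same covariance, mean shifted by 1 per dimension

d_same = frechet_distance(real_feats, real_feats)
d_shifted = frechet_distance(real_feats, shifted_feats)
print(round(d_same, 6), round(d_shifted, 6))
```

Identical feature sets give a distance of approximately zero, and shifting every one of the 8 feature dimensions by 1 adds the squared mean difference, approximately 8. Lower values indicate that the synthetic feature distribution is closer to the real one.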
ProtiCelli is useful for cell-biology workflows that need subcellular-localization information for proteins with little or no experimental imaging coverage. Drug-discovery teams can query simulated imaging outcomes to prioritize candidates for downstream high-content screening. The model's drug-perturbation simulation capability extends in silico cell modeling to the imaging modality, complementing transcriptomic virtual-cell models such as scGPT and X-Cell.
By operating at proteome scale, ProtiCelli makes synthetic-image generation practical as a routine step in cell biology research. Its integration into HPA v26 makes the generated images widely accessible and opens them to scrutiny by the research community. The work also raises methodological questions about how model-generated biological data should be handled and labeled when it is integrated into authoritative reference resources.
Sun, H., et al. (2026) Generative machine learning unlocks the first proteome-wide image of human cells. bioRxiv.
DOI: 10.64898/2026.03.31.715748