Technical University of Munich
A VAE trained on scRNA-seq reference data and applied frozen at inference to impute unmeasured genes and denoise spatial transcriptomics profiles.
Imaging-based spatial transcriptomics platforms (such as MERFISH, Xenium, and CosMx) resolve gene expression in situ while preserving the spatial organization of tissue, but they do so over a limited, pre-selected gene panel and with substantial technical noise. Dissociated single-cell RNA sequencing (scRNA-seq), by contrast, captures the full transcriptome at high depth but discards spatial context. Bridging these two modalities — recovering transcriptome-wide expression for spatially resolved cells — is a central challenge for spatial biology.
Cellpin, developed by researchers at the Technical University of Munich in a collaboration between the Theis and Saur labs and posted to bioRxiv in June 2026, addresses this problem with a deliberately simple recipe: a variational autoencoder (VAE) is trained exclusively on scRNA-seq reference data and then applied to spatial transcriptomics data at inference from a single fixed checkpoint, with no per-dataset retraining or fine-tuning. From that frozen model, Cellpin imputes genes absent from the spatial panel and denoises the measured profiles, transferring the depth and breadth of scRNA-seq onto spatial measurements.
This frozen-checkpoint, reference-only design separates Cellpin from the larger family of spatial imputation methods, many of which require fitting a model to each paired reference/spatial dataset. By decoupling training from application, Cellpin aims to make transcriptome-wide imputation reusable across datasets without bespoke optimization for every new sample.
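The data flow implied by this design can be illustrated with a minimal Python sketch. It is not Cellpin's code, and how Cellpin actually presents unmeasured genes to its encoder is not described here. The sketch assumes zero-filling plus a measured-gene mask, and every name in it (`align_to_reference`, `impute_and_denoise`, `toy_frozen_model`, the gene list, and the loadings) is hypothetical. The toy model is a deterministic one-factor linear stand-in for a trained checkpoint, not a VAE; the point is only that inference is a forward pass with no fitting step.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a frozen model maps (values, mask), both in
# reference gene order, to a reconstructed profile of the same length.
FrozenModel = Callable[[List[float], List[bool]], List[float]]


def align_to_reference(
    panel_counts: Dict[str, float], reference_genes: Sequence[str]
) -> Tuple[List[float], List[bool]]:
    """Place panel measurements into the reference gene order.

    Unmeasured genes are zero-filled (an assumption of this sketch) and
    flagged False in the mask so the model can ignore them.
    """
    values = [float(panel_counts.get(g, 0.0)) for g in reference_genes]
    mask = [g in panel_counts for g in reference_genes]
    return values, mask


def impute_and_denoise(
    panel_counts: Dict[str, float],
    reference_genes: Sequence[str],
    frozen_model: FrozenModel,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Run one forward pass and split the output into denoised and imputed genes."""
    unknown = sorted(set(panel_counts) - set(reference_genes))
    if unknown:
        raise ValueError(f"panel genes missing from reference: {unknown}")
    values, mask = align_to_reference(panel_counts, reference_genes)
    reconstructed = frozen_model(values, mask)  # forward pass only, no fitting
    if len(reconstructed) != len(reference_genes):
        raise ValueError("model output does not match the reference gene list")
    denoised: Dict[str, float] = {}
    imputed: Dict[str, float] = {}
    for gene, value, measured in zip(reference_genes, reconstructed, mask):
        (denoised if measured else imputed)[gene] = value
    return denoised, imputed


# Toy stand-in for a trained checkpoint: one latent factor, fixed loadings.
REFERENCE_GENES = ["EPCAM", "KRT8", "PTPRC", "CD3E"]
LOADINGS = [2.0, 1.0, 0.5, 0.25]


def toy_frozen_model(values: List[float], mask: List[bool]) -> List[float]:
    # "Encode": least-squares estimate of the latent from measured genes only.
    num = sum(w * v for w, v, m in zip(LOADINGS, values, mask) if m)
    den = sum(w * w for w, m in zip(LOADINGS, mask) if m)
    z = num / den if den else 0.0
    # "Decode": reconstruct every reference gene from the latent.
    return [w * z for w in LOADINGS]


# A spatial cell measured on a two-gene panel.
denoised, imputed = impute_and_denoise(
    {"EPCAM": 4.0, "PTPRC": 1.0}, REFERENCE_GENES, toy_frozen_model
)
```

In this sketch the same `frozen_model` callable can be passed unchanged to every new spatial dataset; only the panel-to-reference alignment differs between samples.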
Cellpin is a variational autoencoder, a generative latent-variable architecture that encodes expression profiles into a probabilistic latent space and decodes reconstructions from it. The model is trained only on scRNA-seq reference data, where it learns the joint distribution of full-transcriptome expression; at inference it is applied as a fixed checkpoint to spatial transcriptomics data to impute unmeasured genes and denoise the measured signal, without any dataset-specific retraining. In the accompanying preprint, the authors benchmark Cellpin against six existing imputation methods on datasets that pair an scRNA-seq reference with a matched spatial sample, assessing both imputation accuracy and denoising quality.
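For orientation, the generic objective that a variational autoencoder maximizes during training is the evidence lower bound (ELBO), written below in its textbook form. The symbols are notation introduced here, not taken from the preprint: x is a cell's expression profile, z its latent representation, q_phi the encoder, p_theta the decoder, and p(z) the prior (commonly a standard normal). The specific count likelihood, prior, and any weighting of the two terms that Cellpin uses are not stated in this description.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The first term rewards accurate reconstruction of the reference profiles, and the second keeps the encoder's posterior close to the prior. Under the frozen-checkpoint design, both theta and phi are fixed after training on the reference, so applying the model to spatial data involves no further optimization of this objective.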
Cellpin is intended for researchers working with imaging-based spatial transcriptomics who want transcriptome-wide expression rather than the limited gene panels these platforms measure. By imputing missing genes and denoising measured ones from a reusable frozen model, it supports downstream tasks such as cell-type characterization, identification of spatially variable genes, and analyses of tissue microenvironments — for example in tumor biology, a focus of the contributing Saur lab — without the overhead of retraining a model for each new tissue sample.
Cellpin contributes to the rapidly growing toolkit for integrating scRNA-seq references with spatial transcriptomics, an area where methods such as gimVI, Tangram, and SpaGE have set strong baselines. Its distinguishing proposition is practical: a model trained once on reference data and reused frozen across spatial datasets, evaluated against six methods on paired benchmarks. Because it is a preprint released in June 2026, its longer-term adoption and independent validation remain to be established. At the time of writing, no public code repository or pretrained weights had been released, which currently limits direct reproduction and reuse by the community.
Putze, P., et al. (2026) Cellpin enables reference-based imputation and denoising of spatial transcriptomes. bioRxiv.
DOI: 10.64898/2026.06.02.729566