Overview

Nicheformer is a transformer-based foundation model developed by the Theis Lab at Helmholtz Munich and the Technical University of Munich. It is the first foundation model to learn jointly from both dissociated single-cell RNA sequencing (scRNA-seq) data and spatially resolved transcriptomics data, enabling the transfer of spatial context information to standard scRNA-seq datasets where spatial coordinates are not experimentally captured.

The central challenge Nicheformer addresses is the disconnect between the scale of dissociated scRNA-seq atlases and the spatial context encoded in tissue microenvironments. Dissociated protocols sacrifice positional information during cell isolation, while spatial technologies retain tissue context but have historically been limited in throughput or gene coverage. Nicheformer bridges these modalities by pretraining a single model on both data types simultaneously, learning representations that capture intrinsic cellular identity alongside the influence of the surrounding niche.

The model was pretrained on SpatialCorpus-110M, a curated dataset of over 110 million cells spanning human and mouse tissues, making it the largest spatially aware pretraining corpus at the time of its release. The accompanying study, published in Nature Methods in 2025, also introduces a suite of spatial downstream benchmarks (spatial composition prediction, spatial density prediction, and niche and region label prediction) to systematically evaluate spatially aware foundation models.

Key Features

Joint spatial and dissociated pretraining: Trained simultaneously on 57 million dissociated and 53 million spatially resolved cells, allowing the model to learn how spatial niche context shapes gene expression programs across tissues.
Spatial context prediction for scRNA-seq: Infers the spatial niche of dissociated cells lacking coordinate information, effectively predicting where in tissue a cell originated based on its transcriptomic profile.
Dual-species and broad tissue coverage: Pretraining spans both human and mouse transcriptomics data across 73 tissues, providing generalizable representations for diverse biological contexts.
Novel spatial benchmark tasks: Defines spatial composition prediction, spatial density prediction, and niche and region label prediction as rigorous evaluation benchmarks for spatial omics foundation models.
Open weights and code: Pretrained model weights are hosted on HuggingFace and source code is available on GitHub, enabling direct fine-tuning and deployment by the community.

Technical Details

Nicheformer uses a transformer architecture adapted for single-cell gene expression, processing cells as sequences of gene expression values and learning representations that encode both cellular identity and microenvironmental context. Pretraining is performed via a cellular reconstruction objective over SpatialCorpus-110M. For spatial downstream tasks, the pretrained backbone is fine-tuned to decode spatially resolved cellular information, bridging the representational gap between dissociated and in situ data modalities.
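As a rough illustration of what "processing cells as sequences" and a "reconstruction objective" can mean in practice, the sketch below orders a cell's genes by expression to form a token sequence, then hides a fraction of the tokens for the model to recover. Rank-based ordering is the scheme used by Geneformer-style models; whether it matches Nicheformer's tokenizer in detail is an assumption here, as are the function names, the 15% masking fraction, and the padding convention.

```python
import numpy as np

def rank_tokenize(expr, context_len=8, pad_id=0):
    """Turn one cell's expression vector into a fixed-length gene-token sequence.

    Genes are ordered by descending expression and unexpressed genes are dropped.
    Token id = gene index + 1, so that 0 stays free for padding (an assumption).
    """
    expr = np.asarray(expr, dtype=float)
    expressed = np.flatnonzero(expr > 0)
    # A stable sort keeps the original gene order among tied values.
    order = expressed[np.argsort(-expr[expressed], kind="stable")]
    tokens = (order[:context_len] + 1).astype(np.int64)
    padded = np.full(context_len, pad_id, dtype=np.int64)
    padded[: tokens.size] = tokens
    return padded

def mask_tokens(tokens, mask_id, mask_frac=0.15, pad_id=0, rng=None):
    """Hide a random subset of non-padding tokens for a reconstruction objective."""
    rng = np.random.default_rng() if rng is None else rng
    tokens = np.asarray(tokens)
    masked = tokens.copy()
    is_masked = np.zeros(tokens.shape, dtype=bool)
    candidates = np.flatnonzero(tokens != pad_id)
    if candidates.size == 0:
        return masked, is_masked
    n_mask = max(1, int(round(mask_frac * candidates.size)))
    picked = rng.choice(candidates, size=n_mask, replace=False)
    masked[picked] = mask_id
    is_masked[picked] = True
    return masked, is_masked

# Six genes; genes 1, 2, 4 and 5 are expressed.
tokens = rank_tokenize([0, 5, 2, 0, 9, 2], context_len=5)
print(tokens)  # [5 2 3 6 0]

# Token id 7 is reserved here as a hypothetical [MASK] id (one past the last gene).
masked, is_masked = mask_tokens(tokens, mask_id=7, mask_frac=0.5,
                                rng=np.random.default_rng(0))
print(masked, is_masked)
```

During pretraining, a model would be asked to predict the original token at each position where `is_masked` is true, with the loss computed only over those positions.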

SpatialCorpus-110M comprises 57 million dissociated scRNA-seq profiles and 53 million spatially resolved profiles captured using targeted, image-based spatial transcriptomics platforms such as 10x Xenium and CosMx. The pretraining corpus covers 73 tissues from human and mouse, representing the largest joint scRNA-seq and spatial transcriptomics training dataset assembled at the time. In benchmarks against scGPT, Geneformer, scVI, and PCA baselines, Nicheformer outperforms the competing models on the spatial downstream tasks, indicating that incorporating spatially resolved data during pretraining is necessary to capture the complexity of cells within their microenvironments.
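To make the baseline comparison concrete, the sketch below shows one common way such benchmarks are run: embeddings from each method are frozen, and the same simple linear model is fit on top of them to predict a spatial target. It implements only the PCA baseline and a ridge-regression probe in NumPy. The probe type, the regularization strength, and all function names are assumptions for illustration, not the study's evaluation code.

```python
import numpy as np

def pca_embed(X, n_components):
    """Project an expression matrix (cells x genes) onto its top principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def _with_bias(Z):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def fit_linear_probe(Z, y, l2=1e-3):
    """Closed-form ridge regression from frozen embeddings Z to targets y."""
    Zb = _with_bias(Z)
    A = Zb.T @ Zb + l2 * np.eye(Zb.shape[1])
    return np.linalg.solve(A, Zb.T @ y)

def predict(Z, W):
    return _with_bias(Z) @ W

def r2_score(y_true, y_pred):
    """Coefficient of determination, pooled over all target columns."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: 200 cells, 20 genes, one scalar target per cell.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
Z = pca_embed(X, n_components=5)
y = Z @ rng.normal(size=(5, 1)) + 0.5   # target that is linear in the embedding
W = fit_linear_probe(Z[:150], y[:150])
print(r2_score(y[150:], predict(Z[150:], W)))
```

Swapping `pca_embed` for embeddings exported from another model, while keeping the probe and the train/test split fixed, is what makes the scores comparable across methods.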

Applications

Nicheformer is applicable wherever researchers want to enrich standard scRNA-seq data with spatial context or analyze spatially resolved transcriptomics datasets. Primary use cases include predicting the spatial niche or tissue region of cells in dissociated datasets, classifying cells into spatial domains, estimating local cellular composition of microenvironments, and inferring cell density patterns across tissue sections. Researchers building single-cell atlases can leverage Nicheformer to integrate dissociated cohorts with spatial reference datasets, gaining positional context without re-collecting spatial data.
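The composition and density use cases can be made concrete with a small sketch of how such targets might be derived from a spatial reference dataset: for each cell, count the neighbors within a fixed radius and record the fraction belonging to each cell type. The radius-based neighborhood, the exclusion of the cell itself, and the function name `niche_targets` are assumptions for illustration; the study's exact target definitions may differ.

```python
import numpy as np

def niche_targets(coords, cell_types, n_types, radius):
    """Per-cell neighborhood composition and density from spatial coordinates.

    coords:     (n_cells, 2) array of x/y positions
    cell_types: (n_cells,) integer labels in [0, n_types)
    Returns (composition, density). composition[i] holds the fraction of each
    cell type among the neighbors of cell i within `radius` (cell i excluded);
    density[i] is the number of those neighbors.
    """
    coords = np.asarray(coords, dtype=float)
    cell_types = np.asarray(cell_types, dtype=int)
    # Brute-force pairwise distances: O(n^2) memory, fine for a small example.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    neighbors = dist <= radius
    np.fill_diagonal(neighbors, False)
    one_hot = np.eye(n_types)[cell_types]            # (n_cells, n_types)
    counts = neighbors.astype(float) @ one_hot       # neighbors of each type
    density = counts.sum(axis=1)
    # Cells with no neighbors get an all-zero composition instead of NaN.
    composition = np.divide(counts, density[:, None],
                            out=np.zeros_like(counts),
                            where=density[:, None] > 0)
    return composition, density

# Three cells close together and one isolated cell.
coords = [[0, 0], [1, 0], [0, 1], [10, 10]]
cell_types = [0, 1, 1, 0]
composition, density = niche_targets(coords, cell_types, n_types=2, radius=1.5)
print(density)      # [2. 2. 2. 0.]
print(composition)  # rows: [0, 1], [0.5, 0.5], [0.5, 0.5], [0, 0]
```

For real tissue sections with many cells, a spatial index such as a k-d tree would replace the dense distance matrix. Targets of this kind exist only for spatial data, which is why predicting them for dissociated cells requires a model that has learned the link between expression and niche.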

Impact

Nicheformer makes the case that spatial context must be incorporated during pretraining, not only at fine-tuning, to learn biologically meaningful microenvironmental representations. In the study's benchmarks, models trained exclusively on dissociated data did not recapitulate spatial complexity regardless of scale, a finding with direct implications for how future large-scale pretraining corpora are assembled. The model's public availability on HuggingFace and GitHub lowers the barrier for adoption across the spatial omics community.

Key limitations fall into three groups. First, the model relies on targeted spatial platforms with curated gene panels, so adaptation may be required for sequencing-based, whole-transcriptome spatial technologies such as Slide-seq. Second, spatial predictions for dissociated cells are probabilistic and may not generalize well to tissues underrepresented in SpatialCorpus-110M. Third, the current model is restricted to transcriptomics; multi-modal spatial data (proteomics, chromatin accessibility) are not yet incorporated.

Citation

Nicheformer: a foundation model for single-cell and spatial omics
