Question 1

What is a single-cell foundation model?

Accepted Answer

A single-cell foundation model is a neural network pretrained on large collections of single-cell transcriptomic data, typically single-cell RNA-seq profiles that measure the expression of thousands of genes in individual cells. Pretraining lets the model learn gene co-expression programs that generalize across cell types, tissues, and conditions, so it can be transferred to downstream tasks such as cell type annotation and perturbation prediction. Prominent examples include Geneformer, scGPT, and scFoundation.

Question 2

How do single-cell models handle batch effects?

Accepted Answer

Foundation models pretrained on diverse, multi-dataset corpora can learn representations that are partially robust to technical batch effects, because they see the same cell types processed with many different protocols. Some architectures explicitly take batch or technology labels as conditioning inputs during pretraining or fine-tuning. Benchmarks such as scIB (single-cell integration benchmarking) measure how well embeddings mix cells of the same type across batches, and these batch-correction scores are increasingly reported alongside biological conservation metrics.
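To make "mixing across batches" concrete, here is a minimal sketch of a k-nearest-neighbour mixing score. It is a simplified stand-in written for illustration, not one of the metrics the scIB benchmark actually computes, and the function name `knn_batch_mixing` and the toy coordinates are assumptions. For each cell it asks what fraction of its nearest neighbours in embedding space come from a different batch. The score is only meaningful when computed within a single cell type, since different cell types are expected to sit apart.

```python
import math


def knn_batch_mixing(embeddings, batches, k=2):
    """Mean fraction of each cell's k nearest neighbours that belong to another batch.

    embeddings: list of equal-length coordinate tuples, one per cell.
    batches:    list of batch labels, one per cell.
    A value near 0 means batches sit apart; higher values mean more mixing.
    """
    n = len(embeddings)
    if n <= k:
        raise ValueError("need more cells than neighbours")
    total = 0.0
    for i in range(n):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted(
            (math.dist(embeddings[i], embeddings[j]), j)
            for j in range(n)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        total += sum(batches[j] != batches[i] for j in neighbours) / k
    return total / n


# Toy embeddings for one cell type: two batches forming two distant clusters.
separated_cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated_batches = ["a", "a", "a", "b", "b", "b"]

# Toy embeddings for one cell type: the two batches interleave along a line.
mixed_cells = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
mixed_batches = ["a", "b", "a", "b", "a", "b"]

print(knn_batch_mixing(separated_cells, separated_batches))
print(knn_batch_mixing(mixed_cells, mixed_batches))
```

The separated layout scores lower than the interleaved one. A score like this rewards batch mixing only, which is why it needs to be read together with a biological conservation metric: an embedding that collapses all cells to one point would mix batches perfectly and be useless.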

Question 3

Can single-cell foundation models predict drug responses?

Accepted Answer

To a limited extent. Perturbation prediction is an active and competitive area of benchmarking in the field. Models trained on large genetic perturbation screens, such as Perturb-seq data, can predict expression changes for unseen gene knockouts or combinations with modest accuracy. Predicting drug responses is harder than predicting single-gene knockouts because drug mechanisms are more complex, and current models generalize better to perturbations covered by their training data than to truly novel compounds or targets.
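One common way to score predicted expression changes is to correlate the predicted and observed change relative to control, rather than the raw expression values. The sketch below illustrates that idea under stated assumptions: the function names `pearson` and `delta_correlation` and the four-gene toy profiles are hypothetical, and real benchmarks differ in which genes they include and how they aggregate across perturbations.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def delta_correlation(control_mean, observed_mean, predicted_mean):
    """Correlate predicted and observed expression change relative to control.

    Each argument is a per-gene mean expression vector in the same gene order.
    """
    observed_delta = [o - c for o, c in zip(observed_mean, control_mean)]
    predicted_delta = [p - c for p, c in zip(predicted_mean, control_mean)]
    return pearson(observed_delta, predicted_delta)


# Toy per-gene means for four genes (hypothetical values).
control = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 2.0, 1.0, 4.0]   # gene 1 goes up, gene 3 goes down
predicted = [1.5, 2.0, 2.0, 4.0]  # right direction, half the magnitude

print(delta_correlation(control, observed, predicted))
```

Scoring the change rather than the raw profile matters because perturbed and control cells share most of their baseline expression, so a model that predicts no change at all can still correlate strongly with the observed raw profile. Here that "no change" baseline yields a constant delta of zero, and `delta_correlation` raises an error rather than reporting a score. Note that correlation ignores magnitude, so it should be paired with an error measure if effect size matters.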

Question 4

What data is needed to fine-tune a single-cell foundation model?

Accepted Answer

Fine-tuning typically requires labeled single-cell RNA-seq data specific to your tissue or experimental context, with cell type annotations or perturbation outcomes as supervision. Most published single-cell foundation models were pretrained on large public atlases like CELLxGENE and can be fine-tuned with a few thousand to tens of thousands of cells for annotation tasks, far less than training from scratch would require. Perturbation prediction tasks generally benefit from dedicated perturbation screens rather than atlas-derived data alone.
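When the labeled set is only a few thousand cells, rare cell types can easily end up missing from the validation split. The following is a minimal sketch of a stratified split over cell type labels, assuming the labels are available as a plain list. The function name `stratified_split`, the 20% validation fraction, and the toy label counts are illustrative assumptions, not part of any particular model's fine-tuning pipeline.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split cell indices into train and validation sets, per cell type.

    Every cell type with at least two cells contributes at least one cell
    to each split; singleton types stay in the training set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        if len(indices) < 2:
            n_val = 0
        else:
            # Hold out at least one cell, but never the whole cell type.
            n_val = min(len(indices) - 1, max(1, round(len(indices) * val_fraction)))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Toy annotation labels: two common types and one singleton (hypothetical).
labels = ["T cell"] * 10 + ["B cell"] * 5 + ["rare"] * 1
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Cell types with only one labeled cell cannot be evaluated this way, which is a useful early signal that more annotated data is needed for that population before fine-tuning.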

Single-cell Models

What single-cell foundation models do

Applications: annotation, perturbation prediction, and atlas integration

Notable Models

scVI (CELLxGENE Census)

Geneformer

Tahoe-x1

scGPT

Nicheformer

Tahoe-100M-SCVI
