
Single-cell Models
Single-cell transcriptomics and genomics
97 models in this category
What single-cell foundation models do
Single-cell foundation models are pretrained on millions of single-cell transcriptomic profiles, learning gene expression programs that encode cell identity, state, developmental trajectory, and response to perturbation. By training on the high-dimensional co-expression relationships between genes in individual cells, these models build representations that transfer across tissues, species, and experimental platforms without requiring manual feature selection. Models like Geneformer, scGPT, and scFoundation have demonstrated that pretraining at scale on single-cell atlases produces embeddings with broad generalization.
Applications: annotation, perturbation prediction, and atlas integration
Cell type annotation is the most widely adopted application: foundation model embeddings consistently match or exceed supervised classifiers trained from scratch, particularly for rare cell types with few annotated examples. Perturbation response prediction — forecasting how a cell's expression profile changes after a gene knockout or drug treatment — is a more demanding benchmark where models like scGPT and GEARS have been directly compared. Cross-dataset integration and the inference of developmental trajectories are additional use cases driving adoption as single-cell atlases grow to billions of cells.
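As a rough illustration of embedding-based annotation, the sketch below labels a query cell by majority vote among its nearest annotated neighbors in embedding space. It assumes embeddings have already been extracted from some pretrained model. The 3-dimensional vectors, the labels, and the function names `cosine_similarity` and `knn_annotate` are made up for illustration and do not come from any specific library.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_annotate(query, ref_embeddings, ref_labels, k=3):
    """Label one query embedding by majority vote among its k most similar reference cells."""
    scored = [
        (cosine_similarity(query, ref), label)
        for ref, label in zip(ref_embeddings, ref_labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy reference embeddings (hypothetical values, not real model output).
ref_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
    [0.1, 0.1, 0.9],
]
ref_labels = ["T cell", "T cell", "B cell", "B cell", "NK cell"]

predicted = knn_annotate([0.85, 0.15, 0.05], ref_embeddings, ref_labels, k=3)
print(predicted)
```

A nearest-neighbor vote is only one option. A linear classifier on frozen embeddings or full fine-tuning are common alternatives, and which works best depends on how many labeled cells are available.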
Notable Models
Top-rated single-cell models from our evaluations
A variational autoencoder pretrained on 74 million human single-cell transcriptomes from the CELLxGENE Census for scalable batch correction, cell typing, and data integration.
Transformer-based foundation model pretrained on ~30 million single-cell transcriptomes for context-aware gene network predictions and therapeutic target discovery.
Single-cell foundation model pretrained on 50 million cells for gene network inference, denoising, and cell type prediction.
A generative pre-trained transformer for single-cell multi-omics, pretrained on 33 million human cells for cell annotation, batch correction, and perturbation prediction.
GPT-based generative model pretrained on 22 million single-cell transcriptomes using rank-based gene encoding for single-cell clustering, trajectory inference, and bulk tumor analysis.
A single-cell perturbation model that augments scGPT with gene-level language embeddings from NCBI, UniProt, and Gene Ontology to improve multi-gene perturbation prediction.
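One of the descriptions above mentions rank-based gene encoding. A minimal sketch of the general idea, under simplifying assumptions: a cell becomes a sequence of gene tokens ordered from highest to lowest expression, with unexpressed genes dropped. The function name `rank_encode`, the `max_len` parameter, and the toy counts are hypothetical. Published models may normalize counts before ranking, which this sketch skips.

```python
def rank_encode(expression, max_len=None):
    """Order genes from highest to lowest expression, dropping unexpressed genes.

    `expression` maps gene symbol -> expression value for a single cell.
    Ties are broken alphabetically so the output is deterministic.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda item: (-item[1], item[0]))
    genes = [gene for gene, _ in expressed]
    return genes if max_len is None else genes[:max_len]


# Toy expression values for one cell (made up for illustration).
cell = {"CD3E": 12.0, "MS4A1": 0.0, "ACTB": 85.0, "IL7R": 4.0, "GAPDH": 40.0}
tokens = rank_encode(cell, max_len=4)
print(tokens)  # ['ACTB', 'GAPDH', 'CD3E', 'IL7R']
```

Encoding by rank rather than by raw value discards absolute magnitudes, which can make the input less sensitive to differences in sequencing depth between cells.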
Frequently asked questions
What is a single-cell foundation model?
A single-cell foundation model is a neural network pretrained on large collections of single-cell transcriptomic data — typically RNA-seq profiles measuring the expression of thousands of genes in individual cells. Pretraining allows the model to learn gene co-expression programs that generalize across cell types, tissues, and conditions, enabling transfer to downstream tasks like cell type annotation and perturbation prediction. Prominent examples include Geneformer, scGPT, and scFoundation.
How do single-cell models handle batch effects?
Foundation models pretrained on diverse, multi-dataset corpora can learn representations that are partially robust to technical batch effects by seeing the same cell types processed across many different protocols. Some architectures explicitly incorporate batch or technology labels as conditioning inputs during pretraining or fine-tuning. Benchmarks like scIB (single-cell integration benchmarking) measure how well embeddings mix cells of the same type across batches, and these scores are increasingly reported alongside biological conservation metrics.
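To make the two kinds of score concrete, here is a deliberately simplified neighbor-based proxy. It is not the benchmark's actual metric suite. For each cell it looks at the k nearest neighbors in embedding space and asks what fraction share the cell's label. Applied to cell type labels this approximates biological conservation, and one minus the same quantity on batch labels approximates batch mixing. The function names and the 2-dimensional toy embeddings are assumptions for illustration.

```python
import math


def nearest_neighbors(embeddings, index, k):
    """Indices of the k cells closest (Euclidean) to embeddings[index], excluding itself."""
    dists = [
        (math.dist(embeddings[index], other), j)
        for j, other in enumerate(embeddings)
        if j != index
    ]
    dists.sort()
    return [j for _, j in dists[:k]]


def mean_neighbor_agreement(embeddings, labels, k):
    """Average fraction of each cell's k nearest neighbors that share its label."""
    total = 0.0
    for i in range(len(embeddings)):
        neighbors = nearest_neighbors(embeddings, i, k)
        total += sum(labels[j] == labels[i] for j in neighbors) / k
    return total / len(embeddings)


# Toy embeddings: two well-separated cell types, each sampled in two batches.
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
]
cell_types = ["A", "A", "A", "A", "B", "B", "B", "B"]
batches = ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2"]

bio_conservation = mean_neighbor_agreement(embeddings, cell_types, k=3)
batch_mixing = 1.0 - mean_neighbor_agreement(embeddings, batches, k=3)
print(bio_conservation, batch_mixing)
```

The two numbers pull in opposite directions: an embedding that collapses all cells together mixes batches perfectly but destroys cell type structure, which is why both are usually reported together.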
Can single-cell foundation models predict drug responses?
Perturbation prediction is an active and competitive benchmark in the field. Models trained on large genetic perturbation screens — such as Perturb-seq data — can predict expression changes for unseen gene knockouts or combinations with modest accuracy. Predicting drug responses is harder than predicting single-gene knockouts due to the complexity of drug mechanisms, and current models generalize better to perturbations covered by training data than to truly novel compounds or targets.
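One common way to score such predictions is to correlate predicted and observed expression changes relative to unperturbed control cells, rather than correlating raw expression. The sketch below shows that calculation on made-up numbers. The function names `pearson` and `pearson_delta` and all values are illustrative assumptions, and individual benchmarks may restrict the comparison to subsets of genes or use different metrics.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pearson_delta(control, observed, predicted):
    """Correlate predicted and observed expression changes relative to control."""
    observed_delta = [o - c for o, c in zip(observed, control)]
    predicted_delta = [p - c for p, c in zip(predicted, control)]
    return pearson(observed_delta, predicted_delta)


# Toy mean expression per gene (hypothetical values for four genes).
control = [1.0, 2.0, 3.0, 4.0]    # unperturbed cells
observed = [1.0, 0.5, 3.5, 4.0]   # measured after perturbation
predicted = [1.1, 1.0, 3.4, 3.9]  # model output for the same perturbation

score = pearson_delta(control, observed, predicted)
raw_baseline = pearson(control, observed)  # "predict no change" scored on raw expression
print(round(score, 3), round(raw_baseline, 3))
```

Scoring on changes matters because most genes barely move after a perturbation, so a model that simply returns the control profile can look strong on raw expression while predicting nothing about the perturbation itself.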
What data is needed to fine-tune a single-cell foundation model?
Fine-tuning typically requires labeled single-cell RNA-seq data specific to your tissue or experimental context, with cell type annotations or perturbation outcomes as supervision. Most published single-cell foundation models were pretrained on large public atlases like CELLxGENE and can be fine-tuned with a few thousand to tens of thousands of cells for annotation tasks — far less than training from scratch. Perturbation prediction tasks generally benefit from dedicated perturbation screens rather than atlas-derived data alone.