Overview

scLong is a billion-parameter single-cell foundation model, published in Nature Communications in 2026, that performs self-attention across approximately 28,000 human genes per cell. This removes the gene-selection step that prior single-cell foundation models such as scGPT and Geneformer rely on. The model integrates Gene Ontology (GO) knowledge through a graph convolutional network (GCN) whose embeddings are concatenated to gene tokens, providing biological priors that complement the data-driven attention signal.

scLong is the first single-cell foundation model to operate over the complete human transcriptome at this scale, and it demonstrates state-of-the-art performance on perturbation response prediction, cancer drug response prediction, cell-type annotation, and batch integration.

Key Features

Full-transcriptome attention: Attends over approximately 28,000 human genes per cell, removing the gene-selection step required by scGPT, Geneformer, and scFoundation.
Gene Ontology integration: GO priors injected via GCN-derived gene embeddings concatenated to learned tokens, supplementing data-driven signal with curated knowledge.
Billion-parameter scale: With roughly one billion parameters, one of the largest single-cell foundation models (FMs) to date.
Strong perturbation prediction: Outperforms prior single-cell FMs on held-out perturbation prediction benchmarks.
Cancer drug response transfer: Effective for predicting cellular response to anti-cancer drugs in zero-shot and fine-tuned settings.
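
As a rough illustration of why full-transcriptome attention is costly, the sketch below counts the pairwise attention scores a dense self-attention layer would compute per cell for about 28,000 gene tokens versus a pre-selected subset. The 2,000-gene subset size and the 4 bytes per score are illustrative assumptions, not figures taken from the scLong paper.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of query-key scores in one dense self-attention map."""
    return n_tokens * n_tokens


FULL_GENES = 28_000      # approximate gene count stated for scLong
SUBSET_GENES = 2_000     # assumed size of a pre-selected gene list
BYTES_PER_SCORE = 4      # assumed float32 storage

full_pairs = attention_pairs(FULL_GENES)
subset_pairs = attention_pairs(SUBSET_GENES)
cost_ratio = full_pairs / subset_pairs
full_map_gb = full_pairs * BYTES_PER_SCORE / 1e9  # per head, per cell

print(f"full: {full_pairs:,} scores, subset: {subset_pairs:,} scores")
print(f"ratio: {cost_ratio:.0f}x, dense map: {full_map_gb:.2f} GB")
```

Because the cost grows with the square of the token count, a 14-fold increase in genes gives a 196-fold increase in attention scores under these assumptions, which is why efficient attention variants matter at this input length.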

Technical Details

scLong uses a transformer architecture with sparse-attention adaptations to manage the cost of full-transcriptome attention. Each gene token is augmented with a GO-derived embedding produced by a GCN trained on the GO biological-process hierarchy. The model is pretrained with masked-gene prediction on a large pan-tissue scRNA-seq corpus. The published paper reports architecture, training corpus, ablations, and benchmark comparisons against scGPT, Geneformer, scFoundation, and scBERT.
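
The token construction and pretraining objective described above can be sketched on a toy scale. This is a minimal pure-Python illustration, not the scLong implementation: the function names (`gcn_layer`, `build_gene_tokens`, `mask_genes`), the six-gene graph, the single GCN layer, the embedding sizes, and the `-1.0` mask value are all hypothetical choices made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: normalized neighbor mixing, projection, ReLU."""
    n = len(adj)
    # Add self-loops so each gene keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, feats), weight)
    return [[max(v, 0.0) for v in row] for row in hidden]


def build_gene_tokens(learned_emb, go_emb):
    """Concatenate each gene's learned embedding with its GO-derived embedding."""
    return [learned + go for learned, go in zip(learned_emb, go_emb)]


def mask_genes(expr, mask_frac, rng, mask_value=-1.0):
    """Hide a random fraction of expression values for masked-gene prediction."""
    n_mask = max(1, round(mask_frac * len(expr)))
    idx = sorted(rng.sample(range(len(expr)), n_mask))
    masked = list(expr)
    for i in idx:
        masked[i] = mask_value
    return masked, idx


rng = random.Random(0)
n_genes, d_feat, d_go, d_learned = 6, 5, 4, 8

# Toy gene-gene graph; edges stand in for shared GO annotations (assumed).
adj = [[0.0] * n_genes for _ in range(n_genes)]
for i, j in [(0, 1), (1, 2), (3, 4)]:
    adj[i][j] = adj[j][i] = 1.0

go_feats = [[rng.gauss(0, 1) for _ in range(d_feat)] for _ in range(n_genes)]
gcn_weight = [[rng.gauss(0, 1) for _ in range(d_go)] for _ in range(d_feat)]
go_emb = gcn_layer(adj, go_feats, gcn_weight)

learned_emb = [[rng.gauss(0, 1) for _ in range(d_learned)] for _ in range(n_genes)]
tokens = build_gene_tokens(learned_emb, go_emb)  # 6 tokens of width 8 + 4

expr = [rng.random() for _ in range(n_genes)]
masked_expr, masked_idx = mask_genes(expr, 0.3, rng)
# During pretraining, a model would be trained to recover expr[i] for i in masked_idx.
```

A practical implementation would use a tensor library and a learned, multi-layer GCN; the point here is only the data flow from a knowledge graph to per-gene embeddings, then to concatenated tokens and a masked reconstruction target.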

Applications

scLong is suited for translational single-cell research groups working on perturbation response, drug response, and cell-type annotation in heterogeneous tissues. The full-transcriptome attention is particularly valuable for studies where pathway-level effects are expected and where pre-selected gene lists may miss relevant signal.

Impact

scLong demonstrates that scaling single-cell foundation models to full-transcriptome attention is technically feasible and delivers measurable gains over the prior generation of FMs that operate on selected gene subsets. The integration of curated biological knowledge through GO-derived embeddings provides a useful template for combining data-driven and knowledge-driven signal in single-cell modeling.

Citation

Bai, D., et al. (2026) scLong: a billion-parameter foundation model for capturing long-range gene context in single-cell transcriptomics. Nature Communications.

DOI: 10.1038/s41467-026-69102-y
