xTrimoGene is an efficient transformer architecture for representation learning on single-cell RNA sequencing (scRNA-seq) data, developed by BioMap in collaboration with researchers from Tsinghua University and Mohamed bin Zayed University of Artificial Intelligence. It was presented at NeurIPS 2023. The model addresses a fundamental bottleneck in single-cell genomics: publicly available scRNA-seq datasets now exceed 50 million human cell records, each measuring approximately 20,000 genes, creating a data matrix so large that standard transformer architectures face prohibitive memory and compute demands. xTrimoGene resolves this by exploiting a key property of the data — gene expression matrices are inherently sparse, with the majority of genes showing zero or near-zero expression in any given cell.
The central innovation is an asymmetric encoder-decoder design (designated xTrimoGene-alpha) in which the encoder operates only on non-zero, unmasked gene positions — roughly 10% of the full sequence length — while a lightweight Performer-based decoder handles reconstruction across the full gene space. This asymmetry yields a 10-100x reduction in floating-point operations compared to a dense transformer baseline, making large-scale pre-training tractable without sacrificing predictive accuracy. The result is a model that can be trained on data volumes that were previously impractical for transformer-based approaches.
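The effect of this sparse-gather step can be sketched numerically. The snippet below is an illustration of the idea, not the released implementation: it collects the non-zero gene positions of one synthetic cell and estimates the attention-FLOP saving, using an assumed gene count of 19,264 and an assumed 10% non-zero rate consistent with the text.

```python
import numpy as np

# Illustrative sketch (not the released code) of the sparse gather step:
# the encoder sees only the non-zero gene positions of a cell, so its
# input length drops to roughly 10% of the ~19,264-gene sequence.

def gather_nonzero(expr):
    """Return (positions, values) for the non-zero genes of one cell."""
    pos = np.nonzero(expr)[0]
    return pos, expr[pos]

rng = np.random.default_rng(0)
n_genes = 19_264
expr = np.zeros(n_genes)
active = rng.choice(n_genes, size=n_genes // 10, replace=False)
expr[active] = rng.lognormal(size=active.size)   # synthetic expression values

pos, vals = gather_nonzero(expr)

# Self-attention cost grows quadratically with sequence length, so a 10x
# shorter encoder input means ~100x fewer attention FLOPs, matching the
# 10-100x range quoted in the text (feed-forward layers save ~10x).
attn_saving = (n_genes / pos.size) ** 2
```

The quadratic term explains the upper end of the quoted 10-100x range: attention dominates at long sequence lengths, while the linear feed-forward layers account for the lower end.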
The flagship xTrimoGene-100M model, with approximately 100 million parameters, was pre-trained on a curated corpus of roughly 50 billion effective gene tokens spanning over 50 million human cells. Performance on downstream benchmarks improves consistently with model scale, following a pattern analogous to the scaling laws observed in protein language models and large language models.
The model's roughly 100 million parameters are distributed across an asymmetric encoder-decoder stack with a 2:1 layer ratio and a 1.5:1 attention-head ratio between encoder and decoder. The encoder is a standard multi-head self-attention transformer operating only on the sparse non-zero token positions; the decoder employs the Performer linear-attention approximation to remain efficient over the full 19,264-gene output space. Pre-training used a masked reconstruction objective, in which a subset of expression values is hidden and the model learns to recover them, over a curated scRNA-seq corpus of approximately 50 billion effective gene tokens drawn from publicly available datasets covering more than 50 million human cells.
On the Zheng68K cell type annotation benchmark, xTrimoGene achieves a macro-F1 score of 0.7354 (± 0.0189), outperforming scBERT (F1: 0.6695) and ACTINN (F1: 0.6486). On perturbation response prediction in the Perturb-seq setting, xTrimoGene reduces mean squared error on the top-20 differentially expressed genes by 14.8% relative to the GEARS baseline. Drug combination synergy prediction shows similar gains over DeepSynergy and random-forest comparators. The model maintains a Pearson correlation above 0.8 on masked value recovery even at 96% sparsity, demonstrating robustness to the extreme data sparsity typical of single-cell assays.
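The masked-value-recovery metric used above is ordinary Pearson correlation between true and predicted expression at masked positions. A tiny illustration, with made-up values purely for demonstration:

```python
import numpy as np

# Toy illustration of the masked-value-recovery metric: Pearson
# correlation between true and predicted expression at masked positions.
# The numbers below are invented for demonstration only.

true_vals = np.array([2.0, 0.5, 3.1, 1.2, 0.0, 4.4])
pred_vals = np.array([1.8, 0.7, 2.9, 1.0, 0.2, 4.1])

r = np.corrcoef(true_vals, pred_vals)[0, 1]  # close predictions -> r near 1
```

A correlation above 0.8 at 96% sparsity means the model recovers the relative ordering and rough magnitude of expression values even when almost all entries are zero.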
xTrimoGene is designed for researchers working with large-scale single-cell transcriptomics data. Its pre-trained representations can be fine-tuned for cell type annotation in large atlases, perturbation effect prediction (relevant to CRISPR screen analysis and drug mechanism studies), and drug combination synergy scoring. The hosted API service lowers the computational barrier for wet-lab groups that generate scRNA-seq data but lack the infrastructure for large-scale transformer training, enabling them to apply foundation model embeddings to their own datasets without GPU cluster access.
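One simple way to use frozen foundation-model embeddings for cell type annotation, as described above, is a nearest-centroid classifier over a labeled reference. The sketch below uses synthetic stand-in embeddings, not real xTrimoGene outputs; the dimensions and cell-type count are arbitrary assumptions.

```python
import numpy as np

# Hedged sketch: cell type annotation via nearest-centroid classification
# over per-cell embeddings. The embeddings here are synthetic stand-ins
# for what a pre-trained model (or a hosted API) would return.

rng = np.random.default_rng(2)
emb_dim = 32
centers = rng.normal(size=(3, emb_dim))                  # 3 hypothetical cell types
train = centers.repeat(20, axis=0) + 0.1 * rng.normal(size=(60, emb_dim))
labels = np.repeat(np.arange(3), 20)                     # 20 annotated cells per type

# one centroid per annotated cell type in the reference set
centroids = np.stack([train[labels == c].mean(axis=0) for c in range(3)])

def annotate(cell_emb):
    """Assign the cell type whose centroid is nearest in embedding space."""
    return int(np.argmin(np.linalg.norm(centroids - cell_emb, axis=1)))

query = centers[1] + 0.05 * rng.normal(size=emb_dim)     # a new cell of type 1
pred_type = annotate(query)
```

In practice a fine-tuned classification head would typically outperform this, but the centroid approach shows why good embeddings alone already carry most of the annotation signal.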
xTrimoGene demonstrates that the computational bottleneck in applying foundation models to single-cell genomics is not inherent to the data or task, but a consequence of using architectures designed for dense sequences. By treating sparsity as an opportunity rather than an obstacle, the model achieves state-of-the-art results on multiple benchmark tasks at a fraction of the compute of contemporary approaches. Presented at NeurIPS 2023 and backed by BioMap's production API, it has influenced subsequent work on efficient architectures for genomic foundation models. A current limitation is that the model processes each cell in isolation and does not capture spatial context or cell-cell communication signals; both remain active areas of development in the single-cell foundation model space.
Gong, J., et al. (2023). xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data. bioRxiv. DOI: 10.1101/2023.03.24.534055