xTrimoGene is an efficient transformer architecture for representation learning on single-cell RNA sequencing (scRNA-seq) data, developed by BioMap in collaboration with researchers from Tsinghua University and Mohamed bin Zayed University of Artificial Intelligence. It was presented at NeurIPS 2023. The model addresses a fundamental bottleneck in single-cell genomics: publicly available scRNA-seq datasets now exceed 50 million human cell records, each measuring approximately 20,000 genes, creating a data matrix so large that standard transformer architectures face prohibitive memory and compute demands. xTrimoGene resolves this by exploiting a key property of the data — gene expression matrices are inherently sparse, with the majority of genes showing zero or near-zero expression in any given cell.
The central innovation is an asymmetric encoder-decoder design (designated xTrimoGene-alpha) in which the encoder operates only on non-zero, unmasked gene positions — roughly 10% of the full sequence length — while a lightweight Performer-based decoder handles reconstruction across the full gene space. This asymmetry yields a 10-100x reduction in floating-point operations compared to a dense transformer baseline, making large-scale pre-training tractable without sacrificing predictive accuracy. The result is a model that can be trained on data volumes that were previously impractical for transformer-based approaches.
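The effect of this sparse-gather step can be sketched numerically. The snippet below is an illustration of the idea, not the released implementation: it collects the non-zero gene positions of one synthetic cell and estimates the attention-FLOP saving, using an assumed gene count of 19,264 and an assumed 10% non-zero rate consistent with the text.

```python
import numpy as np

# Illustrative sketch (not the released code) of the sparse gather step:
# the encoder sees only the non-zero gene positions of a cell, so its
# input length drops to roughly 10% of the ~19,264-gene sequence.

def gather_nonzero(expr):
    """Return (positions, values) for the non-zero genes of one cell."""
    pos = np.nonzero(expr)[0]
    return pos, expr[pos]

rng = np.random.default_rng(0)
n_genes = 19_264
expr = np.zeros(n_genes)
active = rng.choice(n_genes, size=n_genes // 10, replace=False)
expr[active] = rng.lognormal(size=active.size)   # synthetic expression values

pos, vals = gather_nonzero(expr)

# Self-attention cost grows quadratically with sequence length, so a 10x
# shorter encoder input means ~100x fewer attention FLOPs, matching the
# 10-100x range quoted in the text (feed-forward layers save ~10x).
attn_saving = (n_genes / pos.size) ** 2
```

The quadratic term explains the upper end of the quoted 10-100x range: attention dominates at long sequence lengths, while the linear feed-forward layers account for the lower end.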
The flagship xTrimoGene-100M model, with approximately 100 million parameters, was pre-trained on a curated corpus of roughly 50 billion effective gene tokens spanning over 50 million human cells. Performance on downstream benchmarks improves consistently with model scale, following a pattern analogous to the scaling laws observed in protein language models and large language models.
The model's roughly 100 million parameters are distributed across an asymmetric encoder-decoder stack with a 2:1 layer ratio and a 1.5:1 attention-head ratio between encoder and decoder. The encoder is a standard multi-head self-attention transformer operating only on the sparse non-zero token positions; the decoder employs the Performer linear-attention approximation to remain efficient over the full 19,264-gene output space. Pre-training used a masked reconstruction objective, in which a subset of expression values is hidden and the model learns to recover them, over a curated scRNA-seq corpus of approximately 50 billion effective gene tokens drawn from publicly available datasets covering more than 50 million human cells.
On the Zheng68K cell type annotation benchmark, xTrimoGene achieves a macro-F1 score of 0.7354 (± 0.0189), outperforming scBERT (F1: 0.6695) and ACTINN (F1: 0.6486). On perturbation response prediction in the Perturb-seq setting, xTrimoGene reduces mean squared error on the top-20 differentially expressed genes by 14.8% relative to the GEARS baseline. Drug combination synergy prediction shows similar gains over DeepSynergy and random-forest comparators. The model maintains a Pearson correlation above 0.8 on masked value recovery even at 96% sparsity, demonstrating robustness to the extreme data sparsity typical of single-cell assays.
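The masked-value-recovery metric used above is ordinary Pearson correlation between true and predicted expression at masked positions. A tiny illustration, with made-up values purely for demonstration:

```python
import numpy as np

# Toy illustration of the masked-value-recovery metric: Pearson
# correlation between true and predicted expression at masked positions.
# The numbers below are invented for demonstration only.

true_vals = np.array([2.0, 0.5, 3.1, 1.2, 0.0, 4.4])
pred_vals = np.array([1.8, 0.7, 2.9, 1.0, 0.2, 4.1])

r = np.corrcoef(true_vals, pred_vals)[0, 1]  # close predictions -> r near 1
```

A correlation above 0.8 at 96% sparsity means the model recovers the relative ordering and rough magnitude of expression values even when almost all entries are zero.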
xTrimoGene is designed for researchers working with large-scale single-cell transcriptomics data. Its pre-trained representations can be fine-tuned for cell type annotation in large atlases, perturbation effect prediction (relevant to CRISPR screen analysis and drug mechanism studies), and drug combination synergy scoring. The hosted API service lowers the computational barrier for wet-lab groups that generate scRNA-seq data but lack the infrastructure for large-scale transformer training, enabling them to apply foundation model embeddings to their own datasets without GPU cluster access.
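One simple way to use frozen foundation-model embeddings for cell type annotation, as described above, is a nearest-centroid classifier over a labeled reference. The sketch below uses synthetic stand-in embeddings, not real xTrimoGene outputs; the dimensions and cell-type count are arbitrary assumptions.

```python
import numpy as np

# Hedged sketch: cell type annotation via nearest-centroid classification
# over per-cell embeddings. The embeddings here are synthetic stand-ins
# for what a pre-trained model (or a hosted API) would return.

rng = np.random.default_rng(2)
emb_dim = 32
centers = rng.normal(size=(3, emb_dim))                  # 3 hypothetical cell types
train = centers.repeat(20, axis=0) + 0.1 * rng.normal(size=(60, emb_dim))
labels = np.repeat(np.arange(3), 20)                     # 20 annotated cells per type

# one centroid per annotated cell type in the reference set
centroids = np.stack([train[labels == c].mean(axis=0) for c in range(3)])

def annotate(cell_emb):
    """Assign the cell type whose centroid is nearest in embedding space."""
    return int(np.argmin(np.linalg.norm(centroids - cell_emb, axis=1)))

query = centers[1] + 0.05 * rng.normal(size=emb_dim)     # a new cell of type 1
pred_type = annotate(query)
```

In practice a fine-tuned classification head would typically outperform this, but the centroid approach shows why good embeddings alone already carry most of the annotation signal.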
xTrimoGene demonstrates that the computational bottleneck in applying foundation models to single-cell genomics is not inherent to the data or task, but a consequence of using architectures designed for dense sequences. By treating sparsity as an opportunity rather than an obstacle, the model achieves state-of-the-art results on multiple benchmark tasks at a fraction of the compute of contemporary approaches. Presented at NeurIPS 2023 and backed by BioMap's production API, it has influenced subsequent work on efficient architectures for genomic foundation models. A current limitation is that the model processes each cell in isolation and does not capture spatial context or cell-cell communication signals; both remain active areas of development in the single-cell foundation model space.
Gong, J., et al. (2023). xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data. bioRxiv. DOI: 10.1101/2023.03.24.534055